kernel panic: list_del corruption. next->prev should be A, but was B #532
Comments
not sure, shouldn't all calls that meddle with the queue be locked?
is that intended behaviour?
hmmm, digging into the history, it looks like the lock was never intended to protect the list; it was meant to protect the read stats when multiple consumers are present, hence it's only applied to the reading path. there is no protection of the buffer list in the design, but linux's list is not thread-safe. that sounds like a possible root cause for the behaviour above? i'm going out on a limb here. after i also applied the lock to the dequeueing, i have not seen a kernel panic so far 🤞. but to positively test it i would need to fix all possible concurrency issues... 😬
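To make the failure mode concrete, here is a minimal userspace sketch (not driver code) of the kind of link check that the "list_del corruption" message in the title reports. `struct node` and all helper names are hypothetical stand-ins for the kernel's `struct list_head` API. The half-finished insert is replayed in a single thread, on the assumption that an unlocked writer was interrupted at exactly that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct node {
	struct node *next, *prev;
};

/* An empty circular list points at itself in both directions. */
static void node_init(struct node *head)
{
	head->next = head->prev = head;
}

/* Complete, consistent insert before 'head' (i.e. at the tail). */
static void node_add_tail(struct node *n, struct node *head)
{
	assert(n != head);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Same idea as the kernel's debug check: both neighbours must point back. */
static bool node_links_valid(const struct node *n)
{
	return n->next->prev == n && n->prev->next == n;
}

/*
 * Replays the state an unlocked writer could leave behind if it were
 * interrupted mid-insert: 'b' is reachable forwards from 'a', but the
 * old successor's back pointer has not been updated yet.
 */
static void half_finished_insert_after(struct node *b, struct node *a)
{
	b->prev = a;
	b->next = a->next;
	a->next = b;
	/* missing on purpose: b->next->prev = b; */
}
```

Any thread that walks or unlinks entries while the list is in that intermediate state sees a neighbour whose back pointer does not match, which is the condition the panic message describes.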
just fyi: this is the change i applied here:
 case V4L2_BUF_TYPE_VIDEO_OUTPUT:
+	spin_lock_bh(&dev->lock);
 	b = list_entry(dev->outbufs_list.prev, struct v4l2l_buffer,
 		       list_head);
 	list_move_tail(&b->list_head, &dev->outbufs_list);
+	spin_unlock_bh(&dev->lock);
 	dprintkrw("output DQBUF index: %d\n", b->buffer.index);
 	unset_flags(b);
 	*buf = b->buffer;
 	buf->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
but i think we should maybe add a second, dedicated lock to clearly distinguish locking the stats from locking the list.
apparently this fixes the kernel panic by simply freezing the output 🤦 i'll try further to get it working properly
nope, turns out this fix should be fine: output is down to a single frame not because of this change, but because of some change in v0.12.7...HEAD.
thanks for investigating. i agree though that re-using the stats mutex just because it's already there is not a good idea...
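The idea of a dedicated list lock could look roughly like the following userspace sketch. It is not a patch for the driver: `struct loopdev`, `stats_lock`, `list_lock` and the `dev_*` helpers are hypothetical names, and pthread mutexes stand in for the kernel spinlock taken via `spin_lock_bh()` in the change above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct node {
	struct node *next, *prev;
};

/* Hypothetical device: one lock per concern instead of one shared lock. */
struct loopdev {
	pthread_mutex_t stats_lock; /* guards 'reads' only */
	pthread_mutex_t list_lock;  /* guards 'outbufs' only */
	struct node outbufs;        /* circular list head */
	unsigned long reads;
};

static void dev_init(struct loopdev *d)
{
	pthread_mutex_init(&d->stats_lock, NULL);
	pthread_mutex_init(&d->list_lock, NULL);
	d->outbufs.next = d->outbufs.prev = &d->outbufs;
	d->reads = 0;
}

/* Callers must hold list_lock. */
static void add_tail_locked(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void dev_queue(struct loopdev *d, struct node *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&d->list_lock);
	add_tail_locked(n, &d->outbufs);
	pthread_mutex_unlock(&d->list_lock);
}

/*
 * Same shape as the patched DQBUF path: take the last entry and move it
 * to the tail, all under the list lock. The empty-list guard is an
 * addition for this sketch.
 */
static struct node *dev_dequeue(struct loopdev *d)
{
	struct node *n = NULL;

	pthread_mutex_lock(&d->list_lock);
	if (d->outbufs.prev != &d->outbufs) {
		n = d->outbufs.prev;
		n->prev->next = n->next; /* unlink */
		n->next->prev = n->prev;
		add_tail_locked(n, &d->outbufs);
	}
	pthread_mutex_unlock(&d->list_lock);
	return n;
}

/* The stats path never touches the list, so it only takes stats_lock. */
static void dev_count_read(struct loopdev *d)
{
	pthread_mutex_lock(&d->stats_lock);
	d->reads++;
	pthread_mutex_unlock(&d->stats_lock);
}

/* Walks the list under the lock; returns its length, or -1 on a broken link. */
static int dev_list_len(struct loopdev *d)
{
	int len = 0;

	pthread_mutex_lock(&d->list_lock);
	for (struct node *n = d->outbufs.next; n != &d->outbufs; n = n->next) {
		if (n->next->prev != n || n->prev->next != n) {
			len = -1;
			break;
		}
		len++;
	}
	pthread_mutex_unlock(&d->list_lock);
	return len;
}
```

With this split, every function that reads or rewrites list pointers goes through `list_lock`, while stats updates never contend with buffer handling. Whether the real driver should use a spinlock or a mutex for the list depends on the contexts it is called from, which this sketch does not try to answer.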
Environment
v4l2loopback version: 0.12.7-321-gfb410fc
Linux ci5 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Debian GNU/Linux 11 (bullseye)
note: we tried several different versions: the debian package, the 0.12.7 release and the above commit.
Step 3: Describe the problem:
I have not yet narrowed this down to an easily reproducible setup, but I can explain what we are trying to do. We have a camera-based Android app that we want to test end-to-end. For that purpose we run an Android emulator and use v4l2loopback and gstreamer to make the emulator recognize /dev/video0 as camera input. Then we use pytest and selenium to feed an image into the loopback device and check whether the app reacts correctly at the application level.
These end-to-end tests run in our CI environment. We use gitlab-runner on top of kubernetes and pass the devices through to the containers using generic-device-plugin. This way each container sees a device /dev/video0, but they are mapped onto different virtual devices on the underlying host system. The driver is installed directly on the host system.
Now the problem is that from time to time (over weeks) we got a kernel panic from the driver, causing the host server to reboot. I now have a setup where I can reproduce this at roughly 20-minute intervals, but I'm not sure how to proceed to get more useful information out.
Observed Results:
kernel panic on the underlying system, causing reboot of the kubernetes host.
Expected Results:
no kernel panic
Relevant Code:
I'll try to narrow it down further; currently it's quite a large project with a lot of moving parts (it's an e2e test after all 😉). In case I can create a minimal setup, I'll post it here.