ztest failed due to assertion while trying to acquire mutex in zio_add_child #11957
Comments
Additional validation of this theory can be seen in this AddressSanitizer output from a ztest run:
The parent IO being referenced in the read request from
OpenZFS compiled with ASan and UBSan. I'm running
in an infinite loop. Not a single reproduction. As a matter of fact Would you share more details on how you managed to hit that issue?
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
A ztest run fails with an assertion error when attempting to acquire an invalid mutex. The panic occurs in zio_add_child() when we attempt to obtain the io_lock of the "parent" IO structure that is passed in (pio). The error code (EINVAL) implies that the lock is not initialized. In this context (the vdev_rebuild_thread), the parent IO comes from spa_txg_zio[].

Note that this is an array of IO structure pointers. Each element of the array is associated with a transaction group, and the IO structure is freed once that group has synced to disk. The element the rebuild thread uses corresponds to the "current" txg, and it is obtained within the context of an open transaction.

Now note that this transaction (tx) is committed prior to the read request. It therefore seems possible (if unlikely) that the txg associated with the tx could "sync" before we issue the read. If this happens, the IO structure in the spa_txg_zio[] array will be freed and its io_lock field "cleared", resulting in this error.
Describe how to reproduce the problem
This problem has only been seen once with ztest, to my knowledge.
Include any warning/errors/backtraces from the system logs
ztest.out
ztest.gdb
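To make the suspected ordering problem concrete, here is a minimal userspace sketch of the race described under "Describe the problem you're observing". It is not OpenZFS code: every model_* name, the slot count, and the io_lock_valid flag are hypothetical stand-ins invented for illustration. The "freed" parent IO is modeled by clearing a flag rather than by actually freeing memory, so the sketch stays well defined, and the two interleavings are selected deterministically instead of with real threads.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in OpenZFS. */
#define MODEL_TXG_SIZE 4
#define MODEL_TXG_MASK (MODEL_TXG_SIZE - 1)

typedef struct model_zio {
	bool io_lock_valid;	/* false once the zio has been "freed" */
	int io_children;	/* number of children attached so far */
} model_zio_t;

typedef struct model_spa {
	/* One parent IO per in-flight txg, like the per-txg array in the report. */
	model_zio_t txg_zio[MODEL_TXG_SIZE];
} model_spa_t;

static void
model_spa_init(model_spa_t *spa)
{
	for (int i = 0; i < MODEL_TXG_SIZE; i++) {
		spa->txg_zio[i].io_lock_valid = true;
		spa->txg_zio[i].io_children = 0;
	}
}

/* Look up the parent IO slot that belongs to a given txg. */
static model_zio_t *
model_txg_zio(model_spa_t *spa, unsigned long txg)
{
	return (&spa->txg_zio[txg & MODEL_TXG_MASK]);
}

/* Model of the txg syncing: its parent IO is torn down. */
static void
model_txg_sync(model_spa_t *spa, unsigned long txg)
{
	model_txg_zio(spa, txg)->io_lock_valid = false;
}

/* Model of attaching a child: fails the way a dead lock would. */
static int
model_add_child(model_zio_t *pio)
{
	if (!pio->io_lock_valid)
		return (EINVAL);
	pio->io_children++;
	return (0);
}

/*
 * One step of the modeled rebuild thread. The parent IO is looked up
 * while the transaction is open, the transaction is then committed,
 * and only afterwards is the read (child IO) issued. The flag selects
 * whether the txg syncs inside that window.
 */
static int
model_rebuild_step(bool sync_before_read)
{
	model_spa_t spa;
	unsigned long txg = 7;	/* arbitrary txg number for the model */

	model_spa_init(&spa);
	model_zio_t *pio = model_txg_zio(&spa, txg);

	/* The tx would be committed here; nothing now pins the txg open. */
	if (sync_before_read)
		model_txg_sync(&spa, txg);

	return (model_add_child(pio));
}
```

Calling model_rebuild_step(false) models the common case where the read is issued before the txg syncs, and model_rebuild_step(true) models the rare interleaving suspected here. If the theory holds, one plausible direction for a fix would be to issue the read before committing the tx, so the txg cannot sync in between; that is a suggestion under the assumptions above, not a verified patch.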