This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
After a pool becomes suspended due to losing too many disks, some files that were written just before the pool was suspended are unrecoverable. ZFS should know if the write completed successfully, and not discard the dirty data until it is written properly.
We expect PART of this problem is that zio_flush() sets the ZIO_FLAG_DONT_PROPAGATE flag, so errors are not sent to the parent ZIO. Even without that, we still see this problem. We are investigating further.
Describe how to reproduce the problem
We used zinject to FAULT more disks than the RAID-Z configuration can withstand. After removing the zinject handlers, and running zpool clear there are persistent checksum errors or completely unreadable files.
We were able to better reproduce this on real hardware, by using enclosure management tools to power off multiple disks from the pool at once causing it to become faulted.
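The report does not list the exact commands used, so the following is only a rough sketch of the zinject-based sequence described above. It builds the command lines without running them. The pool name `tank`, the device names, and the parity level are hypothetical placeholders, and the helper `build_repro_commands` is invented for illustration. Writing test files while the disks are faulted is left out, since the report does not say how that was done.

```python
from typing import List


def build_repro_commands(pool: str, vdevs: List[str], parity: int) -> List[List[str]]:
    """Build (but do not run) commands that fault more disks than RAID-Z tolerates.

    `pool`, `vdevs`, and `parity` are placeholders chosen by the caller;
    none of these values come from the original report.
    """
    if len(vdevs) <= parity:
        raise ValueError("need more faulted disks than the RAID-Z parity level")

    cmds: List[List[str]] = []
    # Force each chosen disk into the FAULTED state with an injection handler.
    for dev in vdevs:
        cmds.append(["zinject", "-d", dev, "-A", "fault", pool])
    # Remove all injection handlers, then resume the suspended pool.
    cmds.append(["zinject", "-c", "all"])
    cmds.append(["zpool", "clear", pool])
    # Check for persistent checksum errors or unreadable files.
    cmds.append(["zpool", "status", "-v", pool])
    return cmds


if __name__ == "__main__":
    # Hypothetical RAID-Z2 pool: faulting three disks exceeds its redundancy.
    for cmd in build_repro_commands("tank", ["sdb", "sdc", "sdd"], parity=2):
        print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a disposable test pool each list could be passed to `subprocess.run` instead.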
Include any warning/errors/backtraces from the system logs
then the pool was resumed once the HDDs powered back up with zpool clear
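To illustrate the suspicion about zio_flush() and ZIO_FLAG_DONT_PROPAGATE stated in the problem description, here is a toy model of child-to-parent error reporting. This is not ZFS code: `ToyZio`, `DONT_PROPAGATE`, and `parent_error_after_child_failure` are hypothetical names, and the flag value is arbitrary. It only shows how a parent I/O can appear successful when a failing child is marked as non-propagating.

```python
import errno
from dataclasses import dataclass
from typing import Optional

# Stand-in for ZIO_FLAG_DONT_PROPAGATE; the numeric value is arbitrary.
DONT_PROPAGATE = 0x1


@dataclass
class ToyZio:
    """Simplified I/O node: an error code, flags, and an optional parent."""
    name: str
    flags: int = 0
    error: int = 0
    parent: Optional["ToyZio"] = None

    def done(self, error: int = 0) -> None:
        """Complete this I/O and report its error to the parent unless masked."""
        self.error = self.error or error
        masked = bool(self.flags & DONT_PROPAGATE)
        if self.parent is not None and self.error and not masked:
            # The parent keeps the first error it is told about.
            self.parent.error = self.parent.error or self.error


def parent_error_after_child_failure(child_flags: int) -> int:
    """Fail a child flush with EIO and return the error the parent observes."""
    parent = ToyZio("write")
    flush = ToyZio("flush", flags=child_flags, parent=parent)
    flush.done(errno.EIO)
    return parent.error


if __name__ == "__main__":
    print("propagating child :", parent_error_after_child_failure(0))
    print("masked child      :", parent_error_after_child_failure(DONT_PROPAGATE))
```

In this model the masked flush leaves the parent with error 0, so a caller that only inspects the parent would treat the write as durable. As noted in the report, this would explain only part of the behaviour, since the problem persists even without the flag.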