pool with ashift 12 on luks2 devices with sector size 4k causes repeated io errors #14533
Comments
#13431 describes an issue that might be the same or related (it involves likely newer versions of cryptsetup, which will use luks2). It shows similar zio errors. #13362 also involves luks, and has similar zio errors, but is observing other symptoms (io hangs, though perhaps this is from different operations causing the io). |
Can you share what |
Here's the `zdb -C tank`
In the details, one can see the replacement of Here are the details for
|
From the Taking
Note that while |
I did notice that, yes. While trying to reproduce this, I discovered I can convince ZFS without much work to try replacing a 512n device with a 4kn device on an ashift 9 vdev, which, uh, Does Not End Well At All. But that doesn't seem to be what happened to you, here. I'm now wondering if somehow it stashed some things on the vdev not 4k aligned and because it was a 512n device it went fine, but now trying to 1:1 mirror is going bonkers. ...o-oh. I had a bad idea, actually. I wonder if the partition isn't 4k aligned, and in trying to replicate the partition table it's resulting in non-4k aligned accesses...let me go read those IO errors you pasted again. e: well, LUKS, so not exactly a partition, but like, the leading offset...anyway. edit 2: are you seeing any errors not from ZFS in your syslog from the disk itself? |
Watching Here's another set: (detached z14.3 and re-luksFormatted it in luks2 to get this output)
so: no errors from the device, sd, etc about unaligned writes (or any other error of any kind). And the offsets listed in the zio messages are aligned to 4096, and all the sizes logged are also multiples of 4096. As far as offsets in the luks data, the The All of the luks formatting is done on the entire disk (iow, running |
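The alignment check described above can be sketched roughly as follows. This is a minimal Python sketch, assuming the zio messages carry `offset=` and `size=` fields; the sample lines, their values, and the function name `misaligned` are made up for illustration and are not taken from the reporter's log.

```python
import re

# Assumed shape of the fields in a zio error line; sample values are invented.
ZIO_FIELD = re.compile(r"\b(offset|size)=(\d+)")

def misaligned(lines, sector=4096):
    """Return (offset, size) pairs that are not multiples of `sector`."""
    bad = []
    for line in lines:
        fields = {key: int(value) for key, value in ZIO_FIELD.findall(line)}
        if "offset" not in fields or "size" not in fields:
            continue  # not a line this sketch understands
        if fields["offset"] % sector or fields["size"] % sector:
            bad.append((fields["offset"], fields["size"]))
    return bad

sample = [
    "zio pool=tank vdev=/dev/mapper/z14.3 error=5 type=2 offset=8192 size=4096",
    "zio pool=tank vdev=/dev/mapper/z14.3 error=5 type=2 offset=4608 size=4096",
]
print(misaligned(sample))
```

An empty result for a real log would match what is reported here: every offset and size is a multiple of 4096, so on-disk alignment alone does not explain the errors.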
An interesting data point would be to see if the first 100 or so errors you get when doing the replace with a LUKS2 header are the same every time, as that might tell us more about whether it's deterministic or something very strange... |
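One way to make that comparison, as a rough sketch: collect the (offset, size) pairs of the errors from each attempt and count how many leading entries agree. The helper name and the sample values below are hypothetical.

```python
def matching_prefix(run_a, run_b, limit=100):
    """Count leading (offset, size) pairs shared by two runs, up to `limit`."""
    count = 0
    for a, b in zip(run_a[:limit], run_b[:limit]):
        if a != b:
            break
        count += 1
    return count

# Made-up (offset, size) pairs from two hypothetical replace attempts.
first = [(8192, 4096), (12288, 4096), (65536, 8192)]
second = [(8192, 4096), (12288, 4096), (131072, 4096)]
print(matching_prefix(first, second))
```

A long shared prefix across attempts would point towards something deterministic; a short one would suggest timing or memory layout is involved.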
@jmesmon thank you for this thread, I was really confused about whether I have a problem with XHCI, the disk, zfs, LUKS, or all of them together. It seems, though, that the LUKS device sector size doesn't matter: I use disks with a 4K physical/logical sector size, so my LUKS device has a 4K sector size even with --type luks1
I use -o ashift=12, but the math is simple:
The problem, when I use a LUKS2 header, occurs when constantly writing to the device at max speed (i.e. copying a large set of data). It's funny that it is less likely to occur when writing at lower speed, but eventually it happens then too. I really have no idea if this is a problem with ZFS or LUKS. Any ideas? |
How full is your pool? I hit this issue a good while back and thought it might be my SSD (WD SN850) randomly disconnecting since I could find some reports about that on the WD forums. Though it seemed to be related to how full the pool/drive was, with enough free space I couldn't trigger it. Switching to another (and larger) drive solved the issue, until now that I've filled it too. The best way for me to trigger it also hasn't been writing at full (sequential) speed but decompressing and compiling chromium, which is ~906000 files over 19GB. Currently on a 3200GiB partition it seems like the breakpoint when it starts occurring is somewhere around 90% allocated, and it came to mind that maybe it's here that zfs changes allocation method? Slightly over 200GB is reserved for zvols though, so the free space for filesystems is ~100GB. Unclear if this affects it.
The drive itself is formatted to 4K sectors so misalignment shouldn't be possible (?)
|
Almost empty, when triggering the bug. It's a new system (brand new disks). I admit that it's a bit unusual - Gentoo arm64 /w Asahi kernel @ Mac Mini M1 ;) But otherwise it is rock stable, zfs-2.1.13. Also I don't see any problems with the drives in dmesg. Currently, with a LUKS1 header I'm seeing 4TB usage: a couple of zvols and 10 697 081 files in fs - no zio errors whatsoever. |
Hitting the recent bug 15533 I saw the same zio error=5 type=2 as in this bug, so I started to search for this issue to check similarities. While searching I happened upon this pull request: The identified triggers there interestingly are:
which seem to match the identified triggers in this issue too. @robn I can't find any new PR, but I'd be interested in testing if what you got fixes this issue too. |
I should be posting a significant rework of It's not totally clear to me that this is a result of misaligned aggregation, but you might try drastically lowering (more in #15533 (comment)). |
PR with possible fix from robn: #15588 (linking for my reference) |
FYI, 2.2.4 just shipped, with #15588 and followup patches included. If you are still having this problem, you might try setting |
on 6.8.10-asahi nixos, zfs 2.2.4, macbook air m2, zfs_vdev_disk_classic=0 and zfs_vdev_disk_classic=1 both result in several hundred zio error=5 type=2 with a luks2 header while trying to install. LUKS1 results in no errors. fyi @robn |
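For anyone repeating this test, a quick way to confirm which value the loaded module actually has is to read it back from sysfs. This is a minimal sketch, assuming the usual Linux location `/sys/module/zfs/parameters` for ZFS module parameters; the helper name `read_zfs_param` is hypothetical.

```python
from pathlib import Path

def read_zfs_param(name, base="/sys/module/zfs/parameters"):
    """Return the current value of a ZFS module parameter, or None if unreadable."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter absent in this version, or no permission.
        return None

print(read_zfs_param("zfs_vdev_disk_classic"))
```

A result of `None` most likely means the module is not loaded or the running version predates the parameter.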
Please see here for a debugging patch that I hope will reveal more info about what's going on: #15646 (comment) (if possible, I would prefer to keep discussion going in #15646, so it's all in one place). |
It seems our notion of "properly" aligned IO was incomplete. In particular, dm-crypt does its own splitting, and assumes that a logical block will never cross an order-0 page boundary (ie, the physical page size, not compound size). This effectively means that it needs to be possible to split a BIO at any page or block size boundary and have it work correctly.
This updates the alignment check function to enforce these rules (to the extent possible).
Our response to misaligned data is to make some new allocation that is properly aligned, and copy the data into it. It turns out that linearising (via abd_borrow_buf()) is not enough, because we allocate eg 4K blocks from a general purpose slab, and so may receive (or already have) a 4K block that crosses pages. So instead, we allocate a new ABD, which is guaranteed to be aligned properly to block sizes, and then copy everything into it, and back out on the way back.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Closes #16687 #16631 #15646 #15533 #14533
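To illustrate the rule in the commit message (this is not the code from the patch, only a sketch of the stated constraint), the check below tests whether every logical block in a buffer stays inside a single order-0 page. The function names, the example addresses, and the default 4K page size are assumptions; it only models block sizes no larger than the page size.

```python
def crosses_page(addr, length, page_size=4096):
    """True if the byte range [addr, addr + length) spans more than one page."""
    return addr // page_size != (addr + length - 1) // page_size

def blocks_stay_in_page(addr, length, block_size=4096, page_size=4096):
    """True if each block_size chunk of the buffer lies within a single page."""
    return all(
        not crosses_page(addr + off, block_size, page_size)
        for off in range(0, length, block_size)
    )

# Page-aligned 8K buffer: both 4K blocks sit in their own page.
print(blocks_stay_in_page(0x1000, 8192))
# A 4K block handed out by a slab at a 2K offset straddles two 4K pages.
print(blocks_stay_in_page(0x1800, 4096))
# With 16K pages, a 4K block starting at 14K crosses the 16K boundary.
print(blocks_stay_in_page(0x3800, 4096, page_size=16384))
```

The second case is the situation the message describes: the block has the right size and its on-disk offset can be perfectly aligned, yet its position in memory crosses a page, which is why copying into a freshly allocated, block-aligned buffer is the response.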
System information
Describe the problem you're observing
With an existing pool with ashift 12:
I create a new vdev with `cryptsetup luksFormat /dev/sdw`, using cryptsetup version 2.6.1.
This results in a luks (luks2) device (used as a vdev) with a sector size of 4096. Note that `/dev/sdw` is the underlying device (a 14 TB hard drive), and `/dev/mapper/z14.3` is the cryptsetup device using `sdw`.
I then add it to my existing pool with `zpool replace tank z4.2 z14.3`. Eventually (before the replace/resilver completes), zio reports errors and the vdev is considered failed.
Next, after detaching the vdev from the pool and running `cryptsetup close`, I use `cryptsetup luksFormat --type luks1` instead (to force the use of a luks1 header instead of a luks2 header).
This results in a vdev with 512 byte sectors:
With this vdev (luks1, 512B sectors), no zio errors are observed and the `zpool replace` completes successfully.
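As a sanity check on the sector sizes involved, the sketch below compares a device's logical sector size against the pool's ashift, on the understanding that a vdev's allocation unit is 2^ashift bytes. The helper names are hypothetical, and reading `/sys/block/<dev>/queue/logical_block_size` assumes a Linux system; `dm-0` is only an example device name.

```python
from pathlib import Path

def logical_block_size(dev, base="/sys/block"):
    """Read a block device's logical sector size from sysfs, or None if unreadable."""
    try:
        return int((Path(base) / dev / "queue" / "logical_block_size").read_text())
    except (OSError, ValueError):
        return None

def ashift_covers(sector_size, ashift):
    """True if the 2**ashift allocation unit is a whole multiple of the sector size."""
    unit = 1 << ashift
    return unit >= sector_size and unit % sector_size == 0

print(ashift_covers(4096, 12))  # luks2 device with 4096-byte sectors
print(ashift_covers(512, 12))   # luks1 device with 512-byte sectors
print(logical_block_size("dm-0"))
```

Both sector sizes are compatible with ashift 12 on paper, which is consistent with the failure not being a plain sector-size mismatch.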