Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zpool add allows creating a zpool with a mix of block sizes so that zpool remove then fails with "all top-level vdevs must have the same sector size" #15965

Open
loop-evgeny opened this issue Mar 5, 2024 · 3 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@loop-evgeny
Copy link

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 22.04
Kernel Version 5.15.0-92-generic
Architecture x64
OpenZFS Version zfs-2.1.5-1ubuntu6~22.04.2 zfs-kmod-2.1.5-1ubuntu6~22.04.1

Describe the problem you're observing

zpool add allows creating a mix of block sizes (ashift values) in a zpool, so that it becomes impossible to remove vdevs from it. This is quite a "gotcha"!

Yes, it is a documented limitation of zpool remove, but it's still very user-surprising behavior. For one thing, even if the user does read the whole man page before using zpool remove (which, of course, everyone does) it's already too late to fix the problem by then! For another, the user may not even realize they're adding devices with different ashifts.

Describe how to reproduce the problem

zpool create tank /dev/sdc # This uses the default ashift of 9 for sdc
zpool set ashift=11 tank

zpool add tank /dev/sdd # This now uses an ashift of 11 without the user having any idea!
zpool remove tank /dev/sdd

cannot remove /dev/sdd: invalid config; all top-level vdevs must have the same sector size and not be raidz.

Gotcha! Imagine that the first 2 commands were run by a previous sysadmin and I had no idea about it. Now, I innocently add a disk - maybe not even knowing about the concept of "ashift" at all - and would never expect that I would be unable to remove that disk!

(On the real production system where I ran into this I have no idea whether anyone explicitly ran zpool set ashift or how it came to pass that the existing disks had a different ashift setting than the new disk, but it happened somehow. The above steps are just an easy way to repro it.)

Even a person who knows about block sizes would probably not expect that the block size of one disk would prevent another disk from being removed. I'm sure there is some implementation reason for this, but to a user, it's completely unexpected behavior!

Obviously the best fix would be to allow zpool remove to work for a mix of block sizes, but if that's not possible then zpool add should not allow such a mix to be created so easily. I'm not sure if there is a use case for allowing it at all, but if so, it should require some extra "I understand I will be unable to remove disks after this" kind of flag from the user.

@loop-evgeny loop-evgeny added the Type: Defect Incorrect behavior (e.g. crash, hang) label Mar 5, 2024
@AllKind
Copy link
Contributor

AllKind commented Mar 5, 2024

This is not the first issue for that matter. And I personally totally agree.
It took a while, but luckily a Pull Request is in the works: #15509

@clhedrick
Copy link

I assumed that if I did an add the new vdev would automatically take on the ashift of the pool. Do I need to set an explicit ashift for every add?

@AllKind
Copy link
Contributor

AllKind commented Mar 9, 2024

#14312 (comment)

From zpoolprops(7):

ashift=ashift
         Pool sector size exponent, to the power of 2 (internally referred to as ashift).  Values from 9 to 16, inclusive, are valid; also, the value 0 (the default) means to auto-detect using the kernel's block layer and a
         ZFS internal exception list.  I/O operations will be aligned to the specified size boundaries.  Additionally, the minimum (disk) write size will be set to the specified size, so this represents a space vs. perfor‐
         mance trade-off.  For optimal performance, the pool sector size should be greater than or equal to the sector size of the underlying disks.  The typical case for setting this property is when performance is important
         and the underlying disks use 4KiB sectors but report 512B sectors to the OS (for compatibility reasons); in that case, set ashift=12 (which is 1<<12 = 4096).  When set, this property is used as the default hint value
         in subsequent vdev operations (add, attach and replace).  Changing this value will not modify any existing vdev, not even on disk replacement; however it can be used, for instance, to replace a dying 512B sectors
         disk with a newer 4KiB sectors device: this will probably result in bad performance but at the same time could prevent loss of data.

So if I understand it correctly, if the default value of '0' is set (pool wise), the value is set to what the disk reports. If explicitly set, the manually chosen value is used when adding a disk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants