Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btrfs-progs: doc: add a warning when converting to a profile with lower #945

Closed
wants to merge 23 commits into from

Conversation

adam900710
Copy link
Collaborator

duplication

[BUG]
There is a bug report that, a running btrfs with one of its device deleted using sysfs ('/sys/block//device/delete'), btrfs will still read write on that device.

Normally it's fine as long as all chunks can tolerate that removed device (e.g. all RAID1).

But the problem is when one is trying to lower the duplication by converting to a less-safe profile:

mkfs.btrfs -f -m raid1 -d raid1 /dev/sdd /dev/sde

mount /dev/sdd /mnt

echo 1 > /sys/block/sde/device/delete

btrfs balance start --force -mdup -dsingle /mnt

This will lead to the fs mounted RO, with the following error messages:

sd 6:0:0:0: [sde] Synchronizing SCSI cache
ata7.00: Entering standby power mode
btrfs: attempt to access beyond end of device
sde: rw=6145, sector=21696, nr_sectors = 32 limit=0
btrfs: attempt to access beyond end of device
sde: rw=6145, sector=21728, nr_sectors = 32 limit=0
btrfs: attempt to access beyond end of device
sde: rw=6145, sector=21760, nr_sectors = 32 limit=0
BTRFS error (device sdd): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
BTRFS error (device sdd): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 1, corrupt 0, gen 0
btrfs: attempt to access beyond end of device
sde: rw=145409, sector=128, nr_sectors = 8 limit=0
BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
BTRFS error (device sdd): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0
btrfs: attempt to access beyond end of device
sde: rw=14337, sector=131072, nr_sectors = 8 limit=0
BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
BTRFS error (device sdd): error writing primary super block to device 2
BTRFS info (device sdd): balance: start -dconvert=single -mconvert=dup -sconvert=dup
BTRFS info (device sdd): relocating block group 1372585984 flags data|raid1
BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 2, corrupt 0, gen 0
BTRFS warning (device sdd): chunk 2446327808 missing 1 devices, max tolerance is 0 for writable mount
BTRFS: error (device sdd) in write_all_supers:4044: errno=-5 IO failure (errors while submitting device barriers.)
BTRFS info (device sdd state E): forced readonly
BTRFS warning (device sdd state E): Skipping commit of aborted transaction.
BTRFS error (device sdd state EA): Transaction aborted (error -5)
BTRFS: error (device sdd state EA) in cleanup_transaction:2017: errno=-5 IO failure
BTRFS info (device sdd state EA): balance: ended with status: -5

[CAUSE]
Btrfs doesn't have any runtime device error handling, it fully rely on the extra copy provided.

For the sysfs block device removal, normally there is a device shutdown callback to the running fs, but unfortunately btrfs doesn't support this callback either.

Thus even with that device removed, btrfs will still access that removed device (both read and write, even if they will fail).

Normally for a full RAID1 btrfs, it will still be fine reading/write the fs as usual.
And the proper action is to replace the removed/missing/failing device with a newer one using btrfs device replace.

But when doing the convert, btrfs will allocate new metadata chunks on to the removed device (which will lose all writes).

And since the new metadata profile is DUP, which can not handle any missing device of that metadata chunk, finally it triggers the final protection at transaction commit time, and flips the fs RO, before it causing any real data loss.

[DOC ENHANCEMENT]
Add a warning to the convert filter about the dangerous doing convert to a less-safe profile when there is a known failing/removed device.

And mention the proper way to handle such failing/missing device.

The root fix is to introduce a failing/removed device detection for btrfs, but that will be a pretty big feature and will take quite some time before landing it upstream.

Reported-by: Jeff Siddall news@siddall.name
Link: https://lore.kernel.org/linux-btrfs/2cb1d81e-12a8-4fb1-b3fc-e7e83d31e059@siddall.name/

kdave and others added 23 commits January 22, 2025 20:34
Move the option documentation next to --rootdir, reword documentation.

Signed-off-by: David Sterba <dsterba@suse.com>
The compression support is optional, eg. also in 'btrfs-restore', so
print the support in help text.

usage: mkfs.btrfs [options] <dev> [<dev...>]

...
    --compress ALGO[:LEVEL]   compress files by algorithm and level, ALGO can be 'no' (default), zstd, lzo, zlib
                              Built-in:
                              - ZSTD: yes
                              - LZO: yes
                              - ZLIB: yes
...

Signed-off-by: David Sterba <dsterba@suse.com>
Put option parsing to its own helper to keep the switch/case smaller.

Signed-off-by: David Sterba <dsterba@suse.com>
The level is a small number, no need to pass it around as u64.

Signed-off-by: David Sterba <dsterba@suse.com>
The constants will be used in main() to validate command line options.

Signed-off-by: David Sterba <dsterba@suse.com>
Report invalid compression specification while parsing the options. Now
an ivalid level won't be silently accepted and capped when processing
the files. Other checks regarding conditional support of LZO and ZSTD
are left in place.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It does not make sense to pass only the compression option when there
are no files being added by --rootdir.

Signed-off-by: David Sterba <dsterba@suse.com>
  $ mkfs.btrfs --compress zlib:9 --rootdir Documentation img
  ...
  Rootdir from:       /path/Documentation
    Compress:         zlib:9
    Shrink:           no
  ...

Signed-off-by: David Sterba <dsterba@suse.com>
There are two configure options for compression so group them and update
the description so it mentions all commands that use compression.

Signed-off-by: David Sterba <dsterba@suse.com>
For completeness and parity with other tools add the option --version
and mention --help.

Signed-off-by: David Sterba <dsterba@suse.com>
For completeness and parity with other tools add the option --version
and mention --help.

Signed-off-by: David Sterba <dsterba@suse.com>
For completeness and parity with other tools add the option --version
and mention --help.

Signed-off-by: David Sterba <dsterba@suse.com>
Enhance information in the help text where some interesting information
was not missing and would require looking up the documentation.

Signed-off-by: David Sterba <dsterba@suse.com>
The new option for experimental featues was added in 6.12, mention it
where the DEBUG option was previously used for the same purpose.

[ci skip]

Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report that read-only scrub on a read-write fs still
causes writes into the fs, and that will be caught if there is a
read-only block device among the storage stack.

This will cause a kernel warning on failed transaction commit:

  BTRFS info (device dm-3): first mount of filesystem e18f0c40-88de-413f-9d7e-dcc8136ad6dd
  BTRFS info (device dm-3): using crc32c (crc32c-intel) checksum algorithm
  BTRFS info (device dm-3): using free-space-tree
  BTRFS info (device dm-3): scrub: started on devid 1
  Trying to write to read-only block-device md127
  btrfs_dev_stat_inc_and_print: 362 callbacks suppressed
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
  BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2523: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3 state E): forced readonly
  BTRFS warning (device dm-3 state E): Skipping commit of aborted transaction.
  BTRFS error (device dm-3 state EA): Transaction aborted (error -5)
  BTRFS: error (device dm-3 state EA) in cleanup_transaction:2017: errno=-5 IO failure
  BTRFS warning (device dm-3 state EA): failed setting block group ro: -5
  BTRFS info (device dm-3 state EA): scrub: not finished on devid 1 with status: -5

[CAUSE]
The root cause is inside btrfs_inc_block_group_ro(), where we need to
hold a transaction handle, to prevent the transaction to be committed,
until we hold ro_block_group_mutex.

This will cause an empty transaction by itself, thus even if we can mark
the block group read-only without any extra workload, we still need to
commit the new and empty transaction.

Unfortunately this means RO scrub on RW filesystem will always cause the
fs to be updated.

[FIX]
The best fix is to make btrfs to avoid empty commit transaction, but
even with that done, read-only scrub on rw mount can still cause real
metadata updates (e.g. allocate new chunks and update device error
statistics).

It will be very complex to make read-only scrub to be fully read-only
on a read-write btrfs.

Thankfully read-only scrub on read-write mount with read-only device in
the storage stack is pretty rare, thus a documentation update should be
enough.

Issue: kdave#934
Pull-request: kdave#935
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If qgroups are not enabled the 'clear-stale' command unconditionally
calls sync on the whole filesystem, which is unnecessary and can slow
down the system. Do a check first if qgroups are enabled.

Issue: kdave#942
Signed-off-by: David Sterba <dsterba@suse.com>
There's a report that handling of --help in connection with other option
is confusing in some cases:

Various sub-commands to btrfs claim --help is an unrecognized option if
there are any other options on the CLI.

Examples of --help being unrecognized.

  $ btrfs filesystem defragment -v --help
  btrfs filesystem defragment: unrecognized option '--help'
  Try 'btrfs filesystem defragment --help' for more information

  $ btrfs balance start -v --help
  $ btrfs balabtrfs balance start -v --help
  btrfs balance start: unrecognized option '--help'
  Try 'btrfs balance start --help' for more information

Alternatively, some sub-commands support --help even if there are extra
options on the CLI:

  $ btrfs filesystem df -v --help
  usage: btrfs filesystem df [options] <path>

      Show space usage information for a mount point

      -b|--raw                  raw numbers in bytes
      -h|--human-readable       human friendly numbers, base 1024 (default)
      -H                        human friendly numbers, base 1000
      --iec                     use 1024 as a base (KiB, MiB, GiB, TiB)
      --si                      use 1000 as a base (kB, MB, GB, TB)
      -k|--kbytes               show sizes in KiB, or kB with --si
      -m|--mbytes               show sizes in MiB, or MB with --si
      -g|--gbytes               show sizes in GiB, or GB with --si
      -t|--tbytes               show sizes in TiB, or TB with --si

      Global options:
      --format TYPE             where TYPE is: text

Update clean_args_no_options() to detect unrecognized options properly
and do not show full help text (which usued to be done in the past but
is confusing and not pointing to the problem).

There's also a mixup of global verbosity options and command-specific
verbosity options (now deprecated), so in some commands they are
recognized.

Also handle the --help option for commands without options so the
following works:

  $ btrfs fi df -v --help /
  btrfs filesystem df: invalid option 'v'
  Try 'btrfs filesystem df --help' for more information

  $ btrfs fi df -H --help /
  usage: btrfs filesystem df [options] <path>

      Show space usage information for a mount point

      -b|--raw                  raw numbers in byte
  ...

Issue: kdave#889
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
By mistake in the original commit 992fd23 ("btrfs-progs: add
nodiscard option to device add") the --nodiscard could take an optional
parameter but this was not intended. This is also not documented so
there's little chance somebody would actually use it like that.

Signed-off-by: David Sterba <dsterba@suse.com>
duplication

[BUG]
There is a bug report that, a running btrfs with one of its device
deleted using sysfs ('/sys/block/<dev>/device/delete'), btrfs will still
read write on that device.

Normally it's fine as long as all chunks can tolerate that removed
device (e.g. all RAID1).

But the problem is when one is trying to lower the duplication by
converting to a less-safe profile:

  # mkfs.btrfs -f -m raid1 -d raid1 /dev/sdd /dev/sde
  # mount /dev/sdd /mnt
  # echo 1 > /sys/block/sde/device/delete
  # btrfs balance start --force -mdup -dsingle /mnt

This will lead to the fs mounted RO, with the following error messages:

 sd 6:0:0:0: [sde] Synchronizing SCSI cache
 ata7.00: Entering standby power mode
 btrfs: attempt to access beyond end of device
 sde: rw=6145, sector=21696, nr_sectors = 32 limit=0
 btrfs: attempt to access beyond end of device
 sde: rw=6145, sector=21728, nr_sectors = 32 limit=0
 btrfs: attempt to access beyond end of device
 sde: rw=6145, sector=21760, nr_sectors = 32 limit=0
 BTRFS error (device sdd): bdev /dev/sde errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device sdd): bdev /dev/sde errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device sdd): bdev /dev/sde errs: wr 3, rd 0, flush 1, corrupt 0, gen 0
 btrfs: attempt to access beyond end of device
 sde: rw=145409, sector=128, nr_sectors = 8 limit=0
 BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
 BTRFS error (device sdd): bdev /dev/sde errs: wr 4, rd 0, flush 1, corrupt 0, gen 0
 btrfs: attempt to access beyond end of device
 sde: rw=14337, sector=131072, nr_sectors = 8 limit=0
 BTRFS warning (device sdd): lost super block write due to IO error on /dev/sde (-5)
 BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 1, corrupt 0, gen 0
 BTRFS error (device sdd): error writing primary super block to device 2
 BTRFS info (device sdd): balance: start -dconvert=single -mconvert=dup -sconvert=dup
 BTRFS info (device sdd): relocating block group 1372585984 flags data|raid1
 BTRFS error (device sdd): bdev /dev/sde errs: wr 5, rd 0, flush 2, corrupt 0, gen 0
 BTRFS warning (device sdd): chunk 2446327808 missing 1 devices, max tolerance is 0 for writable mount
 BTRFS: error (device sdd) in write_all_supers:4044: errno=-5 IO failure (errors while submitting device barriers.)
 BTRFS info (device sdd state E): forced readonly
 BTRFS warning (device sdd state E): Skipping commit of aborted transaction.
 BTRFS error (device sdd state EA): Transaction aborted (error -5)
 BTRFS: error (device sdd state EA) in cleanup_transaction:2017: errno=-5 IO failure
 BTRFS info (device sdd state EA): balance: ended with status: -5

[CAUSE]
Btrfs doesn't have any runtime device error handling, it fully rely on
the extra copy provided.

For the sysfs block device removal, normally there is a device shutdown
callback to the running fs, but unfortunately btrfs doesn't support this
callback either.

Thus even with that device removed, btrfs will still access that
removed device (both read and write, even if they will fail).

Normally for a full RAID1 btrfs, it will still be fine reading/write the
fs as usual.
And the proper action is to replace the removed/missing/failing device
with a newer one using `btrfs device replace`.

But when doing the convert, btrfs will allocate new metadata chunks on
to the removed device (which will lose all writes).

And since the new metadata profile is DUP, which can not handle any
missing device of that metadata chunk, finally it triggers the final
protection at transaction commit time, and flips the fs RO, before it
causing any real data loss.

[DOC ENHANCEMENT]
Add a warning to the `convert` filter about the dangerous doing convert
to a less-safe profile when there is a known failing/removed device.

And mention the proper way to handle such failing/missing device.

The root fix is to introduce a failing/removed device detection for
btrfs, but that will be a pretty big feature and will take quite some
time before landing it upstream.

Reported-by: Jeff Siddall <news@siddall.name>
Link: https://lore.kernel.org/linux-btrfs/2cb1d81e-12a8-4fb1-b3fc-e7e83d31e059@siddall.name/
Signed-off-by: Qu Wenruo <wqu@suse.com>
@adam900710 adam900710 added the docs Changes in documentation or help text label Feb 3, 2025
@kdave kdave force-pushed the devel branch 2 times, most recently from 24ab173 to 800d7b0 Compare February 11, 2025 19:43
@kdave kdave added this to the v6.13 milestone Feb 12, 2025
@kdave kdave self-assigned this Feb 12, 2025
@kdave
Copy link
Owner

kdave commented Feb 12, 2025

Merged to devel, thanks.

@kdave kdave closed this Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Changes in documentation or help text
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants