btrfs check --mode=lowmem Segmentation fault (version 5.14.2) #412

Closed
wangyugui opened this issue Oct 9, 2021 · 15 comments
Labels
bug check Changes in btrfs check

@wangyugui

Steps to reproduce:

$ make test-check-lowmem
A core file is then left under tests/fsck-tests/012-leaf-corruption/

$ file tests/fsck-tests/012-leaf-corruption/core.67317
tests/fsck-tests/012-leaf-corruption/core.67317: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/ssd/git/os/btrfs-progs/btrfs check --mode=lowmem ./good.img.restored', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/ssd/git/os/btrfs-progs/btrfs', platform: 'x86_64'

$ gdb /ssd/git/os/btrfs-progs/btrfs tests/fsck-tests/012-leaf-corruption/core.67317
Core was generated by `/ssd/git/os/btrfs-progs/btrfs check --mode=lowmem ./good.img.restored'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 btrfs_inode_size (s=0x642e6cd1, eb=0xc28fc0) at ./kernel-shared/ctree.h:1709
1709 BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64);
(gdb) where
#0 btrfs_inode_size (s=0x642e6cd1, eb=0xc28fc0) at ./kernel-shared/ctree.h:1709
#1 check_inode_item (root=root@entry=0xa43e90, path=path@entry=0x7fff8e5f36a0) at check/mode-lowmem.c:2628
#2 0x0000000000458c61 in process_one_leaf (level=<optimized out>, nrefs=0x7fff8e5f35c0, path=0x7fff8e5f36a0,
root=0xa43e90) at check/mode-lowmem.c:2896
#3 walk_down_tree (check_all=0, nrefs=0x7fff8e5f35c0, level=<optimized out>, path=0x7fff8e5f36a0, root=0xa43e90)
at check/mode-lowmem.c:4953
#4 check_btrfs_root (root=root@entry=0xa43e90, check_all=check_all@entry=0) at check/mode-lowmem.c:5254
#5 0x000000000045b908 in check_fs_root (root=0xa43e90) at check/mode-lowmem.c:5288
#6 check_fs_roots_lowmem () at check/mode-lowmem.c:5449
#7 0x0000000000432301 in do_check_fs_roots (root_cache=root_cache@entry=0x7fff8e5f43d8) at check/main.c:3911
#8 0x000000000043ea7f in cmd_check (cmd=<optimized out>, argc=<optimized out>, argv=<optimized out>)
at check/main.c:10818
#9 0x000000000040e130 in cmd_execute (argv=0x7fff8e5f4550, argc=3, cmd=0x6dce60 <cmd_struct_check>)
at cmds/commands.h:125
#10 main (argc=3, argv=0x7fff8e5f4550) at btrfs.c:405
(gdb)

@kdave kdave added the bug label Oct 11, 2021
@wangyugui
Author

This problem still happens on pre-release 5.15-rc1 (branch v5.15.x).

@adam900710
Collaborator

Cannot reproduce here.

I'm testing commit 330b86c.

@adam900710
Collaborator

And the result shows it's indeed running in lowmem mode, and everything is fine:
fsck-tests-results.txt

@wangyugui
Author

wangyugui commented Oct 31, 2021

Uploading the core file and the ELF binary:

upload.tar.gz

Built from the v5.15.x branch, commit b40d2c7.

@adam900710
Collaborator

Would you mind running it under valgrind, or with a "make D=asan" build, and providing the full output?

It looks like some kind of memory corruption, so whether it crashes partly depends on the memory layout.

@adam900710
Collaborator

BTW, for both modes I'm seeing a WARN_ON() triggered inside __free_extent().

But I don't think that's the direct cause of the crash.

@wangyugui
Author

Built on CentOS 7 (make D=asan) and tested on CentOS 7: the crash does NOT happen.
This is the fsck-tests-results.txt from 'make test-check-lowmem':
fsck-tests-results.zip

@adam900710
Collaborator

One trick: if you only need to run one test, you can do it like this:

$ sudo TEST=012\* make test-check-lowmem

And if D=asan is not detecting the problem, you may want to go with valgrind.

I guess the problem happens in the --repair part, so what you need is:

$ cp tests/fsck-tests/012-leaf-corruption/good.img.xz /tmp
$ unxz /tmp/good.img.xz
$ ./btrfs-image -r /tmp/good.img /tmp/image.raw
$ xfs_io -f -c "pwrite 4206592 32" -c "pwrite 20905984 32" /tmp/image.raw
$ valgrind ./btrfs check --mode=lowmem --repair --force /tmp/image.raw

@wangyugui
Author

wangyugui commented Oct 31, 2021

valgrind caught something:
fsck-tests-results.txt

@wangyugui
Author

valgrind catches almost the same thing even without '--mode lowmem':
fsck-tests-results.txt

So this problem may happen without '--mode lowmem' too.

@adam900710
Collaborator

Oh, I forgot to check for the .lowmem_repairable beacon; that test case doesn't support lowmem repair anyway.

So the repair is done entirely in original mode. You can verify that in fsck-tests-results.txt, even for the lowmem run:

====== RUN CHECK valgrind /ssd/git/os/btrfs-progs/btrfs check --repair --force ./good.img.restored

No --mode=lowmem.

So it's a bug in the original mode repair code.

The pwrite part seems to be a known false alarm:

==24563== Syscall param pwrite64(buf) points to uninitialised byte(s)

So no need to worry about that.

But the important part is the warning:

==24563== Conditional jump or move depends on uninitialised value(s)
==24563==    at 0x4214F8: warning_trace (kerncompat.h:107)
==24563==    by 0x4214F8: __free_extent (extent-tree.c:2049)
==24563==    by 0x4251C6: run_delayed_tree_ref (extent-tree.c:3785)
==24563==    by 0x4251C6: run_one_delayed_ref (extent-tree.c:3805)

This means the WARN_ON() can be randomly triggered.

The likely uninitialized value is owner_objectid, but I don't know why btrfs_add_delayed_tree_ref() doesn't warn about it.

BTW, does the D=asan build report anything?

@wangyugui
Author

D=asan reports almost the same thing for lowmem and non-lowmem runs:

fsck-tests-results-lowmem.txt
fsck-tests-results-no-lowmem.txt

@adam900710
Collaborator

BTW, do you have the test results from the original segfault?

@wangyugui
Author

wangyugui commented Nov 1, 2021

@kdave kdave added the check Changes in btrfs check label Nov 1, 2021
@kdave kdave added this to the v5.15 milestone Nov 1, 2021
@kdave
Owner

kdave commented Nov 1, 2021

Thanks for the report and tracking it down. Fixed in devel and will be in 5.15.

@kdave kdave closed this as completed Nov 1, 2021
kdave pushed a commit that referenced this issue Nov 1, 2021
…properly handled

[BUG]
When a special image (derived from fsck/012) has garbage in its unused
slots (slot number >= nritems), lowmem mode btrfs check can crash:

  (gdb) run check --mode=lowmem ~/downloads/good.img.restored
  Starting program: /home/adam/btrfs/btrfs-progs/btrfs check --mode=lowmem ~/downloads/good.img.restored
  ...
  ERROR: root 5 INODE[5044031582654955520] nlink(257228800) not equal to inode_refs(0)
  ERROR: root 5 INODE[5044031582654955520] nbytes 474624 not equal to extent_size 0

  Program received signal SIGSEGV, Segmentation fault.
  0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
  1703	BTRFS_SETGET_FUNCS(inode_size, struct btrfs_inode_item, size, 64);
  (gdb) bt
  #0  0x0000555555639b11 in btrfs_inode_size (eb=0x5555558a7540, s=0x642e6cd1) at ./kernel-shared/ctree.h:1703
  #1  0x0000555555641544 in check_inode_item (root=0x5555556c2290, path=0x7fffffffd960) at check/mode-lowmem.c:2628

[CAUSE]
At check_inode_item() we have path->slots[0] at 29, while the tree block
only has 26 items.

This happens for two reasons:

- btrfs_next_item() never reverts its slot,
  even if it fails to read the next leaf.

- check_inode_item() doesn't inform the caller that a fatal error
  happened.
  In check_inode_item(), if btrfs_next_item() fails, it jumps to the out
  label, which doesn't set @err properly.

This means that when btrfs_next_item() fails inside check_inode_item(),
path->slots[0] has still been increased, even though it is already beyond
the current tree block's nritems.

As the slot keeps increasing, if the unused item slots contain garbage,
btrfs_item_ptr() returns an invalid result, causing the above segfault.

[FIX]
Fix the problems in two ways:

- Make btrfs_next_item() revert its path->slots[0] on failure

- Properly detect fatal errors from check_inode_item()

With this, we no longer crash on the crafted image.

Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Issue: #412
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
kdave pushed a commit that referenced this issue Nov 4, 2021
lansuse pushed a commit to lansuse/btrfs-progs that referenced this issue Dec 1, 2021