Add --enable-asan and --enable-ubsan switches #12928

Merged: 1 commit, merged Feb 3, 2022
Conversation

@szubersk (Contributor) commented Jan 1, 2022

Motivation and Context

Description

configure now accepts --enable-asan and --enable-ubsan switches,
which result in passing -fsanitize=address and -fsanitize=undefined,
respectively, to the compiler. Those flags are enabled in the GitHub
workflows for ZTS and zloop (a sample invocation is sketched after the
list below). Errors reported by both instrumentations are corrected,
except for:

  • Memory leak reporting is (temporarily) suppressed. The cost of
    fixing the reported leaks is relatively high compared to the gains.

  • Checksum-computing functions in module/zcommon/zfs_fletcher*
    have UBSan errors suppressed. Enforcing 64-byte payload alignment
    there is completely impractical due to the performance impact.

  • There's no ASan heap poisoning in module/zstd/lib/zstd.c. A custom
    memory allocator is used there, rendering that measure infeasible.

  • Memory leak detection has to be suppressed for cmd/zvol_id.
    zvol_id is run by udev with the help of ptrace(2), and tracing is
    incompatible with memory leak detection.
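
For reference, a typical developer build with the new switches might look
like the sketch below. Only --enable-asan and --enable-ubsan come from
this change; the surrounding autotools steps are the usual workflow and
are purely illustrative.

    ./autogen.sh
    ./configure --enable-asan --enable-ubsan    # adds -fsanitize=address and -fsanitize=undefined
    make -j"$(nproc)"
    ./scripts/zfs-tests.sh -v                   # run ZTS against the instrumented user-space tools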

Close #12215
Close #12216

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@rincebrain (Contributor)

Bonus points for custom kernel packages for kASAN while we're at it...

@szubersk (Contributor, Author) commented Jan 2, 2022

> Bonus points for custom kernel packages for kASAN while we're at it...

I'm afraid this is out of the scope of this PR. kASAN was evaluated as too slow to be plugged into the regular CI process.

@rincebrain (Contributor) commented Jan 2, 2022

> Bonus points for custom kernel packages for kASAN while we're at it...

> I'm afraid this is out of the scope of this PR. kASAN was evaluated as too slow to be plugged into the regular CI process.

I was nearly entirely joking, though I wouldn't have been sad if your reply was "oh yeah sure."; I had figured that after a cleanup period it'd probably end up with one or two daily/weekly runs with kASAN, just because it slows things down so much...

I'm just glad to see someone diving into this. :)

@szubersk (Contributor, Author) commented Jan 3, 2022

> I was nearly entirely joking, though I wouldn't have been sad if your reply was "oh yeah sure."; I had figured that after a cleanup period it'd probably end up with one or two daily/weekly runs with kASAN, just because it slows things down so much...

Well, this probably cannot be achieved in workflows, as we would need a kASAN-enabled kernel. If @behlendorf agrees to revising the current buildbot slaves and adding a kASAN-enabled VM, I'm all for giving it a shot.

@rincebrain (Contributor)

Sure, you'd need to either purely use the buildbot workflow for it or convince the GitHub CI instances to kexec, and I somehow suspect they would frown if they ever found out.

Everyone seemed positive about using kASAN the last time it was discussed (I think there might have been a discussion in another thread too, but I don't immediately see it...).

@sempervictus (Contributor)

Given the threading issues we've had over the years, KTSAN might be a handy thing to have running for catching data races. It actually shouldn't hurt performance nearly as much as something like kASAN either, and might help suss out "strangeness" as the OS threading mechanisms change under/around ZFS and through internal code churn.

@rincebrain (Contributor) commented Jan 4, 2022 via email

@szubersk (Contributor, Author) commented Jan 4, 2022

I'm afraid kASAN-instrumented zfs-tests require more resources than I can afford at the moment. The sanity tests croaked when run on a 12 vCPU / 24 GiB VM.

Out of memory: Killed process 490030 (qemu-system-x86) total-vm:31091612kB, anon-rss:24489572kB, file-rss:0kB, shmem-rss:4kB, UID:1000 pgtables:49180kB oom_score_adj:0

@rincebrain (Contributor)

Conveniently, I have a surfeit of RAM. I'll go build a custom kernel with kASAN and give it a go...

@behlendorf (Contributor)

We've had a kmemleak-enabled builder in the past, and the issue there really was performance. I'd be all for adding a kASAN builder if 1) the tests pass with it enabled, 2) it runs in a reasonable amount of time, and 3) we build packages for some distribution so it's easy to install. Starting with just running the sanity.run tests would be a nice step in the right direction. Enabling kmemleak in the kernel would be nice too.

@behlendorf added the "Status: Work in Progress (Not yet ready for general review)" label on Jan 6, 2022
(Review thread on lib/libspl/assert.c: outdated, resolved)
@rincebrain (Contributor) commented Jan 6, 2022

e: @behlendorf, what would you think about something like a runner that triggers once a {day, week, ...} with kASAN (a/o kmemleak, though I have no experience with that one) enabled, rather than on every commit pushed to a PR? It wouldn't tell you before merging that something was on fire, or even which merged bit was, but it'd be better than nothing, and would avoid issues with "well, the runners are now forever backlogged on kASAN runs because they take an eternity"...

As far as easy packaging, I'd probably start with handbuilt kernel packages for them on top of $PICK_YOUR_DISTRO (my leaning obviously being Debian at the moment, but there's not really much limiting it, other than "probably not Fedora"...), and then eventually move up to a helper that periodically cuts new baselines like whatever grabs the new FreeBSD snapshots periodically.
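
For the curious, "kASAN and/or kmemleak enabled" mostly means flipping the usual kernel debug options when building such packages. An illustrative .config fragment (not the exact configuration anyone here used):

    CONFIG_KASAN=y
    CONFIG_KASAN_GENERIC=y      # generic, compiler-instrumented KASAN mode
    CONFIG_DEBUG_KMEMLEAK=y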

Aww, my test system got to

Test: /home/rich/zfs_randdelay/tests/zfs-tests/tests/functional/large_files/setup (run as root) [00:00] [PASS]
Test: /home/rich/zfs_randdelay/tests/zfs-tests/tests/functional/large_files/large_files_001_pos (run as root) [00:00] [PASS]
Test: /home/rich/zfs_randdelay/tests/zfs-tests/tests/functional/large_files/large_files_002_pos (run as root) [00:00] [PASS]
Test: /home/rich/zfs_randdelay/tests/zfs-tests/tests/functional/large_files/cleanup (run as root) [00:00] [PASS]

in -T functional before running hard out of RAM - I've given it even more to try again, but before that, the only screaming it did was #12230 (I think):

[28633.997081] ==================================================================
[28633.997274] BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x13d/0x160
[28633.997471] Write of size 8 at addr ffff888046577710 by task zfs/17429

[28633.997664] CPU: 0 PID: 17429 Comm: zfs Tainted: P           O      5.15.12kasan1 #1
[28633.997856] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[28633.997856] Call Trace:
[28633.997856]  <TASK>
[28633.997856]  dump_stack_lvl+0x34/0x44
[28633.997856]  print_address_description.constprop.0+0x21/0x140
[28633.997856]  ? stack_trace_consume_entry+0x13d/0x160
[28633.997856]  ? stack_trace_consume_entry+0x13d/0x160
[28633.997856]  kasan_report.cold+0x7f/0x11b
[28633.997856]  ? dsl_pool_hold+0x11/0xf0 [zfs]
[28633.997856]  ? stack_trace_consume_entry+0x13d/0x160
[28633.997856]  stack_trace_consume_entry+0x13d/0x160
[28633.997856]  ? dsl_pool_rele+0x14/0x50 [zfs]
[28633.997856]  ? create_prof_cpu_mask+0x20/0x20
[28633.997856]  arch_stack_walk+0x73/0xf0
[28633.997856]  ? dsl_pool_rele+0x14/0x50 [zfs]
[28633.997856]  ? rrw_exit+0x3b5/0x510 [zfs]
[28633.997856]  stack_trace_save+0x8c/0xc0
[28633.997856]  ? stack_trace_consume_entry+0x160/0x160
[28633.997856]  ? zcp_eval+0x4f8/0xa10 [zfs]
[28633.997856]  ? stack_trace_save+0x8c/0xc0
[28633.997856]  ? stack_trace_consume_entry+0x160/0x160
[28633.997856]  kasan_save_stack+0x1b/0x40
[28633.997856]  ? kasan_save_stack+0x1b/0x40
[28633.997856]  ? kasan_set_track+0x1c/0x30
[28633.997856]  ? kasan_set_free_info+0x20/0x30
[28633.997856]  ? __kasan_slab_free+0xea/0x120
[28633.997856]  ? kfree+0x8b/0x220
[28633.997856]  ? rrw_exit+0x3b5/0x510 [zfs]
[28633.997856]  ? zcp_eval+0x4f8/0xa10 [zfs]
[28633.997856]  ? kasan_save_stack+0x32/0x40
[28633.997856]  ? kasan_save_stack+0x1b/0x40
[28633.997856]  ? kasan_set_track+0x1c/0x30
[28633.997856]  ? kasan_set_free_info+0x20/0x30
[28633.997856]  ? __kasan_slab_free+0xea/0x120
[28633.997856]  ? kfree+0x8b/0x220
[28633.997856]  ? zcp_eval+0x4f8/0xa10 [zfs]
[28633.997856]  ? zfs_ioc_channel_program+0x19b/0x280 [zfs]
[28633.997856]  ? zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[28633.997856]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[28633.997856]  ? __x64_sys_ioctl+0x122/0x190
[28633.997856]  ? do_syscall_64+0x3b/0x90
[28633.997856]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[28633.997856]  ? kasan_set_track+0x1c/0x30
[28633.997856]  ? kfree+0x8b/0x220
[28633.997856]  ? tsd_hash_dtor+0x14a/0x220 [spl]
[28633.997856]  ? tsd_hash_search.isra.2+0x46/0x1b0 [spl]
[28633.997856]  ? tsd_set+0xd1c/0x1e80 [spl]
[28633.997856]  ? _raw_read_lock_irq+0x30/0x30
[28633.997856]  ? tsd_exit+0xed0/0xed0 [spl]
[28633.997856]  kasan_set_track+0x1c/0x30
[28633.997856]  kasan_set_free_info+0x20/0x30
[28633.997856]  __kasan_slab_free+0xea/0x120
[28633.997856]  ? rrw_exit+0x3b5/0x510 [zfs]
[28633.997856]  kfree+0x8b/0x220
[28633.997856]  rrw_exit+0x3b5/0x510 [zfs]
[28633.997856]  dsl_pool_rele+0x14/0x50 [zfs]
[28633.997856]  zcp_eval+0x509/0xa10 [zfs]
[28633.997856]  ? zcp_dataset_hold+0xb0/0xb0 [zfs]
[28633.997856]  ? nvlist_lookup_nvpair_ei_sep+0x5b1/0x960 [znvpair]
[28633.997856]  ? nvt_lookup_name_type.isra.54+0x15b/0x420 [znvpair]
[28633.997856]  ? fnvlist_lookup_nvpair+0x5a/0xc0 [znvpair]
[28633.997856]  ? fnvlist_remove_nvpair+0x30/0x30 [znvpair]
[28633.997856]  ? memmove+0x39/0x60
[28633.997856]  zfs_ioc_channel_program+0x19b/0x280 [zfs]
[28633.997856]  ? zfs_ioc_redact+0x190/0x190 [zfs]
[28633.997856]  ? nv_mem_zalloc.isra.12+0x63/0x80 [znvpair]
[28633.997856]  ? fnvlist_alloc+0x61/0xc0 [znvpair]
[28633.997856]  ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[28633.997856]  ? memcpy+0x39/0x60
[28633.997856]  zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[28633.997856]  ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
[28633.997856]  ? __kasan_kmalloc_large+0x81/0xa0
[28633.997856]  ? __kmalloc_node+0x206/0x2b0
[28633.997856]  ? kvmalloc_node+0x4d/0x90
[28633.997856]  zfsdev_ioctl+0x4a/0xd0 [zfs]
[28633.997856]  __x64_sys_ioctl+0x122/0x190
[28633.997856]  do_syscall_64+0x3b/0x90
[28633.997856]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[28633.997856] RIP: 0033:0x7fd76ee66317
[28633.997856] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[28633.997856] RSP: 002b:00007ffd8b3e3208 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[28633.997856] RAX: ffffffffffffffda RBX: 0000000000005a48 RCX: 00007fd76ee66317
[28633.997856] RDX: 00007ffd8b3e3230 RSI: 0000000000005a48 RDI: 0000000000000004
[28633.997856] RBP: 00007ffd8b3e6820 R08: 00000000ffffffff R09: 0000000000000000
[28633.997856] R10: 0000000000000010 R11: 0000000000000202 R12: 00007ffd8b3e3230
[28633.997856] R13: 00007ffd8b3e68e0 R14: 0000000000005a48 R15: 000055b800a0b340
[28633.997856]  </TASK>

[28633.997856] The buggy address belongs to the page:
[28633.997856] page:0000000070fc777b refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x46577
[28633.997856] flags: 0x100000000000000(node=0|zone=1)
[28633.997856] raw: 0100000000000000 0000000000000000 ffffffff00000101 0000000000000000
[28633.997856] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[28633.997856] page dumped because: kasan: bad access detected

[28633.997856] addr ffff888046577710 is located in stack of task zfs/17429 at offset 192 in frame:
[28633.997856]  stack_trace_save+0x0/0xc0

[28633.997856] this frame has 1 object:
[28633.997856]  [32, 56) 'c'

[28633.997856] Memory state around the buggy address:
[28633.997856]  ffff888046577600: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
[28633.997856]  ffff888046577680: 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
[28633.997856] >ffff888046577700: 00 00 f2 00 00 00 00 00 00 00 00 00 00 00 00 00
[28633.997856]                          ^
[28633.997856]  ffff888046577780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[28633.997856]  ffff888046577800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[28633.997856] ==================================================================

e: the part that actually made it die in a fire appears to have been:

Jan  6 17:07:35 ubuntu18 kernel: [36258.115075] ------------[ cut here ]------------
Jan  6 17:07:35 ubuntu18 kernel: [36258.115080] Stack depot reached limit capacity
Jan  6 17:07:35 ubuntu18 kernel: [36258.115099] WARNING: CPU: 1 PID: 26711 at lib/stackdepot.c:115 stack_depot_save+0x3e1/0x460
Jan  6 17:07:35 ubuntu18 kernel: [36258.115109] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) virtio_net net_failover failover virtio_pci virtio_pci_modern_dev virtio virtio_ring
Jan  6 17:07:35 ubuntu18 kernel: [36258.115140] CPU: 1 PID: 26711 Comm: zfs Tainted: P    B      O      5.15.12kasan1 #1
Jan  6 17:07:35 ubuntu18 kernel: [36258.115143] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Jan  6 17:07:35 ubuntu18 kernel: [36258.115146] RIP: 0010:stack_depot_save+0x3e1/0x460
Jan  6 17:07:35 ubuntu18 kernel: [36258.115150] Code: 24 08 e9 98 fd ff ff 0f 0b e9 09 fe ff ff 80 3d a0 9b d9 02 00 75 15 48 c7 c7 e8 bd bb 8f c6 05 90 9b d9 02 01 e8 cf b1 85 01 <0f> 0b 48 c7 c7 6c 9a ed 90 4c 89 fe e8 0e ad 92 01 48 8b 7c 24 08
Jan  6 17:07:35 ubuntu18 kernel: [36258.115153] RSP: 0018:ffff88801f545d60 EFLAGS: 00010082
Jan  6 17:07:35 ubuntu18 kernel: [36258.115157] RAX: 0000000000000000 RBX: 000000005c589b82 RCX: 0000000000000000
Jan  6 17:07:35 ubuntu18 kernel: [36258.115159] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed1003ea8b9e
Jan  6 17:07:35 ubuntu18 kernel: [36258.115161] RBP: 000000000000001d R08: 0000000000000001 R09: ffffed101b64ce90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115163] R10: ffff8880db26747b R11: ffffed101b64ce8f R12: ffff88801f545db0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115165] R13: 0000000000000000 R14: ffff8880da44dc10 R15: 0000000000000246
Jan  6 17:07:35 ubuntu18 kernel: [36258.115167] FS:  00007f61215597c0(0000) GS:ffff8880db240000(0000) knlGS:0000000000000000
Jan  6 17:07:35 ubuntu18 kernel: [36258.115171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan  6 17:07:35 ubuntu18 kernel: [36258.115173] CR2: 00007ffeea8397d8 CR3: 000000000471c000 CR4: 00000000000506e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115175] Call Trace:
Jan  6 17:07:35 ubuntu18 kernel: [36258.115178]  <TASK>
Jan  6 17:07:35 ubuntu18 kernel: [36258.115180]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115186]  kasan_save_stack+0x32/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115191]  ? kasan_save_stack+0x1b/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115193]  ? kasan_set_track+0x1c/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115196]  ? kasan_set_free_info+0x20/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115199]  ? __kasan_slab_free+0xea/0x120
Jan  6 17:07:35 ubuntu18 kernel: [36258.115202]  ? kmem_cache_free+0x74/0x270
Jan  6 17:07:35 ubuntu18 kernel: [36258.115205]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115208]  ? zio_done+0x31a0/0x5600 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115244]  ? zio_nowait+0x2dd/0x630 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115253]  ? arc_read+0x1927/0x6140 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115286]  ? dbuf_read_impl.constprop.30+0xe4f/0x21f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115316]  ? dbuf_read+0x2ac/0x10d0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115345]  ? dmu_buf_hold+0x68/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115376]  ? zap_lockdir+0xa7/0x1d0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115409]  ? zap_lookup_norm+0x9e/0x120 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115439]  ? zap_lookup+0xd/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115468]  ? dsl_load_sets+0xe5/0x230 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115503]  ? dsl_deleg_access_impl+0x3bf/0x7e0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115533]  ? dsl_fs_ss_limit_check+0x31c/0x5f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115563]  ? dmu_recv_begin_check+0x653/0x1e20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115595]  ? dsl_sync_task_common+0x1fc/0x380 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115702]  ? dsl_sync_task+0x11/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_begin+0x552/0xb50 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv_impl+0x22e/0x15a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv+0x380/0x5c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __x64_sys_ioctl+0x122/0x190
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? do_syscall_64+0x3b/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_lock+0x89/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_lock_slowpath+0x10/0x10
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? arc_get_data_buf+0xf0/0xf0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_set_track+0x1c/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kfree+0x8b/0x220
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? arc_read_done+0xccf/0x2460 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __cv_destroy+0x177/0x3e0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  kasan_set_track+0x1c/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  kasan_set_free_info+0x20/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  __kasan_slab_free+0xea/0x120
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  kmem_cache_free+0x74/0x270
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  spl_kmem_cache_free+0x260/0x7c0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zio_done+0x31a0/0x5600 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zio_ready+0x1160/0x1160 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zio_read_bp_init+0x5b4/0x770 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zio_nowait+0x2dd/0x630 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zio_ddt_collision+0xb80/0xb80 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc+0x7c/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  arc_read+0x1927/0x6140 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kfree+0x8b/0x220
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dbuf_rele_and_unlock+0x12a0/0x12a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? arc_loan_raw_buf+0x60/0x60 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dbuf_write_children_ready+0x590/0x590 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dbuf_read_impl.constprop.30+0xe4f/0x21f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_lock_slowpath+0x10/0x10
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dbuf_write_override_ready+0x70/0x70 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? spl_kmem_cache_destroy+0x9b0/0x9b0 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zio_null+0x26/0x30 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dbuf_read+0x2ac/0x10d0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dbuf_read_impl.constprop.30+0x21f0/0x21f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_free_long_object+0xb0/0xb0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dmu_buf_hold+0x68/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zap_lockdir+0xa7/0x1d0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zap_byteswap+0x1d0/0x1d0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? vsnprintf+0x10d/0x15e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? pointer+0x780/0x780
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zap_lookup_norm+0x9e/0x120 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zap_count+0x1a0/0x1a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zap_lookup+0xd/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dsl_load_sets+0xe5/0x230 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_deleg_get+0x640/0x640 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_spin_lock_irqsave+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_write_lock_irqsave+0xd0/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_prop_get_dd+0x355/0x4c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_trace_save+0x8c/0xc0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_trace_consume_entry+0x160/0x160
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_depot_save+0x1dd/0x460
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_prop_get_dd+0x355/0x4c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_save_stack+0x32/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_save_stack+0x1b/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_set_track+0x1c/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_set_free_info+0x20/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_slab_free+0xea/0x120
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kfree+0x8b/0x220
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_prop_get_dd+0x355/0x4c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_prop_get_ds+0x371/0x530 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_fs_ss_limit_check+0x2f5/0x5f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_begin_check+0x653/0x1e20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_sync_task_common+0x1fc/0x380 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_sync_task+0x11/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_begin+0x552/0xb50 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv_impl+0x22e/0x15a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv+0x380/0x5c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __x64_sys_ioctl+0x122/0x190
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? do_syscall_64+0x3b/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc+0x7c/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_lock+0x89/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_spin_lock+0x75/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_read_lock_irq+0x30/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? tsd_get+0xc3/0x160 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? rrn_find+0x25/0xc0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dsl_deleg_access_impl+0x3bf/0x7e0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_check_access+0x150/0x150 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_process_sub_livelist+0x150/0x150 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dsl_fs_ss_limit_check+0x31c/0x5f0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_dir_activate_fs_ss_limit+0x30/0x30 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_trace_save+0x8c/0xc0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_dataset_promote_crypt_sync+0x780/0x780 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_save_stack+0x32/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_save_stack+0x1b/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? memcpy+0x39/0x60
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dmu_recv_begin_check+0x653/0x1e20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_resume_begin_check+0xdb0/0xdb0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_save_stack+0x32/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc+0x7c/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? spl_kmem_alloc+0x12d/0x190 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? rrw_enter_read_impl+0x193/0x460 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_sync_task_common+0x1e9/0x380 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_sync_task+0x11/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_begin+0x552/0xb50 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv_impl+0x22e/0x15a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __x64_sys_ioctl+0x122/0x190
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? do_syscall_64+0x3b/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kernel_read+0x5eb/0x9e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_spin_lock+0x75/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_read_lock_irq+0x30/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_spin_lock+0x75/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? _raw_read_lock_irq+0x30/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc+0x7c/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? tsd_set+0x669/0x1e80 [spl]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_slab_alloc+0x2c/0x80
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc+0x7c/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_refcount_add_many+0x4d/0x350 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? rrw_enter_read_impl+0x290/0x460 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dsl_sync_task_common+0x1fc/0x380 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? receive_cksum+0x70/0x70 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_resume_begin_check+0xdb0/0xdb0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dsl_null_checkfunc+0x10/0x10 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? dmu_recv_resume_begin_check+0xdb0/0xdb0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? receive_cksum+0x70/0x70 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_unpoison+0x23/0x50
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dsl_sync_task+0x11/0x20 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  dmu_recv_begin+0x552/0xb50 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? memset+0x20/0x40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? receive_writer_thread+0x3960/0x3960 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zfs_ioc_recv_impl+0x22e/0x15a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? mutex_unlock+0x7b/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? clear_received_props+0x190/0x190 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? deref_stack_reg+0x33/0x70
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? unwind_next_frame+0x11a1/0x17e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? get_reg+0xef/0x170
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? unwind_next_frame+0x101d/0x17e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? deref_stack_reg+0x70/0x70
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __is_insn_slot_addr+0x7f/0xd0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? deref_stack_reg+0x33/0x70
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? unwind_next_frame+0x11a1/0x17e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? get_reg+0xef/0x170
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? unwind_next_frame+0x101d/0x17e0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? deref_stack_reg+0x70/0x70
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? rrw_exit+0x3b5/0x510 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zfs_ioc_recv+0x380/0x5c0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kernel_text_address+0x9/0x30
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_ioc_recv_impl+0x15a0/0x15a0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_trace_save+0x8c/0xc0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? stack_trace_consume_entry+0x160/0x160
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_refcount_count+0x16/0x40 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfs_secpolicy_write_perms+0x130/0x140 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kasan_unpoison+0x23/0x50
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_slab_alloc+0x2c/0x80
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? memcpy+0x39/0x60
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zfsdev_ioctl_common+0xa71/0x1710 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __alloc_pages_slowpath.constprop.0+0x1e40/0x1e40
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kasan_kmalloc_large+0x81/0xa0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? __kmalloc_node+0x206/0x2b0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? drain_pages+0x80/0x80
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  ? kvmalloc_node+0x4d/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  zfsdev_ioctl+0x4a/0xd0 [zfs]
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  __x64_sys_ioctl+0x122/0x190
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  do_syscall_64+0x3b/0x90
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] RIP: 0033:0x7f6120593317
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] RSP: 002b:00007ffeea83cdb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6120593317
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] RDX: 00007ffeea83cf60 RSI: 0000000000005a1b RDI: 0000000000000004
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] RBP: 00007ffeea841550 R08: 00007f6120868d80 R09: 0000000000000000
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] R10: 00005572d761c010 R11: 0000000000000246 R12: 00007ffeea83cf60
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] R13: 00007ffeea8473c0 R14: 0000000000000016 R15: 00007ffeea844cc0
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716]  </TASK>
Jan  6 17:07:35 ubuntu18 kernel: [36258.115716] ---[ end trace fd1c007a4d462c56 ]---

@sempervictus (Contributor)

I don't know if this is true nowadays (I've not run vbox in many years, in part due to this), but the visor used to be quite unsafe with IO and memory operations (not honoring sync/direct, doing weird things with memory layout, leaving a bunch of visor memory accessible to the VM, etc.), so it might not be the best thing on which to test whether any of that is still problematic. How much memory are you giving the poor thing, and how much do we think it needs? If this is too expensive to run in public cloud, we might be able to donate some pages and cycles in our stacks (similar concern though, in that we run grsec at the phys page table layer, so our MM is quite different from upstream, way more so after AUTOSLAB came into being).

@rincebrain (Contributor)

I thought I had given it 16 GB, but apparently it didn't save, and that run was with a puny 4, so no wonder it died in a fire.

I have since given it 24.

(As far as dangerous life choices, I've not made it crash and burn in general without enabling features that have flashing "experimental do not use" lights over them - in particular, TRIM on disk images and using host IO cache, neither of which is a default.)

@nabijaczleweli (Contributor) commented Jan 7, 2022

As I let the kASAN testsuite rip (and that's very loose usage of "rip"), this is what I get from just running zpool import -a (default qemu-system-x86_64 machine, two virtio disks, so fd0, sr0, vd[ab], one pool), which is quite fun:

=================================================================
==979==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c5c68 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c5c68 in nvlist_xalloc ../../module/nvpair/nvpair.c:604
    #3 0x7f5d678c5c68 in nvlist_xalloc ../../module/nvpair/nvpair.c:594
    #4 0x7f5d67902b28 in nvlist_xunpack ../../module/nvpair/nvpair.c:2753
    #5 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #6 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #7 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #8 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #9 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #10 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #11 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #12 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #13 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #14 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #15 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #16 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 2304 byte(s) in 18 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c42ae in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c42ae in nvt_tab_alloc ../../module/nvpair/nvpair.c:270
    #3 0x7f5d678c42ae in nvt_add_nvpair ../../module/nvpair/nvpair.c:509
    #4 0x7f5d678c77a6 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2414
    #5 0x7f5d678c77a6 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #6 0x7f5d678ca1de in nvs_embedded ../../module/nvpair/nvpair.c:2513
    #7 0x7f5d678c7461 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2403
    #8 0x7f5d678c7461 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #9 0x7f5d679030e8 in nvs_native ../../module/nvpair/nvpair.c:3135
    #10 0x7f5d679030e8 in nvlist_common ../../module/nvpair/nvpair.c:2658
    #11 0x7f5d679030e8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #12 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #13 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #14 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #15 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #16 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #17 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #18 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #19 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #20 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #21 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #22 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #23 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 2024 byte(s) in 36 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c7a79 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c7a79 in nvp_buf_alloc ../../module/nvpair/nvpair.c:631
    #3 0x7f5d678c7a79 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2400
    #4 0x7f5d678c7a79 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #5 0x7f5d678ca1de in nvs_embedded ../../module/nvpair/nvpair.c:2513
    #6 0x7f5d678c7461 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2403
    #7 0x7f5d678c7461 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #8 0x7f5d679030e8 in nvs_native ../../module/nvpair/nvpair.c:3135
    #9 0x7f5d679030e8 in nvlist_common ../../module/nvpair/nvpair.c:2658
    #10 0x7f5d679030e8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #11 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #12 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #13 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #14 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #15 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #16 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #17 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #18 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #19 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #20 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #21 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #22 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 1376 byte(s) in 18 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c7a79 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c7a79 in nvp_buf_alloc ../../module/nvpair/nvpair.c:631
    #3 0x7f5d678c7a79 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2400
    #4 0x7f5d678c7a79 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #5 0x7f5d679030e8 in nvs_native ../../module/nvpair/nvpair.c:3135
    #6 0x7f5d679030e8 in nvlist_common ../../module/nvpair/nvpair.c:2658
    #7 0x7f5d679030e8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #8 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #9 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #10 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #11 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #12 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #13 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #14 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #15 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #16 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #17 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #18 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #19 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 1008 byte(s) in 18 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c9f1e in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c9f1e in nv_priv_alloc_embedded ../../module/nvpair/nvpair.c:255
    #3 0x7f5d678c9f1e in nvs_embedded ../../module/nvpair/nvpair.c:2503
    #4 0x7f5d678c7461 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2403
    #5 0x7f5d678c7461 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #6 0x7f5d679030e8 in nvs_native ../../module/nvpair/nvpair.c:3135
    #7 0x7f5d679030e8 in nvlist_common ../../module/nvpair/nvpair.c:2658
    #8 0x7f5d679030e8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #9 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #10 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #11 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #12 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #13 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #14 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #15 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #16 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #17 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #18 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #19 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #20 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 256 byte(s) in 1 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c2750 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #2 0x7f5d678c2750 in nvt_resize ../../module/nvpair/nvpair.c:378
    #3 0x7f5d678c3fb4 in nvt_grow ../../module/nvpair/nvpair.c:433
    #4 0x7f5d678c3fb4 in nvt_add_nvpair ../../module/nvpair/nvpair.c:525
    #5 0x7f5d678c77a6 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2414
    #6 0x7f5d678c77a6 in nvs_operation ../../module/nvpair/nvpair.c:2468
    #7 0x7f5d679030e8 in nvs_native ../../module/nvpair/nvpair.c:3135
    #8 0x7f5d679030e8 in nvlist_common ../../module/nvpair/nvpair.c:2658
    #9 0x7f5d679030e8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #10 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #11 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #12 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #13 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #14 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #15 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #16 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #17 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #18 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #19 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #20 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #21 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

Indirect leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f5d67f677cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f5d678c5b7d in nv_priv_alloc ../../module/nvpair/nvpair.c:237
    #2 0x7f5d678c5b7d in nvlist_xalloc ../../module/nvpair/nvpair.c:601
    #3 0x7f5d678c5b7d in nvlist_xalloc ../../module/nvpair/nvpair.c:594
    #4 0x7f5d67902b28 in nvlist_xunpack ../../module/nvpair/nvpair.c:2753
    #5 0x7f5d67c878cb in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #6 0x7f5d67c28ed2 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:97
    #7 0x7f5d67c2e487 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:330
    #8 0x7f5d67c02675 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2712
    #9 0x7f5d67c1fdd9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1247
    #10 0x7f5d67c26e1d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1438
    #11 0x7f5d67c2767c in zpool_enable_datasets /root/zfs/lib/libzfs/libzfs_mount.c:1485
    #12 0x56481f209ac8 in do_import /root/zfs/cmd/zpool/zpool_main.c:3232
    #13 0x56481f222de6 in import_pools /root/zfs/cmd/zpool/zpool_main.c:3281
    #14 0x56481f228f5a in zpool_do_import /root/zfs/cmd/zpool/zpool_main.c:3828
    #15 0x56481f1f5796 in main /root/zfs/cmd/zpool/zpool_main.c:10949
    #16 0x7f5d66aa67ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x277ec)

SUMMARY: AddressSanitizer: 7048 byte(s) leaked in 93 allocation(s).

@szubersk (Contributor, Author)

@nabijaczleweli
Indeed, memory allocated by zfs_prop_get / zpool_get_all_props is not released. For now I suppressed those errors, as I did not have enough resources to dig deeper into it (I couldn't determine an obvious fix at first glance).
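
For context, a LeakSanitizer suppression for this kind of report generally takes the form sketched below; the symbol name comes from the trace above, while the file name and the exact entries actually used for this PR are illustrative.

    # lsan.supp: hypothetical file name; one "leak:<pattern>" entry per suppressed report
    leak:zpool_get_all_props

    # applied at run time:
    LSAN_OPTIONS=suppressions=lsan.supp zpool import -a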

@behlendorf
I just ran the sanity suite and kmemleak reports no errors (I hope I did not make a mistake setting up the environment). I'll trigger the full suite in a second. It doesn't seem like we have a significant slowdown here.

Results Summary
PASS     810

Running Time:   00:31:16
Percent passed: 100.0%

@nabijaczleweli (Contributor)

AFAICT (which is to say: not that far, but I looked my best), there is no actual leak there. Even replacing all the zpool_get_all_props() with zpool_props_refresh(), which does free the previous zpool_props, doesn't make the asan warning go away.

Oddly, valgrind just gives me shit, somehow managing to break free() semi-entirely, it seems, and nothing gets freed. It enters both regcomp() and regfree() in libzfs_init/_fini but seemingly nothing actually gets freed. Likewise with the newargv in zpool main().

nabijaczleweli added a commit to nabijaczleweli/zfs that referenced this pull request Jan 10, 2022
Caught by valgrind while investigating
openzfs#12928 (comment)

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
nabijaczleweli added a commit to nabijaczleweli/zfs that referenced this pull request Jan 10, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
openzfs#12928 (comment)

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
@szubersk (Contributor, Author) commented Jan 11, 2022

@behlendorf
My VM croaks when ZTS is run with kmemleak enabled (default configuration, periodic scans turned off). 740 test cases were executed. I'll re-run a smaller subset now.

[439809.751521] Out of memory: Killed process 1000531 (qemu-system-x86) total-vm:31116140kB, anon-rss:24749672kB, file-rss:0kB, shmem-rss:4kB, UID:1000 pgtables:49696kB oom_score_adj:0

EDIT
The ZFS ARC on the outer VM was using a little too much memory, leaving not enough space for the inner VM. I lowered the inner VM (the testbed) RAM to 20 GiB and forced the following setting in there:

# cat /etc/modprobe.d/zfs.conf 
options zfs zfs_arc_max=2147483648

The testbed did not collapse. I think the ARC-related problems in the test cases can be attributed to the low zfs_arc_max and ignored for now.

Results Summary                                                                                                                                                                               
PASS     1562                                                                                                                                                                                 
FAIL      12                                                                                                                                                                                  
SKIP      13                                                                                                                                                                                  
KILLED     4                                                                                                                                                                                  
                                                                                                                                                                                              
Running Time:   07:20:32                                                                                                                                                                      
Percent passed: 98.2%                                                                                                                                                                         
Log directory:  /var/tmp/test_results/20220111T203720                                                                                                                                         
                                                                                                                                                                                              
Tests with results other than PASS that are expected:                                                                                                                                         
    FAIL casenorm/mixed_formd_delete (https://github.com/openzfs/zfs/issues/7633)                                                                                                             
    FAIL casenorm/mixed_formd_lookup (https://github.com/openzfs/zfs/issues/7633)                                                                                                             
    FAIL casenorm/mixed_formd_lookup_ci (https://github.com/openzfs/zfs/issues/7633)                                                                                                          
    FAIL casenorm/mixed_none_lookup_ci (https://github.com/openzfs/zfs/issues/7633)                                                                                                           
    FAIL casenorm/sensitive_formd_delete (https://github.com/openzfs/zfs/issues/7633)                                                                                                         
    FAIL casenorm/sensitive_formd_lookup (https://github.com/openzfs/zfs/issues/7633)                                                                                                         
    SKIP cli_root/zfs_get/zfs_get_009_pos (https://github.com/openzfs/zfs/issues/5479)                                                                                                        
    FAIL cli_root/zpool_expand/zpool_expand_001_pos (Known issue)                                                                                                                             
    FAIL cli_root/zpool_import/import_rewind_device_replaced (Arbitrary pool rewind is not guaranteed)                                                                                        
    SKIP cli_root/zpool_import/zpool_import_missing_003_pos (https://github.com/openzfs/zfs/issues/6839)                                                                                      
    SKIP redundancy/redundancy_draid_spare3 (Known issue)                                                                                                                                     
    FAIL refreserv/refreserv_004_pos (Known issue)                                                                                                                                            
    SKIP removal/removal_with_zdb (Known issue)                                                                                                                                               
    SKIP rsend/rsend_008_pos (https://github.com/openzfs/zfs/issues/6066)                                                                                                                     
    FAIL vdev_zaps/vdev_zaps_007_pos (Known issue)                                                                                                                                            
                                                                                                                                                                                              
Tests with result of PASS that are unexpected:                                                                                                                                                

Tests with results other than PASS that are unexpected:
    FAIL arc/arcstats_runtime_tuning (expected PASS)
    KILLED l2arc/persist_l2arc_001_pos (expected PASS)
    KILLED l2arc/persist_l2arc_002_pos (expected PASS)
    FAIL l2arc/persist_l2arc_003_neg (expected PASS)
    KILLED l2arc/persist_l2arc_004_pos (expected PASS)
    KILLED l2arc/persist_l2arc_005_pos (expected PASS)

Kmemleak scans were run every 60 seconds and the output saved to a file, which was synced right after each scan so it would survive a testbed crash (the scan loop is sketched after the dumps below). Two leaks were found, one of them in btrfs (/).

unreferenced object 0xffff8b3532034720 (size 96):                                                                                                                                             
  comm "modprobe", pid 958, jiffies 4294909057 (age 8559.752s)                                                                                                                                
  hex dump (first 32 bytes):                                                                                                                                                                  
    03 00 00 00 01 00 40 00 00 00 00 00 00 00 00 00  ......@.........                                                                                                                         
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................                                                                                                                         
  backtrace:                                                                                                                                                                                  
    [<0000000061682ab7>] spl_kmem_alloc_impl+0xac/0xc0 [spl]         
    [<0000000058ca4072>] tsd_hash_add_key+0x3c/0x180 [spl]                                     
    [<00000000f15f2543>] zfs_kmod_init+0x9c/0xf0 [zfs]                                         
    [<0000000069c432b6>] ext4_has_free_clusters+0xc/0x170 [ext4]     
    [<000000006ac0fa98>] do_one_initcall+0x44/0x1d0        
    [<00000000b5a23ac3>] do_init_module+0x5c/0x270                   
    [<0000000012d23638>] __do_sys_finit_module+0xae/0x110                  
    [<00000000b192f3ae>] do_syscall_64+0x3b/0xc0                 
    [<000000008214e32f>] entry_SYSCALL_64_after_hwframe+0x44/0xae 

unreferenced object 0xffff8b3507110140 (size 80):
  comm "kworker/u32:2", pid 2529891, jiffies 4297431495 (age 12245.296s)
  hex dump (first 32 bytes):
    00 30 30 00 00 00 00 00 ff 8f 30 00 00 00 00 00  .00.......0.....
    90 a2 24 2a 35 8b ff ff 00 00 00 00 00 00 00 00  ..$*5...........
  backtrace:
    [<00000000b33e8deb>] alloc_extent_state+0x1d/0xb0 [btrfs]
    [<00000000892ec6fa>] set_extent_bit+0x2ff/0x670 [btrfs]
    [<0000000004bf9a94>] lock_extent_bits+0x6b/0xa0 [btrfs]
    [<00000000b87f1479>] lock_and_cleanup_extent_if_need+0xaf/0x1c0 [btrfs]
    [<0000000098342a53>] btrfs_buffered_write+0x29f/0x820 [btrfs]
    [<00000000c89f7ca3>] btrfs_file_write_iter+0x127/0x390 [btrfs]
    [<00000000badec29a>] do_iter_readv_writev+0x152/0x1b0
    [<00000000bbb1619f>] do_iter_write+0x7c/0x1c0
    [<00000000fa26b371>] lo_write_bvec+0x62/0x150 [loop]
    [<0000000014bc55e0>] loop_process_work+0x250/0xbd0 [loop]
    [<00000000e84d7710>] process_one_work+0x1f1/0x390
    [<00000000b250d8d7>] worker_thread+0x53/0x3e0
    [<0000000091495c86>] kthread+0x127/0x150
    [<0000000074a95624>] ret_from_fork+0x22/0x30

@nabijaczleweli
Contributor

Heh. Digging deeper into it (and fixing some other stuff along the way): ZFS_SERIAL_MOUNT=1 masks this. Progress?

@behlendorf
Contributor

We actually still have a kmemleak-enabled builder as part of the CI. It's just not enabled by default since we turned off the code coverage analysis. If you add the following line to your commit message you can request it. This would request only the architecture, style, and coverage CI builders. The coverage builder has kmemleak enabled, but it's using a quite old custom 4.13 kernel. It'd be interesting to see what it finds; it used to reliably pass.

Requires-builders: arch,style,coverage

@nabijaczleweli
Contributor

nabijaczleweli commented Jan 12, 2022

here's a minimal reproducer (yes, it's atrocious, DO NOT bully me):

#include <stdio.h>

/* Hand-rolled declarations in place of <libzfs.h>, kept deliberately minimal. */
void *libzfs_init(void);
void libzfs_fini(void *);
typedef int (*zfs_iter_f)(void *, void *);
void zfs_foreach_mountpoint(void *, void **, unsigned long, zfs_iter_f, void *, int);
void *zfs_open(void *, const char *, int);
void zfs_close(void *);

static int iter(void *a, void *b) {
	printf("%p, %p\n", a, b);
	return 0;
}

int main(int argc, char **argv) {
	void *zfs = libzfs_init();
	void *h = zfs_open(zfs, "scratchpsko", 1);
	zfs_foreach_mountpoint(zfs, &h, 1, iter, 0, argc - 1);
	zfs_close(h);
	libzfs_fini(zfs);
}
root@kasan-test:/tmp# cc -fsanitize={address,undefined} a.c -lzfs
root@kasan-test:/tmp# ./a.out
0x616000000080, (nil)
root@kasan-test:/tmp# ./a.out a
0x616000000080, (nil)

=================================================================
==325709==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f004615ecbc in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f004615ecbc in nvlist_xalloc ../../module/nvpair/nvpair.c:606
    #4 0x7f004615ecbc in nvlist_xalloc ../../module/nvpair/nvpair.c:594
    #5 0x7f004619bbb8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #6 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #7 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #8 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #9 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #10 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #11 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #12 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #13 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #14 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 2304 byte(s) in 18 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f004615d2fe in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f004615d2fe in nvt_tab_alloc ../../module/nvpair/nvpair.c:270
    #4 0x7f004615d2fe in nvt_add_nvpair ../../module/nvpair/nvpair.c:509
    #5 0x7f0046160836 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2417
    #6 0x7f0046160836 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #7 0x7f004616326e in nvs_embedded ../../module/nvpair/nvpair.c:2516
    #8 0x7f00461604f1 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2406
    #9 0x7f00461604f1 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #10 0x7f004619c178 in nvs_native ../../module/nvpair/nvpair.c:3138
    #11 0x7f004619c178 in nvlist_common ../../module/nvpair/nvpair.c:2661
    #12 0x7f004619c178 in nvlist_xunpack ../../module/nvpair/nvpair.c:2759
    #13 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #14 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #15 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #16 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #17 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #18 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #19 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #20 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #21 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 2024 byte(s) in 36 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f0046160b09 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f0046160b09 in nvp_buf_alloc ../../module/nvpair/nvpair.c:634
    #4 0x7f0046160b09 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2403
    #5 0x7f0046160b09 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #6 0x7f004616326e in nvs_embedded ../../module/nvpair/nvpair.c:2516
    #7 0x7f00461604f1 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2406
    #8 0x7f00461604f1 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #9 0x7f004619c178 in nvs_native ../../module/nvpair/nvpair.c:3138
    #10 0x7f004619c178 in nvlist_common ../../module/nvpair/nvpair.c:2661
    #11 0x7f004619c178 in nvlist_xunpack ../../module/nvpair/nvpair.c:2759
    #12 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #13 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #14 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #15 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #16 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #17 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #18 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #19 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #20 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 1376 byte(s) in 18 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f0046160b09 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f0046160b09 in nvp_buf_alloc ../../module/nvpair/nvpair.c:634
    #4 0x7f0046160b09 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2403
    #5 0x7f0046160b09 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #6 0x7f004619c178 in nvs_native ../../module/nvpair/nvpair.c:3138
    #7 0x7f004619c178 in nvlist_common ../../module/nvpair/nvpair.c:2661
    #8 0x7f004619c178 in nvlist_xunpack ../../module/nvpair/nvpair.c:2759
    #9 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #10 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #11 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #12 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #13 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #14 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #15 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #16 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #17 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 1008 byte(s) in 18 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f0046162fae in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f0046162fae in nv_priv_alloc_embedded ../../module/nvpair/nvpair.c:255
    #4 0x7f0046162fae in nvs_embedded ../../module/nvpair/nvpair.c:2506
    #5 0x7f00461604f1 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2406
    #6 0x7f00461604f1 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #7 0x7f004619c178 in nvs_native ../../module/nvpair/nvpair.c:3138
    #8 0x7f004619c178 in nvlist_common ../../module/nvpair/nvpair.c:2661
    #9 0x7f004619c178 in nvlist_xunpack ../../module/nvpair/nvpair.c:2759
    #10 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #11 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #12 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #13 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #14 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #15 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #16 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #17 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #18 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 256 byte(s) in 1 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f004615b7a0 in nv_mem_zalloc ../../module/nvpair/nvpair.c:205
    #3 0x7f004615b7a0 in nvt_resize ../../module/nvpair/nvpair.c:378
    #4 0x7f004615d004 in nvt_grow ../../module/nvpair/nvpair.c:433
    #5 0x7f004615d004 in nvt_add_nvpair ../../module/nvpair/nvpair.c:525
    #6 0x7f0046160836 in nvs_decode_pairs ../../module/nvpair/nvpair.c:2417
    #7 0x7f0046160836 in nvs_operation ../../module/nvpair/nvpair.c:2471
    #8 0x7f004619c178 in nvs_native ../../module/nvpair/nvpair.c:3138
    #9 0x7f004619c178 in nvlist_common ../../module/nvpair/nvpair.c:2661
    #10 0x7f004619c178 in nvlist_xunpack ../../module/nvpair/nvpair.c:2759
    #11 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #12 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #13 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #14 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #15 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #16 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #17 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #18 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #19 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

Indirect leak of 56 byte(s) in 1 object(s) allocated from:
    #0 0x7f004742a7cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x7f0046158bd2 in nv_alloc_sys /root/zfs/lib/libnvpair/nvpair_alloc_system.c:37
    #2 0x7f004615ebcc in nv_priv_alloc ../../module/nvpair/nvpair.c:237
    #3 0x7f004615ebcc in nvlist_xalloc ../../module/nvpair/nvpair.c:602
    #4 0x7f004615ebcc in nvlist_xalloc ../../module/nvpair/nvpair.c:594
    #5 0x7f004619bbb8 in nvlist_xunpack ../../module/nvpair/nvpair.c:2756
    #6 0x7f004714ba0b in zcmd_read_dst_nvlist /root/zfs/lib/libzfs/libzfs_util.c:1237
    #7 0x7f00470ed426 in zpool_get_all_props /root/zfs/lib/libzfs/libzfs_pool.c:94
    #8 0x7f00470f076f in zpool_props_refresh /root/zfs/lib/libzfs/libzfs_pool.c:115
    #9 0x7f00470f26b7 in zpool_get_prop /root/zfs/lib/libzfs/libzfs_pool.c:332
    #10 0x7f00470c6c44 in zfs_prop_get /root/zfs/lib/libzfs/libzfs_dataset.c:2689
    #11 0x7f00470e44c9 in non_descendant_idx /root/zfs/lib/libzfs/libzfs_mount.c:1238
    #12 0x7f00470eaf3d in zfs_foreach_mountpoint /root/zfs/lib/libzfs/libzfs_mount.c:1431
    #13 0x55fa63057342 in main (/tmp/a.out+0x1342)
    #14 0x7f004643f7ec in __libc_start_main ../csu/libc-start.c:332

SUMMARY: AddressSanitizer: 7048 byte(s) leaked in 93 allocation(s).

root@kasan-test:/tmp# ZFS_SERIAL_MOUNT=1 ./a.out a
0x616000000080, (nil)

Notably, the leak volume is 7048 bytes, just like with zfs mount -a and zpool import -a – this is a bug in the tpooled zfs_foreach_mountpoint().

Also, under valgrind on my normal system (zfs 2.1.2-1 + some backports, bullseye), I get

nabijaczleweli@tarta:~/uwu$ valgrind --leak-check=full --show-leak-kinds=all ./z
==4110162== Memcheck, a memory error detector
==4110162== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4110162== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4110162== Command: ./z
==4110162==
==4110162== Warning: noted but unhandled ioctl 0x5a12 with no size/direction hints.
==4110162==    This could cause spurious value errors to appear.
==4110162==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4110162== Warning: noted but unhandled ioctl 0x5a05 with no size/direction hints.
==4110162==    This could cause spurious value errors to appear.
==4110162==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
0x4fffce0, (nil)
==4110162==
==4110162== HEAP SUMMARY:
==4110162==     in use at exit: 64 bytes in 2 blocks
==4110162==   total heap usage: 819 allocs, 817 frees, 422,253 bytes allocated
==4110162==
==4110162== 32 bytes in 1 blocks are still reachable in loss record 1 of 2
==4110162==    at 0x483AB65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4110162==    by 0x48AC7CB: register_fstype (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4110162==    by 0x48AE486: libshare_nfs_init (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4110162==    by 0x486F128: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4110162==    by 0x400FFE1: call_init.part.0 (dl-init.c:72)
==4110162==    by 0x40100E8: call_init (dl-init.c:30)
==4110162==    by 0x40100E8: _dl_init (dl-init.c:119)
==4110162==    by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==4110162==
==4110162== 32 bytes in 1 blocks are still reachable in loss record 2 of 2
==4110162==    at 0x483AB65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4110162==    by 0x48AC7CB: register_fstype (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4110162==    by 0x48AED96: libshare_smb_init (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4110162==    by 0x400FFE1: call_init.part.0 (dl-init.c:72)
==4110162==    by 0x40100E8: call_init (dl-init.c:30)
==4110162==    by 0x40100E8: _dl_init (dl-init.c:119)
==4110162==    by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==4110162==
==4110162== LEAK SUMMARY:
==4110162==    definitely lost: 0 bytes in 0 blocks
==4110162==    indirectly lost: 0 bytes in 0 blocks
==4110162==      possibly lost: 0 bytes in 0 blocks
==4110162==    still reachable: 64 bytes in 2 blocks
==4110162==         suppressed: 0 bytes in 0 blocks
==4110162==
==4110162== For lists of detected and suppressed errors, rerun with: -s
==4110162== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

which is as-expected and

==4111237== Memcheck, a memory error detector
==4111237== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4111237== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4111237== Command: ./z a
==4111237== 
==4111237== Warning: noted but unhandled ioctl 0x5a12 with no size/direction hints.
==4111237==    This could cause spurious value errors to appear.
==4111237==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4111237== Warning: noted but unhandled ioctl 0x5a05 with no size/direction hints.
==4111237==    This could cause spurious value errors to appear.
==4111237==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==4111237== Warning: noted but unhandled ioctl 0x5a27 with no size/direction hints.
==4111237==    This could cause spurious value errors to appear.
==4111237==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
0x4fffce0, (nil)
==4111237== 
==4111237== HEAP SUMMARY:
==4111237==     in use at exit: 8,224 bytes in 110 blocks
==4111237==   total heap usage: 1,043 allocs, 933 frees, 966,933 bytes allocated
==4111237== 
==4111237== 32 bytes in 1 blocks are still reachable in loss record 1 of 9
==4111237==    at 0x483AB65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x48AC7CB: register_fstype (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48AE486: libshare_nfs_init (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x486F128: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x400FFE1: call_init.part.0 (dl-init.c:72)
==4111237==    by 0x40100E8: call_init (dl-init.c:30)
==4111237==    by 0x40100E8: _dl_init (dl-init.c:119)
==4111237==    by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==4111237==    by 0x1: ???
==4111237==    by 0x1FFF00066E: ???
==4111237==    by 0x1FFF000672: ???
==4111237== 
==4111237== 32 bytes in 1 blocks are still reachable in loss record 2 of 9
==4111237==    at 0x483AB65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x48AC7CB: register_fstype (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48AED96: libshare_smb_init (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x400FFE1: call_init.part.0 (dl-init.c:72)
==4111237==    by 0x40100E8: call_init (dl-init.c:30)
==4111237==    by 0x40100E8: _dl_init (dl-init.c:119)
==4111237==    by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==4111237==    by 0x1: ???
==4111237==    by 0x1FFF00066E: ???
==4111237==    by 0x1FFF000672: ???
==4111237== 
==4111237== 56 bytes in 1 blocks are indirectly lost in loss record 3 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABADB3: nvlist_xalloc (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFAF6: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4011DC: main (z.c:15)
==4111237== 
==4111237== 256 bytes in 1 blocks are indirectly lost in loss record 4 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABA8A5: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABAB4D: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB2B8: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB5E4: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFB2F: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237== 
==4111237== 1,176 bytes in 21 blocks are indirectly lost in loss record 5 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABB907: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB1CC: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB5E4: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFB2F: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4011DC: main (z.c:15)
==4111237== 
==4111237== 1,608 bytes in 21 blocks are indirectly lost in loss record 6 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABB19A: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB5E4: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFB2F: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4011DC: main (z.c:15)
==4111237== 
==4111237== 2,352 bytes in 42 blocks are indirectly lost in loss record 7 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABB19A: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB977: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB1CC: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB5E4: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFB2F: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237== 
==4111237== 2,688 bytes in 21 blocks are indirectly lost in loss record 8 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABAC0F: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB2B8: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB977: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB1CC: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABB5E4: ??? (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFB2F: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237== 
==4111237== 8,160 (24 direct, 8,136 indirect) bytes in 1 blocks are definitely lost in loss record 9 of 9
==4111237==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==4111237==    by 0x4ABADE5: nvlist_xalloc (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x4ABFAF6: nvlist_xunpack (in /usr/lib/x86_64-linux-gnu/libnvpair.so.3.0.0)
==4111237==    by 0x489C4E4: zcmd_read_dst_nvlist (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488387E: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4885697: zpool_get_prop (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x48783A8: zfs_prop_get (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4881629: ??? (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x488327A: zfs_foreach_mountpoint (in /usr/lib/x86_64-linux-gnu/libzfs.so.4.1.0)
==4111237==    by 0x4011DC: main (z.c:15)
==4111237== 
==4111237== LEAK SUMMARY:
==4111237==    definitely lost: 24 bytes in 1 blocks
==4111237==    indirectly lost: 8,136 bytes in 107 blocks
==4111237==      possibly lost: 0 bytes in 0 blocks
==4111237==    still reachable: 64 bytes in 2 blocks
==4111237==         suppressed: 0 bytes in 0 blocks
==4111237== 
==4111237== For lists of detected and suppressed errors, rerun with: -s
==4111237== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

Which matches the ASAN stats.

Update: if you comment out tpool_create/tpool_wait/tpool_destroy in zfs_foreach_mountpoint() and replace the tpool_dispatch line in zfs_dispatch_mount() with a direct zfs_mount_task(mnt_param) call, it doesn't happen anymore, in any of the three cases above. What da hell. I haven't the foggiest why that'd be!!
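
For the record, the experiment boils down to something like this illustrative hunk (context approximate; it also drops the tpool_create()/tpool_wait()/tpool_destroy() calls in zfs_foreach_mountpoint()):

--- lib/libzfs/libzfs_mount.c
+++ lib/libzfs/libzfs_mount.c
@@ zfs_dispatch_mount()
-	(void) tpool_dispatch(tp, zfs_mount_task, (void *)mnt_param);
+	zfs_mount_task(mnt_param);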

@codecov

codecov bot commented Jan 12, 2022

Codecov Report

Merging #12928 (d384ac8) into master (161ed82) will increase coverage by 1.67%.
The diff coverage is 63.05%.

❗ Current head d384ac8 differs from pull request most recent head 45bdd45. Consider uploading reports for the commit 45bdd45 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12928      +/-   ##
==========================================
+ Coverage   75.17%   76.85%   +1.67%     
==========================================
  Files         402      403       +1     
  Lines      128071   129826    +1755     
==========================================
+ Hits        96283    99774    +3491     
+ Misses      31788    30052    -1736     
Flag Coverage Δ
kernel 81.01% <100.00%> (+2.25%) ⬆️
user 47.94% <62.50%> (+0.51%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
cmd/mount_zfs/mount_zfs.c 62.91% <0.00%> (-0.85%) ⬇️
cmd/zdb/zdb_il.c 30.86% <ø> (-22.84%) ⬇️
cmd/zed/agents/zfs_agents.c 85.16% <ø> (ø)
cmd/zed/zed_file.c 40.00% <ø> (-2.23%) ⬇️
cmd/zed/zed_log.c 37.17% <ø> (ø)
cmd/zed/zed_strings.c 81.25% <ø> (ø)
cmd/zfs_ids_to_path/zfs_ids_to_path.c 60.00% <0.00%> (ø)
cmd/zinject/zinject.c 37.99% <ø> (+2.95%) ⬆️
cmd/zstream/zstream_redup.c 74.86% <ø> (+15.50%) ⬆️
include/os/freebsd/linux/compiler.h 100.00% <ø> (ø)
... and 272 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7633c0a...45bdd45.

@szubersk
Contributor Author

@nabijaczleweli
You made quite an investigation there. I played around with your findings and established that the leak disappears when either:

  • non_descendant_idx is removed altogether from the function (of course this renders the code incorrect for general use cases)
  • there is at least one invocation of it before the first thread is dispatched or after the pool is destroyed

Case 1

diff --git lib/libzfs/libzfs_mount.c lib/libzfs/libzfs_mount.c
index 7959933ed..ac769d9a1 100644
--- lib/libzfs/libzfs_mount.c
+++ lib/libzfs/libzfs_mount.c
@@ -1435,7 +1437,7 @@ zfs_foreach_mountpoint(libzfs_handle_t *hdl, zfs_handle_t **handles,
         * these.
         */
        for (int i = 0; i < num_handles;
-           i = non_descendant_idx(handles, num_handles, i)) {
+           i++) {
                /*
                 * Since the mountpoints have been sorted so that the zoned
                 * filesystems are at the end, a zoned filesystem seen from

Case 2

diff --git lib/libzfs/libzfs_mount.c lib/libzfs/libzfs_mount.c
index 7959933ed..1bc4461a6 100644
--- lib/libzfs/libzfs_mount.c
+++ lib/libzfs/libzfs_mount.c
@@ -1434,6 +1436,7 @@ zfs_foreach_mountpoint(libzfs_handle_t *hdl, zfs_handle_t **handles,
         * root mountpoint, e.g.: /foo /bar. Dispatch a mount task for each of
         * these.
         */
+       printf("%d\n", non_descendant_idx(handles, num_handles, 0));
        for (int i = 0; i < num_handles;
            i = non_descendant_idx(handles, num_handles, i)) {
                /*

behlendorf pushed a commit that referenced this pull request Jan 13, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
#12928 (comment)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12954
@szubersk szubersk changed the title from "Turn on ASan and UBSan in workflows" to "Add --enable-asan and --enable-ubsan switches" on Jan 25, 2022
@szubersk szubersk marked this pull request as ready for review January 25, 2022 19:55
@szubersk
Contributor Author

@behlendorf, I think this PR is ready for a general review.

@behlendorf behlendorf self-requested a review January 26, 2022 19:49
@behlendorf behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label and removed the "Status: Work in Progress" (Not yet ready for general review) label on Jan 26, 2022
Member

@gmelikov gmelikov left a comment

LGTM, at least for the CI part.

"be specified by using a name containing a colon (:).\n"));
(void) fprintf(fp, "%s", gettext("\nUser-defined properties "
"can be specified by using a name containing a colon "
"(:).\n"));
Contributor

Why add the "%s" format specifier to just this one fprintf()? I see several others were updated as well; were warnings issued for these in particular?

Contributor

Yeah, it gives warnings for those (for me these were a distribution of "non-constant used as format" and "potentially NULL format", both of which are bogus in this case). OTOH, if there isn't anything to format here, it's better to make this a fputs.
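
To illustrate the three variants being discussed (hypothetical helper name and shortened message text; not the actual zfs_main.c code):

#include <libintl.h>
#include <stdio.h>

static void
usage_note(FILE *fp)
{
	/* Flagged by GCC's extra format-string checks under -fsanitize=undefined: */
	/* (void) fprintf(fp, gettext("\nUser-defined properties ...\n")); */

	/* Workaround taken in this PR: pass a constant "%s" format. */
	(void) fprintf(fp, "%s", gettext("\nUser-defined properties ...\n"));

	/* Equivalent, and arguably cleaner when nothing is formatted: */
	(void) fputs(gettext("\nUser-defined properties ...\n"), fp);
}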

Contributor Author

This is a GCC bug. GCC enables additional format-string checks when -fsanitize=undefined is used. Unfortunately there are false positives here and there (tested on GCC 10 and 11), e.g. raspberrypi/userland#631 (comment)

It looks incredibly lame, but I did not find a better solution/workaround, unfortunately.

Contributor

I see, that's unfortunate but we can live with it.

.github/workflows/zfs-tests-sanity.yml (outdated review comment, resolved)
.github/workflows/zfs-tests-sanity.yml (outdated review comment, resolved)
@@ -631,6 +631,8 @@ fatal(int do_perror, char *message, ...)

(void) fflush(stdout);
buf = umem_alloc(FATAL_MSG_SZ, UMEM_NOFAIL);
if (buf == NULL)
goto out;
Contributor

The UMEM_NOFAIL flag ensures this can't fail. There are several other places in ztest.c which use this flag and don't check the return value. I'm guessing the ubsan checks just flagged this one?

Contributor Author

Indeed, UBSan has no way to determine that umem_alloc() won't fail.

Contributor

Right, I'm just surprised it only flagged this one case when there are clearly several others.

cmd/zvol_id/zvol_id_main.c (outdated review comment, resolved)
module/icp/algs/modes/gcm.c (outdated review comment, resolved)
module/icp/algs/modes/gcm.c (outdated review comment, resolved)
module/icp/io/sha2_mod.c (outdated review comment, resolved)
@@ -41,8 +41,7 @@ log_must display_status "$TESTPOOL"
#

log_must zfs create -o dedup=on -V 2G $TESTPOOL/$TESTVOL

log_must eval "new_fs $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL >/dev/null 2>&1"
log_must eval "new_fs $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL >/dev/null"
Contributor

This looks unrelated, but we should add a block_device_wait $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL here to make sure the block device is available before using it.

Contributor Author

block_device_wait $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL applied.

Not ignoring stderr gives us much better diagnostics. When the test case failed I had to wonder what the reasons for the file system creation failure were, and I ended up removing the stderr redirection (the 2>&1). I know it might be obvious that it is due to a missing block device, but time saved on inserting diagnostics is time earned.
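
So the relevant part of the test now reads roughly like this (a sketch, not the exact committed hunk):

log_must zfs create -o dedup=on -V 2G $TESTPOOL/$TESTVOL
block_device_wait $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL
log_must eval "new_fs $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL >/dev/null"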

@@ -35,7 +35,7 @@
DISK1=${DISKS%% *}

log_must zpool create -f $TESTPOOL $DISK1
log_must zpool trim $TESTPOOL
log_must zpool trim -r 1 "$TESTPOOL"
Contributor

Presumably also unrelated, but does this address a false positive you were seeing in the test case?

Contributor Author

Sometimes the trim finishes before the next command checks whether a trim is still in progress.
ASan/UBSan makes really interesting behaviors bubble up :)

`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:

- Memory leak reporting is (temporarily) suppressed. The cost of
  fixing them is relatively high compared to the gains.

- Checksum computing functions in `module/zcommon/zfs_fletcher*`
  have UBSan errors suppressed. It is completely impractical
  to enforce 64-byte payload alignment there due to performance
  impact.

- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
  memory allocator is used there rendering that measure
  unfeasible.

- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
  `zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
  incompatible with memory leaks detection.

Signed-off-by: szubersk <szuberskidamian@gmail.com>
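
For anyone wanting to try the new switches locally, a typical in-tree build would look something like this (standard autotools flow assumed):

./autogen.sh
./configure --enable-asan --enable-ubsan
make -j"$(nproc)"
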
@szubersk szubersk mentioned this pull request Jan 30, 2022
13 tasks
ghost pushed a commit to truenas/zfs that referenced this pull request Feb 2, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
openzfs#12928 (comment)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12954
"be specified by using a name containing a colon (:).\n"));
(void) fprintf(fp, "%s", gettext("\nUser-defined properties "
"can be specified by using a name containing a colon "
"(:).\n"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's unfortunate but we can live with it.

@@ -631,6 +631,8 @@ fatal(int do_perror, char *message, ...)

(void) fflush(stdout);
buf = umem_alloc(FATAL_MSG_SZ, UMEM_NOFAIL);
if (buf == NULL)
goto out;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'm just surprised it only flagged this one case where there are clearly several others.

@behlendorf behlendorf added the "Status: Accepted" (Ready to integrate: reviewed, tested) label and removed the "Status: Code Review Needed" (Ready for review and testing) label on Feb 2, 2022
@behlendorf behlendorf merged commit 63652e1 into openzfs:master Feb 3, 2022
tonyhutter pushed a commit that referenced this pull request Feb 3, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
#12928 (comment)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes #12954
@szubersk
Contributor Author

szubersk commented Feb 4, 2022

Thank you all for the invaluable tips and ideas pitched in this PR!

Things that could be improved/explored later on:

nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
openzfs#12928 (comment)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12954
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
`configure` now accepts `--enable-asan` and `--enable-ubsan` switches
which results in passing `-fsanitize=address`
and `-fsanitize=undefined`, respectively, to the compiler. Those
flags are enabled in GitHub workflows for ZTS and zloop. Errors
reported by both instrumentations are corrected, except for:

- Memory leak reporting is (temporarily) suppressed. The cost of
  fixing them is relatively high compared to the gains.

- Checksum computing functions in `module/zcommon/zfs_fletcher*`
  have UBSan errors suppressed. It is completely impractical
  to enforce 64-byte payload alignment there due to performance
  impact.

- There's no ASan heap poisoning in `module/zstd/lib/zstd.c`. A custom
  memory allocator is used there rendering that measure
  unfeasible.

- Memory leaks detection has to be suppressed for `cmd/zvol_id`.
  `zvol_id` is run by udev with the help of `ptrace(2)`. Tracing is
  incompatible with memory leaks detection.

Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes openzfs#12928
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 22, 2022
They're later |=d with constants, but never reset

Caught by valgrind while investigating
openzfs#12928 (comment)

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes openzfs#12954
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run CI tests with AddressSanitizer
AddressSanitizer reports use-after-free within zstd mempool
6 participants