Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use-after-free in dsl_dataset_promote_sync() #16272

Closed
markjdb opened this issue Jun 16, 2024 · 5 comments · Fixed by #16273
Closed

use-after-free in dsl_dataset_promote_sync() #16272

markjdb opened this issue Jun 16, 2024 · 5 comments · Fixed by #16273
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@markjdb
Copy link
Contributor

markjdb commented Jun 16, 2024

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version 15.0-CURRENT
Kernel Version 517c5854588eaa7c2248d97cd750b8b8bad9d69f
Architecture amd64
OpenZFS Version
zfs-2.2.99-517-FreeBSD_ge2357561b
zfs-kmod-2.2.99-517-FreeBSD_ge2357561b

Describe the problem you're observing

Running the ZFS test suite with KASAN triggered a panic:

panic: ASan: Invalid access, 8-byte read at 0xfffffe00e8bba1e0, UMAUseAfterFree(fd)              
cpuid = 1                                                
time = 1718485401                                                     
KDB: stack backtrace:                                                                                                                                          
db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe00d49a4110                                                                                 
kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe00d49a4270
vpanic() at vpanic+0x226/frame 0xfffffe00d49a4410                     
panic() at panic+0xb5/frame 0xfffffe00d49a44e0                                                                                                                 
kasan_report() at kasan_report+0xdf/frame 0xfffffe00d49a45b0                                                                                                   
dsl_dataset_promote_sync() at dsl_dataset_promote_sync+0x1421/frame 0xfffffe00d49a48e0
dsl_sync_task_sync() at dsl_sync_task_sync+0x17a/frame 0xfffffe00d49a4930
dsl_pool_sync() at dsl_pool_sync+0x8db/frame 0xfffffe00d49a4a50
spa_sync() at spa_sync+0x10f3/frame 0xfffffe00d49a4cd0
txg_sync_thread() at txg_sync_thread+0x7c5/frame 0xfffffe00d49a4ef0
fork_exit() at fork_exit+0xa3/frame 0xfffffe00d49a4f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00d49a4f30
--- trap 0xc, rip = 0x14eda78e331a, rsp = 0x14edb0908eb8, rbp = 0x14edb0908ed0 ---

This corresponds to the dereference of origin_head at the very end of dsl_dataset_promote_sync().

Describe how to reproduce the problem

It is not consistently reproducible, so far I only hit this once.

I believe the problem is that origin_head and hds are not safe to dereference after promote_rele() is called. Either the object IDs should be loaded before the references are released, or references should be released after calling spa_swap_errlog().

@markjdb markjdb added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jun 16, 2024
@markjdb
Copy link
Contributor Author

markjdb commented Jun 16, 2024

This apparent bug was introduced in commit 0409d3327371 ("Improve zpool status output, list all affected datasets"), perhaps @gamanakis could take a look?

@gamanakis
Copy link
Contributor

Thanks for catching this, you are correct, I will submit a PR.

@gamanakis
Copy link
Contributor

@markjdb could you take a look at #16273?
Once it runs through the ZTS I will mark it as non-draft.

@markjdb
Copy link
Contributor Author

markjdb commented Jun 16, 2024

@markjdb could you take a look at #16273? Once it runs through the ZTS I will mark it as non-draft.

It looks fine to me. I applied the patch locally and kicked off another test run with KASAN.

@markjdb
Copy link
Contributor Author

markjdb commented Jun 17, 2024

@markjdb could you take a look at #16273? Once it runs through the ZTS I will mark it as non-draft.

It looks fine to me. I applied the patch locally and kicked off another test run with KASAN.

I didn't see any problems with the patch.

gamanakis added a commit to gamanakis/zfs that referenced this issue Jul 2, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Closes: openzfs#16272

Signed-off-by: George Amanakis <gamanakis@gmail.com>
gamanakis added a commit to gamanakis/zfs that referenced this issue Jul 2, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Closes: openzfs#16272

Signed-off-by: George Amanakis <gamanakis@gmail.com>
gamanakis added a commit to gamanakis/zfs that referenced this issue Jul 12, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Closes: openzfs#16272

Signed-off-by: George Amanakis <gamanakis@gmail.com>
behlendorf pushed a commit that referenced this issue Jul 15, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #16272
Closes #16273
calccrypto pushed a commit to hpc/zfs that referenced this issue Jul 17, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#16272
Closes openzfs#16273
ptr1337 pushed a commit to CachyOS/zfs that referenced this issue Aug 4, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#16272
Closes openzfs#16273
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Sep 4, 2024
In the commit of the head_errlog feature we introduced a bug in
dsl_dataset_promote_sync(): we may dereference origin_head and hds, both
dereferencing ddpa after calling promote_sync() on ddpa.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Rob Norris <robn@despairlabs.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes openzfs#16272
Closes openzfs#16273
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants