Fix deadlock in 'zfs rollback' #9203

Merged (1 commit), Aug 27, 2019

@tcaputi (Contributor) commented Aug 22, 2019

Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a part of the suspended fs, a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.

This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
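
The idea behind the fix can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_inode`, `toy_suspend`, `toy_reclaim`, and `toy_resume` are hypothetical names invented for illustration, not ZFS or kernel functions, and the `suspended` flag only loosely mirrors the `z_suspended` field used in the patch. It shows that an inode pinned by an extra reference at suspend time cannot be picked by a reclaimer that only evicts unreferenced inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode; NOT the real kernel or ZFS structure. */
struct toy_inode {
	int  refcount;   /* models the inode reference count */
	bool suspended;  /* models a "pinned by suspend" flag */
	bool evicted;    /* set once the toy reclaimer frees the inode */
};

/* Suspend: pin every inode that is still present with an extra reference. */
static void toy_suspend(struct toy_inode *inodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (inodes[i].evicted)
			continue;
		inodes[i].refcount++;
		inodes[i].suspended = true;
	}
}

/* Memory-pressure reclaim: only unreferenced inodes may be evicted. */
static size_t toy_reclaim(struct toy_inode *inodes, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!inodes[i].evicted && inodes[i].refcount == 0) {
			inodes[i].evicted = true;
			freed++;
		}
	}
	return freed;
}

/* Resume: drop the reference that was taken at suspend time. */
static void toy_resume(struct toy_inode *inodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (inodes[i].suspended) {
			assert(inodes[i].refcount > 0);
			inodes[i].refcount--;
			inodes[i].suspended = false;
		}
	}
}
```

In this model, calling `toy_reclaim()` between `toy_suspend()` and `toy_resume()` frees nothing, so the reclaimer never reaches the code path that would need the lock held across the suspend.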

How Has This Been Tested?

I am currently working to get this tested on some of our machines that are experiencing this problem. However, it is rare and intermittent.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

@tcaputi (Contributor Author) commented Aug 22, 2019

For reference, these were the stacks of the 2 deadlocked threads:

      1 2d9b1c0a41eeafd0570844ee7b3438d7  /proc/30264/stack
[<ffffffff810a603f>] kthread_create_on_node+0xdf/0x1e0 --------------> waiting on kthreadd
[<ffffffffc06765ea>] spl_kthread_create+0x9a/0xf0 [spl]
[<ffffffffc0676ab9>] taskq_thread_create+0x69/0x110 [spl]
[<ffffffffc0677e7d>] taskq_create+0x1cd/0x400 [spl]
[<ffffffffc0c74232>] zil_open+0x42/0x60 [zfs]
[<ffffffffc0c618f2>] zfsvfs_setup.part.15+0x22/0x160 [zfs] 
[<ffffffffc0c638b1>] zfs_resume_fs+0x271/0x2c0 [zfs] ---------> holding z_teardown_inactive_lock from zfs_suspend_fs
[<ffffffffc0c5328b>] zfs_ioc_rollback+0x10b/0x1b0 [zfs]
[<ffffffffc0c564f4>] zfsdev_ioctl+0x214/0x630 [zfs]
[<ffffffff8122faaf>] do_vfs_ioctl+0x2af/0x4b0
[<ffffffff8122fd29>] SyS_ioctl+0x79/0x90
[<ffffffff8186145b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[<ffffffffffffffff>] 0xffffffffffffffff

      1 2c87d8d638b1a3c61c1b496984000f99  /proc/2/stack
[<ffffffff81418a44>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffc0c6a04d>] zfs_inactive+0x5d/0x210 [zfs] ---------> wants z_teardown_inactive_lock
[<ffffffffc0c84293>] zpl_evict_inode+0x43/0x60 [zfs]
[<ffffffff812379bf>] evict+0xbf/0x190
[<ffffffff81237ac6>] dispose_list+0x36/0x50
[<ffffffff81238bba>] prune_icache_sb+0x5a/0x80
[<ffffffff8121f462>] super_cache_scan+0x152/0x1b0
[<ffffffff811abddd>] shrink_slab.part.40+0x20d/0x430
[<ffffffff811b05d8>] shrink_zone+0x2c8/0x2e0
[<ffffffff811b074b>] do_try_to_free_pages+0x15b/0x3b0
[<ffffffff811b0a6e>] try_to_free_pages+0xce/0x190
[<ffffffff811a1db2>] __alloc_pages_slowpath.constprop.88+0x302/0xaf0
[<ffffffff811a2828>] __alloc_pages_nodemask+0x288/0x2a0
[<ffffffff811a28db>] alloc_kmem_pages_node+0x4b/0xc0
[<ffffffff81082a79>] copy_process+0x1d9/0x1c30
[<ffffffff81084660>] _do_fork+0x80/0x360
[<ffffffff81084969>] kernel_thread+0x29/0x30
[<ffffffff810a6c08>] kthreadd+0x148/0x190
[<ffffffff818618e5>] ret_from_fork+0x55/0x80
[<ffffffffffffffff>] 0xffffffffffffffff
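
Read together, the annotations on the two stacks describe a wait-for cycle: the rollback thread (pid 30264) holds z_teardown_inactive_lock and waits on kthreadd, while kthreadd (pid 2) is in memory reclaim and wants that same lock. A minimal sketch of that cycle as a wait-for table follows; the enum values and `has_wait_cycle()` are hypothetical names for illustration only, not part of ZFS or the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two threads in the stacks above. */
enum { ROLLBACK_THREAD, KTHREADD, NTHREADS };

/*
 * waits_on[t] is the index of the thread that t is blocked on,
 * or -1 if t is runnable. Follow the chain from 'start' for at
 * most n hops; returning to 'start' means a deadlock cycle.
 */
static bool has_wait_cycle(const int waits_on[], int n, int start)
{
	int cur = start;

	for (int steps = 0; steps < n; steps++) {
		cur = waits_on[cur];
		if (cur < 0)
			return false;	/* chain ends at a runnable thread */
		if (cur == start)
			return true;	/* came back to where we started */
	}
	return false;
}
```

With the table `{ KTHREADD, ROLLBACK_THREAD }` the function reports a cycle; once kthreadd no longer needs the lock (entry `-1`), the chain ends and no cycle is found.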

@alek-p (Contributor) left a comment

LGTM, and aligns with what we talked about previously

@behlendorf added the "Status: Code Review Needed" (Ready for review and testing) label Aug 22, 2019

	if (zp->z_suspended) {
		iput(ZTOI(zp));
		zp->z_suspended = B_FALSE;
	}

Contributor review comment on the code above:

Calling iput() here directly isn't safe, since the inode can be freed by iput()->iput_final() when the last reference is dropped. As part of that teardown, ->zfs_inode_destroy() is called, which takes z_znodes_lock and results in a deadlock. You're going to need to call zfs_iput_async() after the zfs_rezget() instead. This ensures the iput() is handled asynchronously by a taskq thread when it's the last reference.
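
The difference between the synchronous and asynchronous release can be sketched in user space. This is a toy model, not the ZFS implementation: `demo_inode`, `demo_put`, `demo_put_async`, `demo_worker_drain`, and the `demo_list_lock_held` flag are all hypothetical, and the model assumes the caller already holds a non-recursive lock that the destroy path also needs. A final synchronous put then fails (standing in for the deadlock), while the asynchronous variant hands the last reference to a worker that runs later without the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 16

/* Toy stand-in for an inode; NOT the real kernel structure. */
struct demo_inode {
	int  refcount;
	bool destroyed;
};

/* Models a non-recursive lock that the destroy path must take. */
static bool demo_list_lock_held;

/* Pending final puts, standing in for work handed to a worker thread. */
static struct demo_inode *demo_queue[DEMO_QUEUE_MAX];
static size_t demo_queue_len;

/* Destroy needs the lock; returns false where real code would deadlock. */
static bool demo_destroy(struct demo_inode *ip)
{
	if (demo_list_lock_held)
		return false;
	demo_list_lock_held = true;
	ip->destroyed = true;
	demo_list_lock_held = false;
	return true;
}

/* Synchronous put: dropping the last reference destroys inline. */
static bool demo_put(struct demo_inode *ip)
{
	assert(ip->refcount > 0);
	if (--ip->refcount == 0)
		return demo_destroy(ip);
	return true;
}

/* Async put: a last reference is queued instead of dropped inline. */
static bool demo_put_async(struct demo_inode *ip)
{
	assert(ip->refcount > 0);
	if (ip->refcount == 1) {
		if (demo_queue_len == DEMO_QUEUE_MAX)
			return false;	/* toy queue is full */
		demo_queue[demo_queue_len++] = ip;
		return true;
	}
	ip->refcount--;
	return true;
}

/* Worker: runs later with no locks held and performs the final puts. */
static size_t demo_worker_drain(void)
{
	size_t done = 0;

	for (size_t i = 0; i < demo_queue_len; i++) {
		if (demo_put(demo_queue[i]))
			done++;
	}
	demo_queue_len = 0;
	return done;
}
```

The design point is that only the last reference is deferred; a put that cannot trigger destruction is still handled inline, so the common case stays cheap.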

@codecov bot commented Aug 22, 2019

Codecov Report

Merging #9203 into master will decrease coverage by 0.17%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #9203      +/-   ##
==========================================
- Coverage   79.19%   79.02%   -0.18%     
==========================================
  Files         400      400              
  Lines      122002   122008       +6     
==========================================
- Hits        96625    96420     -205     
- Misses      25377    25588     +211
Flag      Coverage Δ
#kernel   79.71% <100%> (-0.09%) ⬇️
#user     66.74% <ø> (-0.2%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4302698...2816cec.

@behlendorf added the "Status: Accepted" (Ready to integrate (reviewed, tested)) label and removed the "Status: Code Review Needed" (Ready for review and testing) label Aug 26, 2019
@behlendorf behlendorf merged commit e7a2fa7 into openzfs:master Aug 27, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 17, 2019
Currently, the 'zfs rollback' code can end up deadlocked due to
the way the kernel handles unreferenced inodes on a suspended fs.
Essentially, the zfs_resume_fs() code path may cause zfs to spawn
new threads as it reinstantiates the suspended fs's zil. When a
new thread is spawned, the kernel may attempt to free memory for
that thread by freeing some unreferenced inodes. If it happens to
select inodes that are a part of the suspended fs, a deadlock
will occur because freeing inodes requires holding the fs's
z_teardown_inactive_lock which is still held from the suspend.

This patch corrects this issue by adding an additional reference
to all inodes that are still present when a suspend is initiated.
This prevents them from being freed by the kernel for any reason.

Reviewed-by: Alek Pinchuk <apinchuk@datto.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#9203
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 19, 2019
@alek-p alek-p mentioned this pull request Sep 20, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 23, 2019
tonyhutter pushed a commit that referenced this pull request Sep 26, 2019
Labels
Status: Accepted Ready to integrate (reviewed, tested)