"zpool export <zpoolname>" on a faulted zpool hangs and blocks other zpool commands after that #6649

Open
sanjeevbagewadi opened this issue Sep 15, 2017 · 18 comments
Assignees
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@sanjeevbagewadi
Contributor

System information

Type Version/Name
Distribution Name centos
Distribution Version CentOS release 6.8 (Final)
Linux Kernel 4.4.14-1.el6
Architecture x86_64
ZFS Version 0.7.1
SPL Version 0.7.1

Describe the problem you're observing

A zpool (with failmode=wait) entered a degraded/faulted state due to IO failures.
Issued a "zpool export", which blocked as below:

crash> bt 0xffff8801e79e9540
PID: 5478 TASK: ffff8801e79e9540 CPU: 1 COMMAND: "zpool"
#0 [ffff88032e1a7bf0] __schedule at ffffffff816cb514
#1 [ffff88032e1a7ca0] schedule at ffffffff816cbc10
#2 [ffff88032e1a7cc0] cv_wait_common at ffffffffa07cf845 [spl]
#3 [ffff88032e1a7d40] __cv_wait at ffffffffa07cf8d5 [spl]
#4 [ffff88032e1a7d50] txg_wait_synced at ffffffffa08f3919 [zfs]
#5 [ffff88032e1a7da0] spa_export_common at ffffffffa08e3dc0 [zfs]
#6 [ffff88032e1a7e00] spa_export at ffffffffa08e407b [zfs]
#7 [ffff88032e1a7e10] zfs_ioc_pool_export at ffffffffa0924d7f [zfs]
#8 [ffff88032e1a7e40] zfsdev_ioctl at ffffffffa09277d4 [zfs]
#9 [ffff88032e1a7eb0] do_vfs_ioctl at ffffffff81216072
#10 [ffff88032e1a7f00] sys_ioctl at ffffffff81216402
#11 [ffff88032e1a7f50] entry_SYSCALL_64_fastpath at ffffffff816cf76e
RIP: 00007f40aaff1a77 RSP: 00007fff29addca8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000c491a0 RCX: 00007f40aaff1a77
RDX: 00007fff29addcc0 RSI: 0000000000005a03 RDI: 0000000000000003
RBP: 00007fff29adda70 R8: 6338383337336261 R9: 3566353033323238
R10: 00007fff29adda30 R11: 0000000000000246 R12: 0000000000000006
R13: 00007fff29addb50 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

Unfortunately, spa_sync() will not complete because the vdisk is faulted, and it will not make
progress until a "zpool online" is issued. Meanwhile, the "zpool export" is holding the
spa_namespace_lock, so other commands will block as below:

crash> bt 5582
PID: 5582 TASK: ffff88008ffc5500 CPU: 2 COMMAND: "zpool"
#0 [ffff8802a7e1fb80] __schedule at ffffffff816cb514
#1 [ffff8802a7e1fc30] schedule at ffffffff816cbc10
#2 [ffff8802a7e1fc50] schedule_preempt_disabled at ffffffff816cbe4e
#3 [ffff8802a7e1fc60] __mutex_lock_slowpath at ffffffff816cd440
#4 [ffff8802a7e1fd00] mutex_lock at ffffffff816cd4f3
#5 [ffff8802a7e1fd20] spa_open_common at ffffffffa08e64a3 [zfs]
#6 [ffff8802a7e1fda0] spa_get_stats at ffffffffa08e6909 [zfs]
#7 [ffff8802a7e1fe00] zfs_ioc_pool_stats at ffffffffa0924c11 [zfs]
#8 [ffff8802a7e1fe40] zfsdev_ioctl at ffffffffa09277d4 [zfs]
#9 [ffff8802a7e1feb0] do_vfs_ioctl at ffffffff81216072
#10 [ffff8802a7e1ff00] sys_ioctl at ffffffff81216402
#11 [ffff8802a7e1ff50] entry_SYSCALL_64_fastpath at ffffffff816cf76e
RIP: 00007fd4730c5a77 RSP: 00007fffa8889408 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 00007fd473373120 RCX: 00007fd4730c5a77
RDX: 00007fffa8889430 RSI: 0000000000005a05 RDI: 0000000000000004
RBP: 0000000000772f80 R8: 0000000000000008 R9: 0000000001e00000
R10: 00007fffa8889190 R11: 0000000000000202 R12: 0000000000020090
R13: 0000000000772f70 R14: 0000000000010000 R15: 00007fd473373120
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b

Hence, all zpool commands will block on the spa_namespace_lock.

It would probably be better for spa_export_common() to wait in txg_wait_synced() without holding the spa_namespace_lock.
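
The blocking pattern can be illustrated with a minimal userspace analogy (a pthread sketch, not the actual OpenZFS code; the names only mimic spa_namespace_lock and txg_wait_synced()): one thread holds a global "namespace" lock while waiting for a sync that never completes, and any other thread that needs the same lock queues up behind it.

/*
 * deadlock_demo.c -- userspace analogy of the reported hang (pthreads only,
 * not OpenZFS code). The "export" thread holds a global namespace lock while
 * waiting for a sync that never finishes, so the "status" thread can never
 * acquire that lock.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for spa_namespace_lock */
static pthread_mutex_t sync_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  sync_done = PTHREAD_COND_INITIALIZER;       /* never signalled: pool I/O is suspended */

static void *export_thread(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&namespace_lock);        /* like spa_export_common() taking the namespace lock ... */
    printf("export: holding namespace lock, waiting for txg sync\n");
    pthread_mutex_lock(&sync_lock);
    pthread_cond_wait(&sync_done, &sync_lock);  /* ... then blocking forever, like txg_wait_synced() */
    pthread_mutex_unlock(&sync_lock);
    pthread_mutex_unlock(&namespace_lock);
    return (NULL);
}

static void *status_thread(void *arg)
{
    (void) arg;
    sleep(1);                                   /* let the export thread grab the lock first */
    printf("status: trying to take the namespace lock (like 'zpool status')\n");
    if (pthread_mutex_trylock(&namespace_lock) != 0)
        printf("status: blocked -- every other zpool command would queue up here\n");
    else
        pthread_mutex_unlock(&namespace_lock);
    return (NULL);
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, export_thread, NULL);
    pthread_create(&t2, NULL, status_thread, NULL);
    pthread_join(t2, NULL);                     /* only the status thread can finish */
    return (0);
}

Built with cc -pthread, the status thread reports that it is blocked almost immediately, which is exactly why every other zpool command piles up once the export hangs.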

Describe how to reproduce the problem

Here are the steps to reproduce the problem :

  • Inject IO faults using zinject:
    zinject -a -d /dev/sdz -e io zpool-1
  • Trigger some IO to the zpool.
  • Wait for the zpool to get into a degraded state.
  • Try to export the zpool: zpool export zpool-1
    This command will hang.
    At this point all other zpool commands (e.g. zpool list, zpool status) will hang.

Include any warning/errors/backtraces from the system logs

@sanjeevbagewadi
Contributor Author

sanjeevbagewadi commented Sep 15, 2017

Here is a possible fix which I have tested and found to be working :

diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 1977d4e..c3c4891 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -4615,21 +4615,28 @@ spa_export_common(char *pool, int new_state, nvlist_t **oldconfig,
                zvol_remove_minors(spa, spa_name(spa), B_TRUE);
                taskq_wait(spa->spa_zvol_taskq);
        }
-       mutex_enter(&spa_namespace_lock);
-       spa_close(spa, FTAG);

-       if (spa->spa_state == POOL_STATE_UNINITIALIZED)
-               goto export_spa;
        /*
         * The pool will be in core if it's openable, in which case we can
         * modify its state.  Objsets may be open only because they're dirty,
         * so we have to force it to sync before checking spa_refcnt.
+        *
+        * If the pool is syncing wait for it to complete. This could take long
+        * or may never complete if the IO is suspended (spa_suspended = 1).
+        * Hence, it is best done without holding the spa_namespace_lock to
+        * avoid blocking other operations which need spa_namespace_lock.
         */
-       if (spa->spa_sync_on) {
+       if (spa->spa_state != POOL_STATE_UNINITIALIZED && spa->spa_sync_on) {
                txg_wait_synced(spa->spa_dsl_pool, 0);
                spa_evicting_os_wait(spa);
        }

+       mutex_enter(&spa_namespace_lock);
+       spa_close(spa, FTAG);
+
+       if (spa->spa_state == POOL_STATE_UNINITIALIZED)
+               goto export_spa;
+
        /*
         * A pool cannot be exported or destroyed if there are active
         * references.  If we are resetting a pool, allow references by

@behlendorf
Contributor

@sanjeevbagewadi when you open a PR for this, please make sure you add your reproducer above as a test case for the ZFS Test Suite.

@mailinglists35

mailinglists35 commented Feb 3, 2018

@behlendorf I don't see the fix in @sanjeevbagewadi's zfs fork; it appears he tested locally without committing to GitHub.
Would it be difficult for some project member to take a look at his apparently small modifications above and try to create a PR?

@sanjeevbagewadi
Contributor Author

sanjeevbagewadi commented Feb 5, 2018

@behlendorf, this slipped through the cracks. It is pending review internally and we did not get to it. This fix only allows other zpool commands to continue; it will not help export the faulted zpool. We might need additional work for that. I will generate a pull request soon.

@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@mailinglists35

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale but still happening

@stale stale bot removed the Status: Stale No recent activity for issue label Aug 26, 2020
@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Aug 26, 2020
@mailinglists35

mailinglists35 commented Sep 30, 2020

@behlendorf

"At this point all other zpool command (e.g.zpool list, zpool status) will hang."

Is there a way to make zpool commands work for other pools when one pool is suspended due to I/O hangs (z* processes in D state)?

I have a simple-to-reproduce case where one external pool's failure breaks the whole working server:

Connect an external USB pool, then disconnect the power to that disk.
The kernel will log "WARNING: Pool 'backup' has encountered an uncorrectable I/O failure and has been suspended".
Now wait for some cron jobs to kick in and let more processes build up in the D state (rsync etc.).

Result: ANY zpool command hangs from this point on. You can't do anything until you reboot or, miraculously, the I/O times out (which almost never happens; I see D-state processes from several days ago). You can keep using the existing zfs mounts but cannot manage any pools. For example, cron jobs that rely on 'zpool status -x' will queue up as D-state processes. Some zfs commands appear to work, though (such as zfs list).

All of these D-state processes (zpool, samba etc.) ultimately depend on a txg_sync process in D state:

cat /proc/26767/stack
[<0>] spa_errlog_sync+0xe1/0x2b0 [zfs]
[<0>] spa_sync+0x560/0xfb0 [zfs]
[<0>] txg_sync_thread+0x2ba/0x4b0 [zfs]
[<0>] thread_generic_wrapper+0x74/0x90 [spl]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff

Even if power has returned to the external USB pool disk, I cannot issue 'zpool online backup' or 'zpool clear backup' because there are 100 other 'zpool status' processes queued in D state (and rsync, and samba).

@behlendorf
Contributor

This is something I know multiple people have looked into; I'm sure it's doable, but it's surprisingly tricky. There's been some recent renewed interest and work towards being able to export a suspended pool, but no patches for review just yet.

@mailinglists35

Being able to export/unload a suspended pool would be awesome, but in the meantime can we at least solve the case where one faulted pool blocks zpool commands from working on other online pools (i.e. currently zpool status/list hang as well)?

@thoro

thoro commented Oct 26, 2020

I just ran into the same issue: zfs with an iSCSI volume below it, which was removed a while ago.

Somehow the pool never even faulted since no IO was running on it, but now I tried to clean it up and that led to a total hang of everything ZFS related.

So, it should be split into two separate issues:

  • Allow operations on other pools without being blocked by the broken one (separate mutex? a minimal sketch of this idea follows below)
  • Allow exporting/unloading a broken/suspended pool, or at least time out the command?

Is there anything I can do now, except reboot the server? Will that even work, since all zfs-related commands lead to a D-state process?
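
On the first bullet: a minimal userspace sketch of the "separate mutex" idea (illustrative only, not how OpenZFS is actually structured): a global lock protects only the pool list and is never held across long waits, while each pool carries its own lock, so a hung operation on one pool does not stall commands against the others.

/*
 * Illustrative sketch of per-pool locking (not OpenZFS code): the global
 * lock only protects the pool list and is never held across long waits,
 * so a stuck "export" of pool A does not block a "status" of pool B.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct pool {
    char            name[32];
    pthread_mutex_t lock;                       /* per-pool lock; may be held for a long time */
};

static pthread_mutex_t pool_list_lock = PTHREAD_MUTEX_INITIALIZER;  /* protects only the list below */
static struct pool pools[] = {
    { "backup", PTHREAD_MUTEX_INITIALIZER },
    { "tank",   PTHREAD_MUTEX_INITIALIZER },
};

static struct pool *pool_lookup(const char *name)
{
    struct pool *found = NULL;

    pthread_mutex_lock(&pool_list_lock);        /* short critical section only */
    for (size_t i = 0; i < sizeof (pools) / sizeof (pools[0]); i++) {
        if (strcmp(pools[i].name, name) == 0) {
            found = &pools[i];
            break;
        }
    }
    pthread_mutex_unlock(&pool_list_lock);      /* dropped before any long wait */
    return (found);
}

static void pool_long_operation(struct pool *p)
{
    pthread_mutex_lock(&p->lock);               /* only this pool is tied up */
    printf("long operation on %s (other pools stay manageable)\n", p->name);
    /* a real export would wait for the txg sync here */
    pthread_mutex_unlock(&p->lock);
}

static void pool_status(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    printf("status of %s: ONLINE\n", p->name);
    pthread_mutex_unlock(&p->lock);
}

int main(void)
{
    pool_long_operation(pool_lookup("backup"));
    pool_status(pool_lookup("tank"));           /* not blocked by "backup" */
    return (0);
}

In the real code the spa_namespace_lock serializes far more than a simple lookup, which is presumably part of why this is described above as surprisingly tricky.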

@devsk

devsk commented Jul 18, 2021

I have run into this issue today as well (my external backup disk hung on some USB bug). In my opinion, any single failure that brings down the whole server should be a P0. Can someone please at least implement the minimal change of allowing other commands to work?

@mailinglists35

mailinglists35 commented Aug 9, 2021

@devsk watch for this when it gets merged #5242 (comment)

@kt97679

kt97679 commented Aug 7, 2022

Folks, I'd very much appreciate it if this issue could be fixed, thank you!

@Haravikk

Just ran into this issue myself, and it has basically locked everything; the only way I've found to resolve it is a full restart, which is hardly ideal!

I'm not so bothered about zpool export blocking until it can complete, but if it's holding up the commands necessary for that to happen then it's a pretty major flaw.

@devZer0

devZer0 commented Feb 16, 2023

Folks, sorry to say, but this really, really sucks. I have had numerous reboots because of this issue, which create far more hassle than the issue itself.

And even worse - you cannot cleanly reboot a system with a hanging zpool, because things get stuck on shutdown. You need to do a hard reset!

Please add some logic to at least avoid hanging the zpool/zfs commands.

@Haravikk

Please add some logic to at least avoid hanging the zpool/zfs commands.

It's currently being worked on; you can track the progress in issue #11082. I believe the actual code to support this is implemented, but it's currently failing some tests; hopefully once those last few bugs are resolved it can be rolled out.

@devZer0

devZer0 commented Feb 17, 2023

thank you for the update!

@compumatter

I've got the same problem today. A client's server backup disk died; the disk is no longer visible to the system. I tried to run sudo zpool export -f backuppool and it just hung indefinitely. At some point I ran zfs destroy -r backuppool,
as I am not an expert at zfs and thought perhaps datasets existed that needed to be deleted first, and it said
cannot open 'backuppool': pool I/O is currently suspended

Indeed, from that point forward any command using the zfs pool just didn't work. It hung. I am now at a remote location and had left the client's office, so I was left with no choice but a remote reboot. The server is now not coming up via ssh, so I'm having the client send me a pic of what's on the screen.
[screenshot of the server console]

Fortunately the system eventually came back online. It took a solid 10 minutes but it worked without a hard shut off.

zpool status shows the backuppool no longer exists, so mission accomplished.
