'zfs list' hangs after a pool is suspended #5345
Comments
I think ZFS is deadlocked. Take a look at line 193 of the dmesg.txt file I attached: that's the call stack for a 'zpool status' process, and it's stuck waiting on a mutex. I can try a 'zpool clear' remotely, but fiddling with the eSATA enclosure will have to wait until I get home this evening.
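For anyone seeing the same symptom, here is a minimal sketch of how to confirm that a stuck utility is blocked in the kernel on a mutex (the PID lookup and the output inspection are generic placeholders, not taken from this report):

```
# Find the PID of the stuck process (example: zpool status)
pgrep -a zpool

# Dump its kernel stack; frames such as mutex_lock()/cv_wait() inside
# spa_*/zfs_* functions indicate it is blocked inside ZFS
sudo cat /proc/<pid>/stack

# Or dump the stacks of all blocked (D-state) tasks into the kernel log
echo w | sudo tee /proc/sysrq-trigger
dmesg | tail -n 200
```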
Yep: 'zpool clear' hangs, too.
As there is currently no way to remove a suspended pool, you will have to reboot. All zpool and zfs invocations (even those not related to the suspended pool) will be blocked by the defunct pool.
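For reference, how a pool reacts to fatal I/O errors is governed by its failmode property; a rough sketch (the pool name 'tank' is a placeholder):

```
# wait (the default) suspends the pool and blocks I/O until recovery,
# continue returns EIO to new writes, panic crashes the host
zpool get failmode tank

# If the devices come back, this asks ZFS to retry the suspended I/O
# and resume the pool; if they are still gone, it hangs as described above
zpool clear tank
```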
OK, I'll go ahead and reboot. What should we do about this issue? Call it a duplicate of #2878 and close it?
Your issue interests me because I experienced pretty much the same thing today. Host: Dell Studio 1550 laptop with 8 GB RAM, Ubuntu 16.04-64-LTS, running the latest kernel 4.4.0-45-generic and ZFS 0.6.5.6-0ubuntu14.

I just got a new Marvell 9128 chipset-based eSATA card for my older laptop (StarTech.com 2 Port SATA 6 Gbps ExpressCard eSATA Controller Card - ECESAT32) and am testing a ProBox external eSATA/USB3 case (Mediasonic ProBox HF2-SU3S2 4 Bay 3.5" SATA HDD Enclosure - USB 3.0 & eSATA Support) with 4x1TB WD RED NAS drives for use as a local ZFS DAS.

Long story short, I started a zpool scrub on the 4-drive zRAID10, and after ~35 GB (IIRC) all 4 drives dropped off with I/O errors. I rebooted and made sure the cabling was not being jostled, and now the scrub is still going with good I/O (~25-40 MB/sec per drive, with the occasional drop to ~18 MB/sec). The pool overall is averaging ~60 MB/sec.

I should note that this same external case + drive pool passed a scrub with no issues on another system running at a slower SATA speed (~1.5 Gb/s) and Ubuntu 14.04 with the latest kernel just a couple of days ago.

pool: zredtera1

When the drives dropped off the system, the kernel messages were as shown in the attached file. Probably the ZFS code needs a better way to recover from hardware failure like this, but understandably ZFS wasn't originally written to run on a laptop with a jackleg eSATA array. I hope somebody does fix this issue.
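In case it helps anyone testing a similar enclosure, per-device throughput and error counters during a scrub can be watched with something like the following (using the pool name from above):

```
# Per-vdev read/write bandwidth, refreshed every 5 seconds
zpool iostat -v zredtera1 5

# Scrub progress plus per-device read/write/checksum error counts
zpool status -v zredtera1
```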
I encountered an issue last night where 'zfs list' hangs indefinitely. I have traces from 'dmesg' that I've attached. The issue appears to be that a pool was suspended due to uncorrectable I/O errors while 'zfs list' was running.
Note that I can still read & write to my (other) pools normally. However, I can't use either the 'zfs' or 'zpool' utils any more. My suspicion is that the hung 'zfs' process is holding a mutex, but that's just a guess.
This is on 0.6.5.8, running on CentOS 7.
Some observations about the attached dmesg traces:
The system is still up and running, as is the hung 'zfs list' process, so if anyone wants more data, I can try to get it. (Also, we don't need to worry about the suspended pool: it was just for testing and was using an eSATA enclosure that I knew was flaky. I'm not particularly surprised it failed.)
dmesg.txt
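If more data would be useful, something along these lines should capture the relevant state (a hedged sketch; the output file names are arbitrary, and the zpool command below will itself hang if the utilities are still blocked):

```
# Kernel stacks of all blocked (D-state) tasks
echo w | sudo tee /proc/sysrq-trigger
dmesg > hung-task-stacks.txt

# Kernel stack of the hung 'zfs list' process itself
sudo cat /proc/$(pgrep -o -f 'zfs list')/stack > zfs-list-stack.txt

# Recent ZFS event history, if zpool still responds
zpool events -v > zpool-events.txt
```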