Deadlock between MakeRoomForWrite() and AwaitCloseAndDestructor() #108
Comments
"LOG" file contents:
|
If I remove an exit() call on the Erlang side, I see a deadlock involving only 1 thread: at the same place as above, the |
Does @matthewvon How do you reproduce the issue @slfritchie ? |
I'm fighting QuickCheck to get a somewhat-deterministic case. |
Don't waste your time making a deterministic case. I see what is happening … shutdown with pending writes and a full disk. Compaction knows to stop attempting compactions, but the write thread does not know to stop waiting on compactions. |
Add: if (!shutting_down_.Acquire_Load()) before each of the bg_cv_.Wait() calls in MakeRoomForWrite() … and all will be fine. |
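The guard suggested above can be sketched roughly as follows. This is a minimal illustration built on standard C++ primitives, not leveldb's own code: the class `WriteGate` and its members `WaitForRoom`, `SignalRoom`, and `BeginShutdown` are hypothetical names. Only the idea comes from the comment: check the shutdown flag before blocking on the background condition variable.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the writer/compaction handshake; not leveldb code.
class WriteGate {
 public:
  // Returns true once there is room to write, false if shutdown began first.
  bool WaitForRoom() {
    std::unique_lock<std::mutex> lock(mu_);
    while (!room_available_) {
      // The guard from the comment: do not block once shutdown has started,
      // because no compaction will run to signal the condition variable.
      if (shutting_down_.load(std::memory_order_acquire)) {
        return false;
      }
      cv_.wait(lock);
    }
    return true;
  }

  // Called by the background (compaction) side when it frees space.
  void SignalRoom() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      room_available_ = true;
    }
    cv_.notify_all();
  }

  // Called at close: set the flag under the mutex, then wake any waiter
  // so that it re-checks the flag instead of sleeping forever.
  void BeginShutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutting_down_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::atomic<bool> shutting_down_{false};
  bool room_available_ = false;
};
```

In this sketch the flag check alone is not enough for a writer that is already asleep, which is why `BeginShutdown` also notifies the condition variable. Whether the real close path does the same is not stated in this thread.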
Matthew, I suspect that there's a larger problem, and the 2-thread case above got caught in it? The larger problem appears to be that a
I've added a new command to the QuickCheck model that will write 10s/100s/1000s of keys sequentially (without other model operations interleaved). The new command is called 'put_filler'. An example test case looks like:
Without fault injection, this kind of test case has no problem: I can run for hours without a single failure. However, it's impossible to go for 1 minute without a problem caused by the
Stack traces from all threads are available at https://gist.github.com/slfritchie/fb238f5c48ff57788507
The "LOG" file looks like this after getting stuck:
|
Drat, I forgot to mention that the comment immediately above also has your
|
If I inject faults only into pread(2) at 7%, I can get the same hang inside |
Ditto for only Failing |
To recreate on OS X:
Then:
The string |
Did you wait a full 60 seconds to see if the errors unblocked? LevelDB holds the failed compaction thread for 60 seconds. That also holds any writes waiting on conditions. |
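A pause of this kind can be sketched as a timed wait on a condition variable. This is only an illustration under assumptions: `RetryPause`, `HoldAfterFailure`, and `BeginShutdown` are hypothetical names, and the thread does not show how leveldb actually implements its hold. The sketch assumes the pause should end early when shutdown begins.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical hold-off used after a failed background task; not leveldb code.
class RetryPause {
 public:
  // Blocks for up to `hold`. Returns true if shutdown cut the pause short,
  // false if the full interval elapsed and a retry may be attempted.
  bool HoldAfterFailure(std::chrono::milliseconds hold) {
    std::unique_lock<std::mutex> lock(mu_);
    // wait_for with a predicate returns the predicate's final value.
    return cv_.wait_for(lock, hold, [this] { return shutting_down_; });
  }

  // Marks shutdown and wakes a paused thread so it can exit promptly.
  void BeginShutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutting_down_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool shutting_down_ = false;
};
```

A caller would pass something like `std::chrono::seconds(60)` for the hold described above. Any writer that depends on this thread making progress stays blocked for the same interval, which matches the behaviour described in the comment.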
Wow, that's a cool feature. Yes, I'm pretty certain that I've gone to lunch and had the hang remain when I returned. I haven't yet fully learned to eat at Tokyo Speed(™), so I believe that I've exceeded the 60 second wow-that's-cool-latency-waiting-for-the-referee-to-ring-the-bell-for-the-next-round period. |
Yup, more than 60 seconds.
8:12:21 pm
8:21:26 pm |
leveldb branch mv-imm-retry |
+1, many thanks. |
eleveldb revision: 639f69c
leveldb revision: d553b6c4bf525c0301b3d2f128be0a63df930ee4
I see a deadlock that immobilizes the entire scheduler thread. The first suspicious stacktrace is:
And the second: