closing and reopening eleveldb may deadlock #71

Closed
uwiger opened this issue Sep 3, 2013 · 6 comments

@uwiger

uwiger commented Sep 3, 2013

When testing a patched mnesia using eleveldb as a backend, I noticed that some test cases could hang forever. I believe the problem is as follows:

  1. The test process creates a table (leveldb database instance) and does some reads and writes
  2. The database is closed and deleted (eleveldb:destroy/2 followed by rm -rf ... just to be sure)
  3. The same process reopens the database. In this particular case, the open() consistently hangs on an IO error.

The key is that the 'client' process reads from the database, i.e. it uses the Ref. If the Ref remains as garbage on that process's heap when mnesia is restarted (which triggers a lot of work, but not in the calling process), the Ref is not freed, because the destructor isn't called until the GC clears out the last reference. Until then the old instance presumably still holds the LevelDB lock, so the reopen fails.

Calling erlang:garbage_collect() in the test process before restarting mnesia fixes the problem in this particular case (with luck, adding debug printouts can achieve the same thing by triggering the GC). But it's not safe to assume that the Ref will ever be completely freed by GC, as some processes may perform work and then idle forever without performing the final GC.
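
To make the failure mode concrete, here is a minimal C++ sketch of the lifetime problem described above. It is a toy model, not eleveldb code: `DbHandle`, `g_locked_paths`, `can_reopen`, and `stale_reference_blocks_reopen` are hypothetical names, the extra `std::shared_ptr` copy stands in for a Ref left as garbage on an Erlang process heap, and the failed reopen is reported as an exception rather than a hang.

```cpp
#include <memory>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for LevelDB's per-directory lock table (hypothetical).
static std::set<std::string> g_locked_paths;

// Hypothetical handle: takes the lock when constructed and releases it
// only in the destructor, i.e. when the last reference goes away.
class DbHandle {
public:
    explicit DbHandle(std::string path) : path_(std::move(path)) {
        if (!g_locked_paths.insert(path_).second)
            throw std::runtime_error("lock on " + path_ + " is already held");
    }
    ~DbHandle() { g_locked_paths.erase(path_); }
    DbHandle(const DbHandle&) = delete;
    DbHandle& operator=(const DbHandle&) = delete;

private:
    std::string path_;
};

// Returns true if opening `path` succeeds right now.
bool can_reopen(const std::string& path) {
    try {
        DbHandle probe(path);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Returns true if a stale reference blocks the reopen and dropping it
// unblocks the reopen.
bool stale_reference_blocks_reopen() {
    auto db = std::make_shared<DbHandle>("/tmp/example_db");
    std::shared_ptr<DbHandle> stale = db;  // the Ref left behind as garbage
    db.reset();                            // the owner lets go of its handle
    bool blocked = !can_reopen("/tmp/example_db");  // lock still held via `stale`
    stale.reset();                         // analogous to erlang:garbage_collect()
    bool freed = can_reopen("/tmp/example_db");
    return blocked && freed;
}
```

In this model the reopen only succeeds once the last copy of the handle is dropped, which mirrors why forcing a GC in the test process makes the problem go away.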

One idea is to let a worker thread call AwaitCloseAndDestructor() [1] right after InitiateCloseRequest() has been called, then have it remove the LevelDB env from the magic binary. I assume this would release the LevelDB lock entry?

[1] https://github.com/basho/eleveldb/blob/master/c_src/refobjects.cc#L137
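
The idea above can be sketched in the same toy style: release the lock when close is requested instead of waiting for the destructor. This is only an illustration under stated assumptions. `ClosableDb`, `g_locks`, and `reopen_succeeds_after_explicit_close` are hypothetical names, and the sketch leaves out the part a real fix would need most, namely waiting for in-flight operations on other threads before tearing the instance down.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for LevelDB's per-directory lock table (hypothetical).
static std::set<std::string> g_locks;

// Hypothetical handle whose close() releases the lock eagerly. The
// destructor is only a fallback for handles that were never closed.
class ClosableDb {
public:
    explicit ClosableDb(std::string path) : path_(std::move(path)) {
        open_ = g_locks.insert(path_).second;  // false if already locked
    }
    ~ClosableDb() { close(); }
    ClosableDb(const ClosableDb&) = delete;
    ClosableDb& operator=(const ClosableDb&) = delete;

    bool is_open() const { return open_; }

    // Idempotent, so it is safe to call explicitly and again from the destructor.
    void close() {
        if (open_) {
            g_locks.erase(path_);
            open_ = false;
        }
    }

private:
    std::string path_;
    bool open_ = false;
};

// Returns true if the database can be reopened while a stale reference
// to the closed handle is still alive.
bool reopen_succeeds_after_explicit_close() {
    auto db = std::make_shared<ClosableDb>("/tmp/example_db");
    std::shared_ptr<ClosableDb> stale = db;  // lingering reference, never collected
    db->close();                             // lock released here, not at destruction
    ClosableDb reopened("/tmp/example_db");
    return reopened.is_open() && !stale->is_open();
}
```

With close decoupled from destruction, a reference that is never garbage collected keeps only an inert shell alive, not the lock.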

@matthewvon
Contributor

What is the full text of the IO error?

@uwiger
Author

uwiger commented Oct 27, 2013

I don't think I have it anymore, but as I recall, it was the usual error you get when you try to open an instance that is already in use.

@matthewvon
Contributor

I assumed that, but was just trying to make sure.

eleveldb_close() is one of the routines our Erlang experts identified as still being "synchronous" and therefore needing to be reworked. I will attempt to address your concerns in the rework.

evanmcc added this to the 2.1 milestone May 12, 2014
@matthewvon
Contributor

@uwiger The eleveldb mv-tuning7 branch, coupled with the leveldb mv-tuning7 branch, includes the long-awaited asynchronous close of the database and/or iterator. Give it a whirl if you have time and send me feedback.

@matthewvon
Contributor

This is believed fixed on the "develop" branch. Have you retried recently?

@uwiger
Author

uwiger commented Aug 15, 2014

From what I could tell when I was testing, it was ok.

uwiger closed this as completed Aug 15, 2014