-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deferred delete bug (aka Cd8) #170
Conversation
See the `Cd8` PULSE counterexample in #156
cc: @engelsanchez |
Not sure that it's useful, but I do have a branch somewhere where I started |
In any case, I'll get it pushed in the morning for reference, at least. |
@evanmcc Did you push the deferred delete work that you mentioned a couple weeks ago? |
My recollection is that I did, but it'll have to wait until I get home in |
@engelsanchez is those more what you were thinking about: https://github.com/basho/bitcask/tree/rm-deferer ? Passes the unit tests that remain (I didn't know if the unit tests from the old module were valid under the new world, so just deleted them to keep from getting nonsense, save them if you think they have value). I have it pulsing on r2s10, it doesn't trivially fail. |
updated my branch, but there's a few issues. the main issue I've been running into is ordering between a couple of different events:
the current code, at hash 48a7ef0, tries to get a stable list by checking if the epoch has changed while the list of files was being opened, but this is trivially DOSed by heavy write load. the orders I have tried:
I am fairly sure at this point that the freeze needs to come first, since merges don't freeze the keydir, so you can't guarantee that epochs actually mean anything in the case that we care about. But even with the stabilization code in 3, there is still seemingly room for a fold-breaking race. What needs to happen is making the freeze and the file list semi-atomic? I am going to try a lock real quick, but otherwise I think that I am about out of time. |
locked version is pulsing, see: https://github.com/basho/bitcask/tree/rm-referer2 because apparently I can't spell. |
@evanmcc the new bitcask_lockops:acquire/3 in rm-referer2 is missing a terminating clause. |
I'm biting the bullet and moving this item to 2.0RC. Also, I'm curious about the counter examples you have found. If they still point the finger at a race involving the set of files in fold, I'm going to try and implement the fold file list/epoch atomicity by adding operations to manage the list of files the keydir points to (either in the current fstats map or a different place). Then when attempting a fold, you can fetch this list atomically with an associated epoch. Then we would try the regular open files operation. If any of those files fails to open, we try again getting an updated file list and epoch, taking care not to re-open files again and closing any opened file that is no longer in the keydir. The same lock would be used when the merger attempts to delete a file. It would request to mark a file as deleted, which would be recorded in the keydir. Then delete it, then remove the file from the delete pending list in the keydir. Basically mimicking what we have now, but in the keydir structure instead. |
So, deferred deletes are needed in one case, which is probably what the counters are hitting: If the keydir is truly frozen (there is a secondary pending hashmap) during a merge, the merge deleting a file it has merged will potentially removed a file referenced by the most recent version of a value in the entries hash map. Currently, folds work on an epoch based view of the entries. But when the keydir is frozen, we don't allow it to see newer entries in the pending hash. We cap the epoch it could possibly use to the last in the entries hash map. So it may potentially require a file deleted by such a merge. In that case, the delete needs to be postponed until the pending hash map is merged back and we have a single hash map again. Or we could change the semantics of folds. |
Options appear to be:
Thoughts? cc: @jonmeredith |
I got a bit confused with this issue, so it's worth recapping. There are two issues here I believe. The unit test in the first commit of this PR is about the first:
Honestly, I think the first point is rare and not worth fixing right now. I wouldn't oppose your "remember last file id on disk" thingie too much, even if ugly, to fix that. But the fold missing values occasionally is terrible in my opinion, and that's what I would like to fix for RC. Perhaps I should open a separate issue for that. I'm trying to code a solution that should guarantee safety at the cost of occasional increase of high percentile latencies. Not 100% sure it will work. I will update the ticket during the day with progress on that. We should chat when you're online tonight/your morning |
I've force-pushed an update. The old file system-based scheme is on a branch called bug/deferred-delete-Cd8-old. This branch passes EUnit tests (including the new one for this branch) on OS X and Linux. It also doesn't anger Valgrind (on Linux). |
} | ||
the_biggest = (global_biggest > keydir->biggest_file_id) ? \ | ||
global_biggest : keydir->biggest_file_id; | ||
the_biggest += 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the one thing I would like to remove from this patch. I would rather just store the biggest file id as is, without adding some magic number to it. This kind of fudge just helps hide issues for longer than they should and adds strange behavior that would be hard to account for in test validations, etc.
I have finished the code review for this. We can discuss the little issue with the biggest file id fudge, but I feel very strongly about it after recently removing some similar fudgy things from bitcask. For example, the old arbitrary 1 second exclusion window for merge file inclusion and a lenient version check policy in multi-entries, which led to big cleanup efforts once the issues they were hiding were made obvious. Other than that, looks like we're almost ready! Let's see what PULSE has to say. |
Holy shit PULSE. You guys rock. |
I'd merged this branch and #175 together and ran it for a bit less than 8 CPU days. No problems, yay. I'll remove the |
I would argue against incrementing the stored biggest id at all, but I won't argue too much as it's a style/maintenance argument and not about correctness. So feel free to merge if you are satisfied with the extra pulse testing. 👍 ac71627 |
👍ac71627 |
👍 ac71627 |
Deferred delete bug (aka Cd8) Reviewed-by: engelsanchez
@borshop merge |
This is work leftover from #156.
Let the kibbitz commence.