Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deferred delete bug (aka Cd8) #170

Merged
merged 4 commits into from
Jul 8, 2014
Merged

Deferred delete bug (aka Cd8) #170

merged 4 commits into from
Jul 8, 2014

Conversation

slfritchie
Copy link
Contributor

This is work leftover from #156.

  • Commit 948fc84 has a non-concurrent EUnit test case for the problem
  • Commit 944cef8 has a fix that Engel despises. :-)

Let the kibbitz commence.

See the `Cd8` PULSE counterexample in #156
@slfritchie slfritchie added Bug and removed Bug labels Jun 4, 2014
@slfritchie slfritchie added this to the 2.0.1 milestone Jun 4, 2014
@slfritchie
Copy link
Contributor Author

cc: @engelsanchez

@evanmcc
Copy link
Contributor

evanmcc commented Jun 4, 2014

Not sure that it's useful, but I do have a branch somewhere where I started
to remove all the deferred delete stuff that I can push. The work is done
and it's half debugged, it'd just need a rebase and the other half of the
debugging.

@evanmcc
Copy link
Contributor

evanmcc commented Jun 4, 2014

In any case, I'll get it pushed in the morning for reference, at least.

@slfritchie
Copy link
Contributor Author

@evanmcc Did you push the deferred delete work that you mentioned a couple weeks ago?

@evanmcc
Copy link
Contributor

evanmcc commented Jun 20, 2014

My recollection is that I did, but it'll have to wait until I get home in
an hour or so to get you the name. Want to say it was something-opt1 and
-opt2. Whichever one we didn't merge earlier.

@evanmcc
Copy link
Contributor

evanmcc commented Jun 23, 2014

@engelsanchez is those more what you were thinking about: https://github.com/basho/bitcask/tree/rm-deferer ?

Passes the unit tests that remain (I didn't know if the unit tests from the old module were valid under the new world, so just deleted them to keep from getting nonsense, save them if you think they have value). I have it pulsing on r2s10, it doesn't trivially fail.

@evanmcc
Copy link
Contributor

evanmcc commented Jun 25, 2014

updated my branch, but there's a few issues.

the main issue I've been running into is ordering between a couple of different events:

  • taking the file list
  • deciding on the fold epoch
  • actually freezing the keydir (so that multiply updated keys are actually recorded at different epochs)

the current code, at hash 48a7ef0, tries to get a stable list by checking if the epoch has changed while the list of files was being opened, but this is trivially DOSed by heavy write load.

the orders I have tried:

  1. (develop): freeze, epoch, filelist
  2. epoch, filelist, freeze
  3. filelist, epoch, freeze
  4. epoch, filelist, epoch, freeze
  5. freeze, epoch, filelist, epoch

I am fairly sure at this point that the freeze needs to come first, since merges don't freeze the keydir, so you can't guarantee that epochs actually mean anything in the case that we care about.

But even with the stabilization code in 3, there is still seemingly room for a fold-breaking race. What needs to happen is making the freeze and the file list semi-atomic? I am going to try a lock real quick, but otherwise I think that I am about out of time.

@evanmcc
Copy link
Contributor

evanmcc commented Jun 25, 2014

locked version is pulsing, see: https://github.com/basho/bitcask/tree/rm-referer2 because apparently I can't spell.

@engelsanchez
Copy link
Contributor

@evanmcc the new bitcask_lockops:acquire/3 in rm-referer2 is missing a terminating clause.

@engelsanchez engelsanchez modified the milestones: 2.0-RC, 2.0.1 Jun 26, 2014
@engelsanchez
Copy link
Contributor

I'm biting the bullet and moving this item to 2.0RC. Also, I'm curious about the counter examples you have found. If they still point the finger at a race involving the set of files in fold, I'm going to try and implement the fold file list/epoch atomicity by adding operations to manage the list of files the keydir points to (either in the current fstats map or a different place). Then when attempting a fold, you can fetch this list atomically with an associated epoch. Then we would try the regular open files operation. If any of those files fails to open, we try again getting an updated file list and epoch, taking care not to re-open files again and closing any opened file that is no longer in the keydir. The same lock would be used when the merger attempts to delete a file. It would request to mark a file as deleted, which would be recorded in the keydir. Then delete it, then remove the file from the delete pending list in the keydir. Basically mimicking what we have now, but in the keydir structure instead.

@engelsanchez engelsanchez self-assigned this Jun 26, 2014
@engelsanchez
Copy link
Contributor

So, deferred deletes are needed in one case, which is probably what the counters are hitting: If the keydir is truly frozen (there is a secondary pending hashmap) during a merge, the merge deleting a file it has merged will potentially removed a file referenced by the most recent version of a value in the entries hash map. Currently, folds work on an epoch based view of the entries. But when the keydir is frozen, we don't allow it to see newer entries in the pending hash. We cap the epoch it could possibly use to the last in the entries hash map. So it may potentially require a file deleted by such a merge. In that case, the delete needs to be postponed until the pending hash map is merged back and we have a single hash map again. Or we could change the semantics of folds.

@slfritchie
Copy link
Contributor Author

Options appear to be:

  1. Change the semantics of fold (quite unlikely for Riak 2.0) and make a major change of Bitcask version number.
  2. Fix the Cd8 deferred delete bug using this PR's method
  3. Fix the Cd8 deferred delete bug using another method

Thoughts? cc: @jonmeredith

@engelsanchez
Copy link
Contributor

I got a bit confused with this issue, so it's worth recapping. There are two issues here I believe. The unit test in the first commit of this PR is about the first:

  • Because deletes are deferred, you may get unlucky and close/re-open a bitcask and start writing a new file with a re-used id that will be deleted by a confused delete worker from a previous merge.
  • Folds use an epoch for snapshotting which may or may not be related to the list of files it opens when the fold starts because it's simply listing the files in the directory and trying to open them all. So it may want a snapshot that requires a file deleted during that time. It is necessary to ensure that the fold has already opened all files needed by that epoch so deletes won't affect it.

Honestly, I think the first point is rare and not worth fixing right now. I wouldn't oppose your "remember last file id on disk" thingie too much, even if ugly, to fix that. But the fold missing values occasionally is terrible in my opinion, and that's what I would like to fix for RC. Perhaps I should open a separate issue for that. I'm trying to code a solution that should guarantee safety at the cost of occasional increase of high percentile latencies. Not 100% sure it will work. I will update the ticket during the day with progress on that. We should chat when you're online tonight/your morning

@slfritchie
Copy link
Contributor Author

I've force-pushed an update. The old file system-based scheme is on a branch called bug/deferred-delete-Cd8-old.

This branch passes EUnit tests (including the new one for this branch) on OS X and Linux. It also doesn't anger Valgrind (on Linux).

}
the_biggest = (global_biggest > keydir->biggest_file_id) ? \
global_biggest : keydir->biggest_file_id;
the_biggest += 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one thing I would like to remove from this patch. I would rather just store the biggest file id as is, without adding some magic number to it. This kind of fudge just helps hide issues for longer than they should and adds strange behavior that would be hard to account for in test validations, etc.

@engelsanchez
Copy link
Contributor

I have finished the code review for this. We can discuss the little issue with the biggest file id fudge, but I feel very strongly about it after recently removing some similar fudgy things from bitcask. For example, the old arbitrary 1 second exclusion window for merge file inclusion and a lenient version check policy in multi-entries, which led to big cleanup efforts once the issues they were hiding were made obvious.

Other than that, looks like we're almost ready! Let's see what PULSE has to say.

@reiddraper
Copy link
Contributor

Holy shit PULSE. You guys rock.

@slfritchie
Copy link
Contributor Author

I'd merged this branch and #175 together and ran it for a bit less than 8 CPU days. No problems, yay.

I'll remove the the_biggest += 100, re-merge, and re-test.

@engelsanchez
Copy link
Contributor

I would argue against incrementing the stored biggest id at all, but I won't argue too much as it's a style/maintenance argument and not about correctness. So feel free to merge if you are satisfied with the extra pulse testing.

👍 ac71627

@slfritchie
Copy link
Contributor Author

👍ac71627

@engelsanchez
Copy link
Contributor

👍 ac71627
Borshop! Listen to reason! :)

borshop added a commit that referenced this pull request Jul 8, 2014
Deferred delete bug (aka Cd8)

Reviewed-by: engelsanchez
@slfritchie
Copy link
Contributor Author

@borshop merge

@borshop borshop merged commit ac71627 into develop Jul 8, 2014
@seancribbs seancribbs deleted the bug/deferred-delete-Cd8 branch April 1, 2015 22:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants