Deferred delete bug (aka Cd8) #170

slfritchie · 2014-06-04T04:00:55Z

This is work leftover from #156.

Commit 948fc84 has a non-concurrent EUnit test case for the problem
Commit 944cef8 has a fix that Engel despises. :-)

Let the kibbitz commence.

See the `Cd8` PULSE counterexample in #156

slfritchie · 2014-06-04T04:01:50Z

evanmcc · 2014-06-04T04:49:40Z

Not sure that it's useful, but I do have a branch somewhere where I started
to remove all the deferred delete stuff that I can push. The work is done
and it's half debugged, it'd just need a rebase and the other half of the
debugging.

evanmcc · 2014-06-04T04:50:18Z

In any case, I'll get it pushed in the morning for reference, at least.

slfritchie · 2014-06-20T02:16:06Z

@evanmcc Did you push the deferred delete work that you mentioned a couple weeks ago?

evanmcc · 2014-06-20T02:19:13Z

My recollection is that I did, but it'll have to wait until I get home in
an hour or so to get you the name. Want to say it was something-opt1 and
-opt2. Whichever one we didn't merge earlier.

evanmcc · 2014-06-23T19:25:28Z

@engelsanchez is those more what you were thinking about: https://github.com/basho/bitcask/tree/rm-deferer ?

Passes the unit tests that remain (I didn't know if the unit tests from the old module were valid under the new world, so just deleted them to keep from getting nonsense, save them if you think they have value). I have it pulsing on r2s10, it doesn't trivially fail.

evanmcc · 2014-06-25T18:49:34Z

updated my branch, but there's a few issues.

the main issue I've been running into is ordering between a couple of different events:

taking the file list
deciding on the fold epoch
actually freezing the keydir (so that multiply updated keys are actually recorded at different epochs)

the current code, at hash 48a7ef0, tries to get a stable list by checking if the epoch has changed while the list of files was being opened, but this is trivially DOSed by heavy write load.

the orders I have tried:

(develop): freeze, epoch, filelist
epoch, filelist, freeze
filelist, epoch, freeze
epoch, filelist, epoch, freeze
freeze, epoch, filelist, epoch

I am fairly sure at this point that the freeze needs to come first, since merges don't freeze the keydir, so you can't guarantee that epochs actually mean anything in the case that we care about.

But even with the stabilization code in 3, there is still seemingly room for a fold-breaking race. What needs to happen is making the freeze and the file list semi-atomic? I am going to try a lock real quick, but otherwise I think that I am about out of time.

evanmcc · 2014-06-25T19:59:35Z

locked version is pulsing, see: https://github.com/basho/bitcask/tree/rm-referer2 because apparently I can't spell.

engelsanchez · 2014-06-26T15:20:56Z

@evanmcc the new bitcask_lockops:acquire/3 in rm-referer2 is missing a terminating clause.

engelsanchez · 2014-06-26T19:16:42Z

I'm biting the bullet and moving this item to 2.0RC. Also, I'm curious about the counter examples you have found. If they still point the finger at a race involving the set of files in fold, I'm going to try and implement the fold file list/epoch atomicity by adding operations to manage the list of files the keydir points to (either in the current fstats map or a different place). Then when attempting a fold, you can fetch this list atomically with an associated epoch. Then we would try the regular open files operation. If any of those files fails to open, we try again getting an updated file list and epoch, taking care not to re-open files again and closing any opened file that is no longer in the keydir. The same lock would be used when the merger attempts to delete a file. It would request to mark a file as deleted, which would be recorded in the keydir. Then delete it, then remove the file from the delete pending list in the keydir. Basically mimicking what we have now, but in the keydir structure instead.

engelsanchez · 2014-06-27T14:54:58Z

So, deferred deletes are needed in one case, which is probably what the counters are hitting: If the keydir is truly frozen (there is a secondary pending hashmap) during a merge, the merge deleting a file it has merged will potentially removed a file referenced by the most recent version of a value in the entries hash map. Currently, folds work on an epoch based view of the entries. But when the keydir is frozen, we don't allow it to see newer entries in the pending hash. We cap the epoch it could possibly use to the last in the entries hash map. So it may potentially require a file deleted by such a merge. In that case, the delete needs to be postponed until the pending hash map is merged back and we have a single hash map again. Or we could change the semantics of folds.

slfritchie · 2014-06-30T07:01:10Z

Options appear to be:

Change the semantics of fold (quite unlikely for Riak 2.0) and make a major change of Bitcask version number.
Fix the Cd8 deferred delete bug using this PR's method
Fix the Cd8 deferred delete bug using another method

Thoughts? cc: @jonmeredith

engelsanchez · 2014-06-30T14:20:31Z

I got a bit confused with this issue, so it's worth recapping. There are two issues here I believe. The unit test in the first commit of this PR is about the first:

Because deletes are deferred, you may get unlucky and close/re-open a bitcask and start writing a new file with a re-used id that will be deleted by a confused delete worker from a previous merge.
Folds use an epoch for snapshotting which may or may not be related to the list of files it opens when the fold starts because it's simply listing the files in the directory and trying to open them all. So it may want a snapshot that requires a file deleted during that time. It is necessary to ensure that the fold has already opened all files needed by that epoch so deletes won't affect it.

Honestly, I think the first point is rare and not worth fixing right now. I wouldn't oppose your "remember last file id on disk" thingie too much, even if ugly, to fix that. But the fold missing values occasionally is terrible in my opinion, and that's what I would like to fix for RC. Perhaps I should open a separate issue for that. I'm trying to code a solution that should guarantee safety at the cost of occasional increase of high percentile latencies. Not 100% sure it will work. I will update the ticket during the day with progress on that. We should chat when you're online tonight/your morning

slfritchie · 2014-07-01T13:18:13Z

I've force-pushed an update. The old file system-based scheme is on a branch called bug/deferred-delete-Cd8-old.

This branch passes EUnit tests (including the new one for this branch) on OS X and Linux. It also doesn't anger Valgrind (on Linux).

engelsanchez · 2014-07-03T14:40:50Z

c_src/bitcask_nifs.c

+        }
+        the_biggest = (global_biggest > keydir->biggest_file_id) ? \
+            global_biggest : keydir->biggest_file_id;
+        the_biggest += 100;


This is the one thing I would like to remove from this patch. I would rather just store the biggest file id as is, without adding some magic number to it. This kind of fudge just helps hide issues for longer than they should and adds strange behavior that would be hard to account for in test validations, etc.

engelsanchez · 2014-07-03T15:03:16Z

I have finished the code review for this. We can discuss the little issue with the biggest file id fudge, but I feel very strongly about it after recently removing some similar fudgy things from bitcask. For example, the old arbitrary 1 second exclusion window for merge file inclusion and a lenient version check policy in multi-entries, which led to big cleanup efforts once the issues they were hiding were made obvious.

Other than that, looks like we're almost ready! Let's see what PULSE has to say.

reiddraper · 2014-07-03T17:44:41Z

Holy shit PULSE. You guys rock.

slfritchie · 2014-07-04T01:49:27Z

I'd merged this branch and #175 together and ran it for a bit less than 8 CPU days. No problems, yay.

I'll remove the the_biggest += 100, re-merge, and re-test.

engelsanchez · 2014-07-07T17:38:01Z

I would argue against incrementing the stored biggest id at all, but I won't argue too much as it's a style/maintenance argument and not about correctness. So feel free to merge if you are satisfied with the extra pulse testing.

👍 ac71627

slfritchie · 2014-07-08T01:01:45Z

👍ac71627

engelsanchez · 2014-07-08T01:07:48Z

👍 ac71627
Borshop! Listen to reason! :)

Deferred delete bug (aka Cd8) Reviewed-by: engelsanchez

slfritchie · 2014-07-08T01:30:58Z

@borshop merge

EUnit regression test case for Cd8 of PR 156

948fc84

See the `Cd8` PULSE counterexample in #156

slfritchie added Bug and removed Bug labels Jun 4, 2014

slfritchie added this to the 2.0.1 milestone Jun 4, 2014

engelsanchez modified the milestones: 2.0-RC, 2.0.1 Jun 26, 2014

engelsanchez self-assigned this Jun 26, 2014

slfritchie added 2 commits July 1, 2014 21:57

Bugfix: calculate end of table while still holding lock

35b7967

Preserve biggest_file_id across close & re-open

0c37606

This was referenced Jul 1, 2014

Folds miss values due to race with deletes #174

Closed

Bugfix/fold open delete race #175

Merged

engelsanchez reviewed Jul 3, 2014
View reviewed changes

Change 'the_biggest += 100' to 'the_biggest++'

ac71627

borshop added a commit that referenced this pull request Jul 8, 2014

Merge pull request #170 from basho/bug/deferred-delete-Cd8

2ab7e95

Deferred delete bug (aka Cd8) Reviewed-by: engelsanchez

borshop merged commit ac71627 into develop Jul 8, 2014

seancribbs deleted the bug/deferred-delete-Cd8 branch April 1, 2015 22:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deferred delete bug (aka Cd8) #170

Deferred delete bug (aka Cd8) #170

slfritchie commented Jun 4, 2014

slfritchie commented Jun 4, 2014

evanmcc commented Jun 4, 2014

evanmcc commented Jun 4, 2014

slfritchie commented Jun 20, 2014

evanmcc commented Jun 20, 2014

evanmcc commented Jun 23, 2014

evanmcc commented Jun 25, 2014

evanmcc commented Jun 25, 2014

engelsanchez commented Jun 26, 2014

engelsanchez commented Jun 26, 2014

engelsanchez commented Jun 27, 2014

slfritchie commented Jun 30, 2014

engelsanchez commented Jun 30, 2014

slfritchie commented Jul 1, 2014

engelsanchez Jul 3, 2014

engelsanchez commented Jul 3, 2014

reiddraper commented Jul 3, 2014

slfritchie commented Jul 4, 2014

engelsanchez commented Jul 7, 2014

slfritchie commented Jul 8, 2014

engelsanchez commented Jul 8, 2014

slfritchie commented Jul 8, 2014

Deferred delete bug (aka Cd8) #170

Deferred delete bug (aka Cd8) #170

Conversation

slfritchie commented Jun 4, 2014

slfritchie commented Jun 4, 2014

evanmcc commented Jun 4, 2014

evanmcc commented Jun 4, 2014

slfritchie commented Jun 20, 2014

evanmcc commented Jun 20, 2014

evanmcc commented Jun 23, 2014

evanmcc commented Jun 25, 2014

evanmcc commented Jun 25, 2014

engelsanchez commented Jun 26, 2014

engelsanchez commented Jun 26, 2014

engelsanchez commented Jun 27, 2014

slfritchie commented Jun 30, 2014

engelsanchez commented Jun 30, 2014

slfritchie commented Jul 1, 2014

engelsanchez Jul 3, 2014

Choose a reason for hiding this comment

engelsanchez commented Jul 3, 2014

reiddraper commented Jul 3, 2014

slfritchie commented Jul 4, 2014

engelsanchez commented Jul 7, 2014

slfritchie commented Jul 8, 2014

engelsanchez commented Jul 8, 2014

slfritchie commented Jul 8, 2014