bz888 writer restart bug #3
Closed

Conversation
The specific case uncovered by bctt was the keydir initially failing with not_ready; between that failure and the retry, the keydir finished loading, was used, and was closed. The right thing to do in this situation is to attempt the keydir_new from scratch.
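The retry-and-rebuild control flow can be sketched in Python (a hedged model only; `acquire_keydir`, `NotReady`, and `AlreadyClosed` are hypothetical stand-ins for the Erlang keydir calls, not the real API):

```python
import itertools
import time

class NotReady(Exception): pass        # keydir still loading
class AlreadyClosed(Exception): pass   # keydir loaded, used, and closed

def acquire_keydir(try_open, create_new, retries=3, delay=0.1):
    """Retry on not_ready, but if the keydir was loaded *and closed*
    between attempts, rebuild it from scratch (keydir_new)."""
    for attempt in itertools.count():
        try:
            return try_open()          # may raise NotReady / AlreadyClosed
        except NotReady:
            if attempt + 1 >= retries:
                raise
            time.sleep(delay)          # give the loader time to finish
        except AlreadyClosed:
            # Loaded-then-closed while we were waiting: the old keydir
            # is gone for good, so start a fresh one from scratch.
            return create_new()
```

The key point the fix makes is that `AlreadyClosed` must not be treated as a retryable condition: the old keydir will never come back, so only a fresh `keydir_new` can succeed.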
Closing -- this was to the wrong branch.
slfritchie added a commit that referenced this pull request on Dec 17, 2013
Originally found with bitcask_pulse; I deconstructed the test case to help understand what was happening: the new EUnit test is new_20131217_a_test_.

As a result of the puts, key #13 is written 3x to fileid #1 (normal, tombstone, normal) and 1x to fileid #2 (normal, at the very beginning of the file). The merge creates fileid #3 and copies only the tombstone (the normal entry isn't copied because it is out-of-date). Before the close, the internal keydir contains the correct info about key #13, but after the close and re-open we see key #13's entries as: normal (and most recent) in fileid #2, and tombstone in fileid #3 -- oops.

The fix is to remove all of the merge input fileids from the set of fileids that will survive/exist after the merge is finished.
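The fix reads naturally as set arithmetic. A minimal Python sketch of the idea (the actual code lives in the C keydir/merge path; `surviving_fileids` is a hypothetical name):

```python
def surviving_fileids(all_fileids, merge_inputs, merge_outputs):
    """Every fileid that fed the merge is dropped from the surviving
    set, so a stale entry copied into the merge output (e.g. the
    tombstone above) can never shadow fresher data that lived in one
    of the merge inputs."""
    return (set(all_fileids) - set(merge_inputs)) | set(merge_outputs)
```

In the bug's terms: fileid #1 was a merge input and fileid #3 the output, so after the merge only fileids #2 and #3 may exist; #1 must not linger in the surviving set.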
slfritchie added a commit that referenced this pull request on Mar 1, 2014
* Change order of scan & fold: now oldest -> youngest.
* New regression test (currently broken) for the recent scan-order change of oldest -> youngest.
* Fix rough edges on test/bitcask_qc.erl & test/bitcask_qc_expiry.erl.
* Write tombstone on delete only if there's a keydir entry, plus race bugfix.
* Fix remaining bug when the bitcask scan/fold order was changed to oldest -> youngest. The NIF change fixes a long-standing latent bug: when put'ing a key that does not exist, if there's a race with a merge, keydir_put_int() would return 'ok' (error) rather than 'already_exists' (correct). The 'already_exists' return value is a signal to the read-write owner of the bitcask that the current append file must be closed and a new one opened (with a larger fileid than any merge). The tombstone change adds a new tombstone data format; old tombstones will be handled correctly. New tombstones for any key K contain the fileid & offset of the key that it is deleting. If that fileid F still exists, then the tombstone will always be merged forward. If the fileid F does not exist, then merging forward is not necessary: once F was merged, the on-disk representation of key K need not be merged forward, because either K does not exist in the keydir (it was deleted by this tombstone) or it was replaced by a newer put.
* FML: fix tombstone merging bug when K & K's tombstone are in the same fileid. Originally found with bitcask_pulse; I deconstructed the test case to help understand what was happening: the new EUnit test is new_20131217_a_test_. As a result of the puts, key #13 is written 3x to fileid #1 (normal, tombstone, normal) and 1x to fileid #2 (normal, at the very beginning of the file). The merge creates fileid #3 and copies only the tombstone (the normal entry isn't copied because it is out-of-date). Before the close, the internal keydir contains the correct info about key #13, but after the close and re-open we see key #13's entries as: normal (and most recent) in fileid #2, and tombstone in fileid #3 -- oops.
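The merge-forward rule for the new tombstone format reduces to a one-line predicate. A Python sketch (names are illustrative, not the real NIF API):

```python
def keep_tombstone(tombstone_fileid, live_fileids):
    """A new-format tombstone records the fileid of the entry it
    deletes.  While that fileid still exists, the tombstone must be
    carried forward by merges; once the fileid is gone, the deleted
    entry can no longer resurface, so the tombstone may be dropped."""
    return tombstone_fileid in live_fileids
```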
* The fix is to remove all of the merge input fileids from the set of fileids that will survive/exist after the merge is finished.
* WIP: the new test new_20131217_c_test_() is broken; need to experiment with bitcask:delete & NIF usage before continuing.
* Fix non-determinism in the fold_visits_unfrozen_test test ... now 100% broken.
* Fix iterator freeze bug demonstrated by the last commit: remove the kh_put_will_resize() predicate test for creating a keydir->pending.
* Fix bug introduced by the almost-bugfix in commit 1a9c99 that was supposed to be a partial fix for GH #82.
* Fix unexported function errors, thanks Buildbot.
* Omnibus bugfixing, including:
  * Add an 'already_exists' return to bitcask_nifs_keydir_remove(): we need it to signal races with merge, alas.
  * Add state to #filestate to be able to 'undo' the last update to both a data file and its hint file. This probably means that we're going to have to play some games with merge file naming; TBD, stay tuned.
  * For bitcask:delete(), make the keydir delete conditional: if it fails, redo the entire thing again.
  * inner_merge_write() can have a race such that, if a partial merge happens at the proper time afterward, we see an old value reappearing. Fix by checking the return value of the keydir put, and if 'already_exists', undo the write.
  * When do_put() has a race and gets 'already_exists' from the keydir, undo the write before retrying. If this key is deleted sometime later, and then a partial merge happens after that, we might see this value reappear after the merge is done.
  * Add file_truncate() to bitcask_file.erl. TODO: do the same for the NIF-style I/O.
  * Better robustness (I hope) for the EUnit tests in bitcask_merge_delete.erl.
* Add UNIX epoch time to the put_int NIF call, to avoid C use of time(3). I hope this will eliminate a nasty source of nondeterminism during PULSE testing.
* Fix buildbot ?TESTDIR definition location problem.
* Uff da: most PULSE non-determinism problems fixed?
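The "undo the write on already_exists" pattern in the omnibus list can be modeled in a few lines (hedged Python sketch; all three callbacks are hypothetical stand-ins for the Erlang file and keydir operations):

```python
def safe_put(write_entry, undo_write, keydir_put_conditional):
    """Append the entry to the data file, then do a conditional keydir
    put.  If the keydir reports 'already_exists' (we raced a merge or
    another writer), undo the just-written entry before retrying, so a
    later partial merge cannot resurrect it as a stale value."""
    loc = write_entry()                       # returns {fileid, offset}
    if keydir_put_conditional(loc) == 'already_exists':
        undo_write(loc)                       # truncate data + hint files
        return 'retry'
    return 'ok'
```

The undo step is the heart of the fix: without it, the orphaned on-disk entry survives and a subsequent partial merge can copy it forward after the key has been deleted.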
* Fix merge race when a keydir_put specifying an old fileid & offset races with a delete. Scenario, with 3 writers, where writers 2 & 3 are racing:
  * Writer 1: put K, write @ {file 1, offset 63}.
  * Writer 2: delete operation starts ... but there is no write to disk yet.
  * Writer 3: merge scans file 1, sees K @ {1,30} -> not out of date -> but there is no write to disk yet.
  * Writer 2: writes a tombstone @ {3,48}.
  * Writer 2: keydir conditional delete @ old location {1,63} is ok.
  * Writer 2: keydir delete returns from NIF-land.
  * Writer 3: merge copies data from {1,63} -> {4,42}.
  * Writer 3: keydir put of {4,42} conditional on {1,63} succeeds due to incorrect conditional validation: the record is gone, but the bug permits the put to return 'ok'.
* Fix bug in the previous commit: offset=0 is valid, do not check it.
* Use a larger retry # for open_fold_files() and keydir_wait_ready() when PULSE testing.
* Small bitcask_pulse changes: file size, prettier keys, slightly less verbosity.
* Adjust command frequency weightings for the PULSE model.
* Add no_tombstones_after_reopen_test() to test the desired condition.
* Avoid storing tombstones in the keydir when possible. When a Bitcask is opened and scan_key_files() is reading data from disk and loading the RAM keydir, we now detect if the key is a tombstone and, if so, do not store it in the keydir. Normally, only hint files are scanned during startup; however, hint files have not stored enough information to confirm that a key is/is not a tombstone. I have added such a flag in a backward-compatible way: the offset size has been reduced from 64 to 63 bits, and the uppermost bit (which is assumed to be 0 in all cases -- we assume nobody has actually written a file big enough to require 64 bits to describe the offset) is used to signal tombstone status. An optional argument was added to the increment_file_id() NIF to communicate to the NIF that a data file exists ... a fact that would otherwise be lost if a hint/data file contains only tombstones.
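The 63-bit offset encoding is easy to make concrete. A Python sketch of the bit layout just described (`TOMBSTONE_BIT` and the function names are illustrative, not the actual hint-file code):

```python
TOMBSTONE_BIT = 1 << 63        # uppermost bit of the 64-bit offset field

def pack_offset(offset, is_tombstone):
    """Offsets shrink to 63 bits; the freed top bit flags tombstones.
    The field stays 64 bits wide, keeping the hint format size-compatible."""
    assert 0 <= offset < (1 << 63), "offset must fit in 63 bits"
    return offset | (TOMBSTONE_BIT if is_tombstone else 0)

def unpack_offset(packed):
    """Split a packed field back into (offset, is_tombstone)."""
    return packed & ~TOMBSTONE_BIT, bool(packed & TOMBSTONE_BIT)
```

Old hint files always have the top bit clear, so they decode as non-tombstones, which is why the change is backward-compatible.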
* For testing purposes, fold() and fold_keys() are extended with another argument to expose the presence of keydir tombstones.
* Adjust timeouts and concurrency limits in bitcask_pulse.erl to avoid the worst of the false-positive errors when using the PULSE model: {badrpc,timeout} nonsense.
* Add checks to the PULSE model to try to catch improper tombstones in the keydir.
* Dialyzer fixes.
* Add failing test: tombstones exist in the keydir if hint files are destroyed.
* Add a {tombstone,Key} wrapper when folding .data files; fixes the test failure in the last commit.
* Fix intermittent bitcask_merge_worker timeout failure.
* NIF review: avoid looking at argv[1] directly.
* Remove unwanted oopsie foo.erl.
slfritchie added a commit that referenced this pull request on Mar 7, 2014
* Restore the put-will-resize check for multi-fold: restore the check that postpones freezing the keydir until a put would require a structural change. Without this, the new multifold code is crippled.
* Redo the merge/delete race fix: consolidate the retry-on-merge-race logic in do_put(), making delete simply a wrapper around a put(tombstone) operation. Some cleanup regarding the different tombstone versions.
* Add the missing file truncate op for NIF mode, plus a little unit test to make sure the file truncation operation works.
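The delete-as-tombstone-put consolidation can be sketched as follows (Python model of the call structure only; the real do_put also performs the on-disk append and the merge-race retry/undo logic):

```python
TOMBSTONE = object()   # sentinel standing in for the on-disk tombstone record

class Bitcask:
    """Sketch: delete becomes a thin wrapper around put of a tombstone,
    so every merge-race retry path lives in one place (do_put)."""
    def __init__(self):
        self.keydir = {}
    def put(self, key, value):
        if value is TOMBSTONE:
            self.keydir.pop(key, None)   # a tombstone removes the keydir entry
        else:
            self.keydir[key] = value
    def delete(self, key):
        self.put(key, TOMBSTONE)         # delete == put(tombstone)
```

Routing deletes through put means any fix to the put-side race handling automatically covers deletes as well, which is the point of the consolidation.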
slfritchie added a commit that referenced this pull request on Mar 7, 2014
slfritchie added a commit that referenced this pull request on Mar 7, 2014
* Add regression test for GH #82.
* Fix unit tests: Scott's time fudge was leaking into other tests, causing them to see static timestamps and sometimes fail. The un_write code requires support for file:position-style offset parameters in our NIF version ({bof, 5}, {eof, -1}, etc.). Remove debug logging; clear the fudged time on set.
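The file:position-style specs mentioned above resolve to absolute offsets as follows (Python model; assumes only the three location forms {bof,N}, {cur,N}, {eof,N} from Erlang's file module, and ignores error handling):

```python
def resolve_position(spec, current, size):
    """Resolve an Erlang file:position()-style location spec to an
    absolute file offset.  spec is a (kind, n) pair where kind is
    'bof' (beginning of file), 'cur' (current position), or 'eof'
    (end of file), and n is a signed displacement."""
    kind, n = spec
    base = {'bof': 0, 'cur': current, 'eof': size}[kind]
    return base + n
```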
slfritchie added a commit that referenced this pull request on Mar 10, 2014
Add regression test for GH #82 Change order of scan & fold: now oldest -> youngest New regression test (currently broken) for recent scan change of oldest -> youngest Fix rough edges on test/bitcask_qc.erl & test/bitcask_qc_expiry.erl Write tombstone on delete only if there's a keydir entry, plus race bugfix Fix remaining bug when bitcask scan/fold order was changed to oldest->youngest The NIF change fixes a long-standing latent bug: when put'ing a key that does not exist, if there's a race with a merge, keydir_put_int() would return 'ok' (error) rather than 'already_exists' (correct). The 'already_exists' return value is a signal to the read-write owner of the bitcask that the current append file must be closed and a new one opened (with a larger fileid than any merge). The tombstone change adds a new tombstone data format. Old tombstones will be handled correctly. New tombstones for any key K contain the fileid & offset of the key that it is deleting. If the fileid F still exists, then the tombstone will always be merged forward. If the fileid F does not exist, then merging forward is not necessary. When F was merged, the on-disk representation of key K not be merged forward: K does not exist in the keydir (because it was deleted by this tombstone), or it was replaced by a newer put. FML: fix tombstone merging bug when K & K's tombstone are in same fileid Originally found with bitcask_pulse, I deconstructed the test case to help understand what was happening: the new EUnit test is new_20131217_a_test_. As a result of the puts, key #13 is written 3x to fileid #1 (normal, tombstone, normal) and 1x to fileid #2 (normal @ the very beginning of the file). The merge creates fileid #3 and copies only the tombstone (the normal entry isn't copied because it is out-of-date). 
* WIP: the new test new_20131217_c_test_() is broken; need to experiment with bitcask:delete & NIF usage before continuing.
* Fix non-determinism in the fold_visits_unfrozen_test test ... now 100% broken.
* Fix the iterator freeze bug demonstrated by the last commit: remove the kh_put_will_resize() predicate test for creating a keydir->pending.
* Fix a bug introduced by the almost-bugfix in commit 1a9c99 that was supposed to be a partial fix for GH #82.
* Fix unexported-function errors, thanks Buildbot.
* Omnibus bugfixing, including:
  * Add an 'already_exists' return to bitcask_nifs_keydir_remove(): we need it to signal races with merge, alas.
  * Add state to #filestate to be able to 'undo' the last update to both a data file and its hint file. This probably means that we will have to play some games with merge file naming; TBD, stay tuned.
  * For bitcask:delete(), make the keydir delete conditional: if it fails, redo the entire operation.
  * inner_merge_write() has a race: if a partial merge happens at the proper time afterward, we see an old value reappearing. Fix by checking the return value of the keydir put and, if 'already_exists', undoing the write.
  * When do_put() races and gets 'already_exists' from the keydir, undo the write before retrying. If this key is deleted some time later, and a partial merge happens after that, we might otherwise see this value reappear after the merge is done.
  * Add file_truncate() to bitcask_file.erl. TODO: do the same for the NIF-style I/O.
  * Better robustness (I hope) in the EUnit tests in bitcask_merge_delete.erl.
* Add the UNIX epoch time to the put_int NIF call, to avoid C use of time(3).
* I hope this will eliminate a nasty source of nondeterminism during PULSE testing.
* Fix the buildbot ?TESTDIR definition location problem.
* Uff da: most PULSE non-determinism problems fixed?
* Fix the merge race when a keydir_put specifying an old fileid & offset races with a delete. Scenario, with 3 writers, where writers 2 & 3 are racing:
  * Writer 1: put K, written @ {file 1, offset 63}.
  * Writer 2: delete operation starts ... but there is no write to disk yet.
  * Writer 3: merge scans file 1, sees K @ {1,30} -> not out of date -> but there is no write to disk yet.
  * Writer 2: writes a tombstone @ {3,48}.
  * Writer 2: the keydir conditional delete @ the old location {1,63} succeeds.
  * Writer 2: the keydir delete returns from NIF-land.
  * Writer 3: the merge copies data from {1,63} -> {4,42}.
  * Writer 3: the keydir put of {4,42}, conditional on {1,63}, succeeds due to incorrect conditional validation: the record is gone, but the bug permits the put to return 'ok'.
* Fix a bug in the previous commit: offset=0 is valid, do not check it.
* Use a larger retry # for open_fold_files() and keydir_wait_ready() when PULSE testing.
* Small bitcask_pulse changes: file size, prettier keys, slightly less verbosity.
* Adjust command frequency weightings for the PULSE model.
* Add no_tombstones_after_reopen_test() to test the desired condition.
* Avoid storing tombstones in the keydir when possible. When a Bitcask is opened and scan_key_files() is reading data from disk and loading the RAM keydir, we now detect whether a key is a tombstone and, if so, do not store it in the keydir. Normally, only hint files are scanned during startup; however, hint files have not stored enough information to confirm whether a key is a tombstone. I have added such a flag in a backward-compatible way: the offset size has been reduced from 64 -> 63 bits, and the uppermost bit (which is assumed to be 0 in all cases -- we assume nobody has actually written a file big enough to require 64 bits to describe an offset) is used to signal tombstone status.
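The backward-compatible tombstone flag described above can be modeled as follows. This is an illustrative Python sketch of the bit layout (function names are hypothetical), not bitcask's code:

```python
TOMBSTONE_BIT = 1 << 63         # uppermost bit of the 64-bit offset field
MAX_OFFSET = TOMBSTONE_BIT - 1  # offsets are therefore limited to 63 bits

def encode_hint_offset(offset, is_tombstone):
    # Old hint files always have the top bit clear, so they decode
    # unchanged as (offset, False): the change is backward-compatible.
    if not 0 <= offset <= MAX_OFFSET:
        raise ValueError("offset does not fit in 63 bits")
    return offset | (TOMBSTONE_BIT if is_tombstone else 0)

def decode_hint_offset(packed):
    return packed & MAX_OFFSET, bool(packed & TOMBSTONE_BIT)
```

A plain offset like 7 round-trips unchanged, which is why pre-existing hint files need no conversion.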
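The "undo the write and retry on 'already_exists'" logic described in the commit messages above can be sketched like this. Everything here (the class and function names, the in-memory structures, the retry bound) is a hypothetical model for illustration, not bitcask's Erlang implementation:

```python
class FileState:
    """Tiny in-memory stand-in for a bitcask append file."""
    def __init__(self, file_id):
        self.file_id = file_id
        self.entries = []                       # appended (key, value) pairs
    def append(self, key, value):
        self.entries.append((key, value))
        return self.file_id, len(self.entries) - 1
    def un_write(self):
        self.entries.pop()                      # undo the last append
    def open_new_file(self):
        return FileState(self.file_id + 1)      # strictly larger file id

class Keydir:
    """Keydir whose put is conditional: it refuses the put when a merge
    has already installed an entry at an equal-or-larger file id."""
    def __init__(self):
        self.table = {}
    def put_conditional(self, key, file_id, offset):
        old = self.table.get(key)
        if old is not None and old[0] >= file_id:
            return 'already_exists'             # lost the race to a merge
        self.table[key] = (file_id, offset)
        return 'ok'

def do_put(keydir, filestate, key, value, max_retries=5):
    for _ in range(max_retries):
        file_id, offset = filestate.append(key, value)
        if keydir.put_conditional(key, file_id, offset) == 'ok':
            return 'ok'
        # Undo the write so no stale copy is left behind: a later
        # delete followed by a partial merge could resurrect it.
        filestate.un_write()
        filestate = filestate.open_new_file()   # larger id than the merge's
    return 'error'
```

The key point is that a failed conditional put leaves neither a stale on-disk record nor a stale keydir entry; the retry lands on a file id larger than any merge output.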
slfritchie added a commit that referenced this pull request on Mar 13, 2014:
* A delete in the writer process may race with a merge that writes a value for the same key to a file with an id greater than the current write file's. Re-opening the cask would then load the merged value and forget the delete. The delete path now detects the race and retries the tombstone write on a file with an id greater than the merge file's until it prevails.
* A partial merge could end up removing the tombstone for a key that still has a value in one of the unmerged files. To detect this, a new kind of tombstone has been added that contains a pointer to the value it is replacing.
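The "pointing" tombstone described above, one that records the fileid & offset of the value it deletes, might look like this illustrative Python sketch. The magic bytes and record layout are invented for the example and are not bitcask's on-disk format:

```python
import struct

TOMBSTONE2_MAGIC = b"TS2!"   # hypothetical marker, not bitcask's real bytes

def encode_tombstone(file_id, offset):
    # 32-bit file id + 64-bit offset of the value being deleted.
    return TOMBSTONE2_MAGIC + struct.pack(">IQ", file_id, offset)

def decode_tombstone(blob):
    if blob[:4] != TOMBSTONE2_MAGIC:
        raise ValueError("not a pointing tombstone")
    return struct.unpack(">IQ", blob[4:])

def keep_tombstone_after_merge(blob, live_file_ids):
    """Merge the tombstone forward only while the file it points at
    still exists; once that file is gone, the value it deleted can no
    longer reappear, so the tombstone may be dropped."""
    file_id, _offset = decode_tombstone(blob)
    return file_id in live_file_ids
```

This gives the merge a cheap, local test for whether a tombstone is still load-bearing.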
This pull request was closed.
Please review this fix to bz888.