This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Add transaction batching #285

Merged
merged 1 commit into from
Mar 18, 2015

Conversation

@tv42 (Contributor) commented Jan 17, 2015

Batcher makes it easy to make lots of small transactions with
significantly better performance. Batcher.Update combines multiple
Update calls into a single disk transaction, managing errors smartly.

I'm aware of https://github.com/boltdb/coalescer. I wrote the original version of this before you pushed coalescer; it just took a while for me to circle back to cleaning it up. This code doesn't leave a goroutine around, and is a few percent faster ;) But the real difference from coalescer is this: with this code, callers don't need to write a for loop around ErrRollback. I pull failing updaters out of the transaction and retry the others automatically.

I'd really like to see this as part of bolt, and I feel like having a good batcher mechanism would help significantly with performance complaints like #244. If this looks promising, it could even sit alongside DB.Update as DB.Batch -- or even become DB.Update (though beware the "runs multiple times" behavior difference).

With this in place, I see no reason why any of my apps would ever again bother doing non-batched transactions.

Let me know what you think. If you say no, this'll live as tv42/semiauto (as opposed to bolt action..)

@benbjohnson (Member)

@tv42 I definitely like the idea of pushing it into Bolt core. It's such a common use case that it doesn't make sense to push it into another repo. We can't have it replace DB.Update() because the idempotency piece will cause backwards incompatibility.

I like your implementation better since it does the retries instead of rolling back everything. I don't have time right this moment to give it enough testing but I'll test and merge it in within the next couple of days.

@tv42 (Contributor, Author) commented Jan 18, 2015

Yeah, DB.Batch() would be better than DB.Update().

@tv42 (Contributor, Author) commented Jan 18, 2015

Rebased and edited the commit to become DB.Batch().

@davecgh (Contributor) commented Jan 18, 2015

I would highly recommend this pull request include documentation additions to README.md with example usage, caveats, etc.

@tv42 (Contributor, Author) commented Jan 18, 2015

Needs more test coverage too; that work just isn't useful if the whole approach is unwanted.

@arctica commented Jan 28, 2015

Excellent addition. I hope this can be merged into Bolt core and the documentation expanded as davecgh suggested.

@benbjohnson (Member)

Sorry for the delay but I wanted to make sure I had a couple hours to dedicate to the PR and really understand it better. Here's some feedback after playing with it for a while:

  1. What's the benefit of catching panic() calls and re-panicking? It seems like we'd lose the stack trace of where the panic originated.
  2. The loop/LoadPointer()/CompareAndSwapPointer() in DB.Batch() is really confusing. I understand it conceptually but it feels clever and that worries me. Can we simplify this to kick off a new batch into a separate goroutine and add calls to that? Adding a call once full would cause it to flush and generate a new batch.
  3. Can you rename BatchMaxSize and BatchMaxDelay to MaxBatchSize and MaxBatchDelay? That's just a personal preference.
  4. Can you move MaxBatchSize/MaxBatchDelay to exported properties on DB instead of passing them in through Options? Options exists to set options that are required during Open().

Also, I refactored some of the tests a bit:

benbjohnson@ccee92e

@tv42 (Contributor, Author) commented Jan 31, 2015

  1. Good catch.

I can't just let the panic bubble up like Update or View do, because the func is typically executed in a different goroutine than the one that made the matching Batch call. The error has to be shipped to the right caller (hence the channel return from merge), and the panicked type is just used as a marker that it was a panic, not an error.

Second, the point of re-panicking instead of just returning an error is to keep the panic, well, exceptional. It just felt like the right thing: don't swallow the panic, because we actually have a caller that can deal with it (as opposed to e.g. a net/http Handler).

Maybe the recover bit should capture the stack trace and include it in the error sent to the caller; runtime.Stack makes that easy. (All goroutines or just the current one?) This seems like the right thing to do regardless of whether it's re-panicked or returned as an error.

Let me know what you think.

  2. It's essentially locking, so it's bound to be clever by association. (A full mutex would work too, but this ought to be faster.) I wrote it originally with a goroutine but liked that even less; the atomic swap is the trick that lets a caller opportunistically join an existing batch. The nil->me (or prevFullBatchIJustObserved->me) transition with CompareAndSwapPointer can only happen for one caller at a time, so it's a good signal for "I should become master". Once I added that, the need for the goroutine and channel just naturally went away.

I'm very much open to making the code better, but I wonder if that's achievable with e.g. a nicer description of the state machine.

  3. Sure. I was trying to group them by being related to Batch.

  4. I did it this way to avoid callers changing the values on the fly, both because of races and the weird bugs that could trigger; I'm a little worried about leaving that open. It also let me do error handling up front. I understand that's what Options currently does, but looked at with fresh eyes, nothing about Options itself seems specific to the actual opening event; it's just configuration passed to a constructor function.

I can't vouch for the test changes right now, but I appreciate the motivation; less indentation is good. I was copying your older programming style ;)

@tv42 (Contributor, Author) commented Jan 31, 2015

The test changes look good. Hint for reviewing: make diff ignore whitespace changes.

@tv42 (Contributor, Author) commented Jan 31, 2015

batch-control-flow diagram (Graphviz dot source):

digraph batch_control_flow {
  start [shape=point];
  testCur [shape=diamond, label="is there a batch?"];
  start -> testCur;

  tryJoin [shape=diamond, label="try to join\ncurrent batch"];
  testCur -> tryJoin [label="yes"];
  waitForErr [label="wait for error"];
  tryJoin -> waitForErr [label="joined,\nif full\nnotify master"];
  returnOrPanic [label="return or panic", shape=doublecircle];
  waitForErr -> returnOrPanic;
  tryJoin -> tryMaster [label="join refused\n(already started)"];

  tryMaster [shape=diamond, label="try to\nbecome\nmaster"];
  testCur -> tryMaster [label="no"];
  tryMaster:se -> testCur:n [label="lost race"];
  waitFullOrDelay [label="wait for full or max delay"];
  tryMaster:sw -> waitFullOrDelay [label="we are master"];
  doTransactions [label="call db.Update, handle errors"];
  waitFullOrDelay -> doTransactions [label="full or timer /\nmark as started"];
  doTransactions -> waitForErr;
}

@tv42 force-pushed the batch branch 2 times, most recently from 29c0764 to 6931794 on February 11, 2015 21:14
@tv42 (Contributor, Author) commented Feb 11, 2015

Alright for your reviewing pleasure, https://github.com/tv42/bolt/tree/batch-3 / 29c0764 contains a rewrite using just a mutex for coordination. It got really simple once I realized I can wire a timer to a sync.Once.

For actual merging, I squashed the branch as 6931794.

It could still use more docs and tests, but is the current approach more understandable?

Performance is close enough to the atomics version. I'm seeing a lot of jitter, so any real difference is probably drowned in the noise and would need several repeats to reach anything conclusive; I'd rather get the code out there.

@tv42 (Contributor, Author) commented Feb 11, 2015

Err, I just saw a bug in handling a full batch; consider the details to be in flux. Need those tests ;)

@tv42 (Contributor, Author) commented Feb 11, 2015

The fix was pretty trivial: ee8e42e (but once again, I don't think these intermediates are worth merging).

New squashed merge candidate is 4ce6f86.

Coverage looks pretty good, the only thing missing is panicked.Error, and that's there only to make it fit through chan error.

@tv42 (Contributor, Author) commented Feb 11, 2015

And I should still rip out the checks and errors on bad MaxBatchSize and MaxBatchDelay values and just handle whatever comes: if the user can set the fields on DB directly, there's no validation possible at that time anyway, and bad values degrade nicely in a way that just disables batching.

@j commented Feb 12, 2015

👍 and 😄

@tv42 (Contributor, Author) commented Feb 12, 2015

I'm rebasing the "batch" branch again, new commit is b906f4e.
Here's what's new (from before the rebase, archived in branch batch-6):

commit 63c0780

Move batch item retrying to happen in the original goroutine

This preserves panic stack traces, as we no longer need to communicate
panics as special errors between goroutines.

commit e698383

Internal docs for batch

commit d90b13e

Enforce MaxBatchSize

Previous logic relied on the scheduler to start the batch "soon
enough"; this one will never exceed MaxBatchSize.

Fix broken test.

commit 0fabc63

Saner batch tunables

No need for errors here, and the copying logic was a leftover from
when they were actually in Options.

@tv42 (Contributor, Author) commented Feb 12, 2015

@benbjohnson what do you think of b906f4e / 5261724?

@davecgh See the README addition in 5261724, does this match what you wanted to see? If not, can you talk more about what you would need as a user.

err := db.Batch(func(tx *bolt.Tx) error {
// Find last key in bucket, decode as bigendian uint64, increment
// by one, encode back to []byte, and add new key.
...
A Contributor commented on the diff:

Inconsistent indentation here.

@davecgh (Contributor) commented Feb 12, 2015

@tv42 That addition to the README looks pretty good. I think it might benefit from a couple of extra things:

  • An example that shows it being used concurrently. When called as shown in the README, it doesn't do anything differently (from a caller's perspective) than Update, which could easily confuse a user unfamiliar with the package. This is especially true for someone coming from leveldb batches, which are asynchronous calls that just add data to a batch of operations you explicitly trigger when ready (essentially what bolt already does inside Update).
  • A short discussion of when you wouldn't want to use Batch. Given the previous point, calling it from a single goroutine is semantically equivalent to calling Update. Since a batch involves additional overhead, it might make sense to call out that Update should be used unless you specifically arrange/need to call Batch concurrently.

@tv42 (Contributor, Author) commented Feb 12, 2015

@davecgh Thank you.

The problem with actually showing concurrency in the README is that it just gets too verbose. That would have to be a separate example.

Batch is definitely aimed at the case where the code is e.g. serving incoming network requests with multiple goroutines; anything else cannot fit into the same "returns error" calling pattern, and then you might as well use Update or manual transactions.

663b133 adds "concurrent" in key spots and explicitly says "Batch is only useful when there are multiple goroutines calling it" -- how's that?

@davecgh (Contributor) commented Feb 12, 2015

@tv42 Looks good to me with those additions.

Agreed about the example directly in the README. I was thinking more along the lines of referencing an example in db_test.go, e.g. func ExampleDB_Batch(), similar to the existing Update example.

That also has the benefit of being a testable example to ensure the example code actually remains current.

@tv42 (Contributor, Author) commented Feb 12, 2015

@davecgh I still have trouble coming up with a non-silly example that's brief enough. Usually concurrency is driven by communication with the outside world, but that's not something you can encapsulate in ExampleDB_Batch easily.

@tv42 (Contributor, Author) commented Feb 12, 2015

@davecgh b8ad031 has an example for Batch. I couldn't come up with anything simpler that looked like actual use.

@j commented Feb 16, 2015

Can't wait. Example looks clear enough.

@benbjohnson (Member)

@tv42 Sorry for the delay in re-reviewing this. I'm going to block out some time tonight to go through this PR again. Thanks again for all the work on it. It'll be nice to have batching in Bolt core.

@tv42 (Contributor, Author) commented Feb 17, 2015

Rebased and squashed with all of the above nits handled (as far as I can tell) as commit 1578db9. If there's any more feedback, please let me know.


if failIdx >= 0 {
// take the failing transaction out of the batch. it's
// safe to shorten b.calls here because b.batch no longer
A Member commented on the diff:

b.batch -> db.batch

@benbjohnson (Member)

@tv42 I like the changes a lot. I think it flows really well now and it's easy to read. I like the use of sync.Once too. I made a few minor comments. The only other things I'd like to see are:

  1. Validate the inserted data at the end of the benchmarks to make sure they are correct. We can call b.StopTimer() to exclude the validation time.
  2. Add a testing/quick test to try a lot of different settings and validate that they all work as expected. I can try to add this in the next couple of days (or feel free to add it yourself). It'd be nice to run go test -quickchecks=1000000 and just let it run for a while to make sure there's not any unexpected behavior.

@tv42 (Contributor, Author) commented Feb 18, 2015

I just realized the benchmarks are racy, they share hashing across goroutines. Fixing..

@tv42 (Contributor, Author) commented Feb 18, 2015

Alright fixes for the above are in commits leading to e927eff and I rebased & squashed on top of the new master as 15389f9.

DB.Batch makes it easy to make lots of small transactions with
significantly better performance. Batch combines multiple concurrent
Update calls into a single disk transaction, managing errors smartly.
@tv42 (Contributor, Author) commented Feb 18, 2015

Somewhere along the line I screwed up the commit message, fixed in adbb1a1

@tv42 (Contributor, Author) commented Mar 18, 2015

Ping.

@benbjohnson (Member)

@tv42 Sorry for the delay on this. I keep wanting to add some testing/quick tests but I haven't had the time. Just to confirm, is this PR ready for merge?

@tv42 (Contributor, Author) commented Mar 18, 2015

@benbjohnson Apart from deserving more testing coverage -- which is just about always true -- I feel like I have nothing more to add, and the comments from others have stopped too.

benbjohnson added a commit that referenced this pull request Mar 18, 2015
Add transaction batching
@benbjohnson benbjohnson merged commit 4d30731 into boltdb:master Mar 18, 2015