
Add server-level write batching. #1572

Closed
wants to merge 1 commit into from
Conversation

benbjohnson
Contributor

Overview

This commit changes the Server to batch writes together -- even across multiple requests. The server then periodically sends writes together to the broker by shard in a single message.

This approach uses a lot of goroutines, but it lets every point receive the broker's returned error and index. This simpler approach was used to get basic batching working; I'll submit a separate PR to optimize batching and minimize goroutines.

Notes

Batching is only implemented for raw writes. Non-raw writes only occur once, for brand-new series, so I don't think it's worth implementing batching for them right now.
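A minimal sketch of the shape this approach takes, assuming illustrative names (pendingWrite, batcher, flushLoop, publish) that are not the actual identifiers in this commit: writes accumulate across requests, a flush loop sends them to the broker grouped by shard every DefaultFlushInterval, and each blocked writer gets back its own error and index.

```go
// Illustrative sketch only: these types and names are hypothetical and
// are not taken from the commit itself.
package batching

import (
	"sync"
	"time"
)

// pendingWrite is one point waiting to be flushed, plus a channel the
// writing goroutine blocks on to receive the broker's index or an error.
type pendingWrite struct {
	shardID uint64
	data    []byte
	done    chan writeResult
}

type writeResult struct {
	index uint64
	err   error
}

// batcher accumulates writes across requests, keyed by shard ID.
type batcher struct {
	mu      sync.Mutex
	pending map[uint64][]*pendingWrite
}

func (b *batcher) add(w *pendingWrite) {
	b.mu.Lock()
	b.pending[w.shardID] = append(b.pending[w.shardID], w)
	b.mu.Unlock()
}

// flushLoop periodically sends the accumulated writes, one broker message
// per shard, then notifies every waiting goroutine of its result.
func (b *batcher) flushLoop(interval time.Duration, publish func(shardID uint64, data [][]byte) (uint64, error)) {
	for range time.Tick(interval) {
		b.mu.Lock()
		batches := b.pending
		b.pending = make(map[uint64][]*pendingWrite)
		b.mu.Unlock()

		for shardID, writes := range batches {
			data := make([][]byte, len(writes))
			for i, w := range writes {
				data[i] = w.data
			}
			index, err := publish(shardID, data)
			// Fan the single per-shard result back out to each waiting writer.
			for _, w := range writes {
				w.done <- writeResult{index: index, err: err}
			}
		}
	}
}
```

A caller such as WriteSeries would construct a pendingWrite, add it, and block on done, which is where the one-goroutine-per-write cost mentioned above comes from.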

@@ -42,6 +42,9 @@ const (

 	// DefaultShardRetention is the length of time before a shard is dropped.
 	DefaultShardRetention = 7 * (24 * time.Hour)
+
+	// DefaultFlushInterval is the time between flushing raw point data.
+	DefaultFlushInterval = 100 * time.Millisecond
Contributor
Definitely need to get this stuff into the config. Can be a separate PR.
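A hypothetical sketch of what moving this into the config could look like; the Config struct, field name, and package name below are assumptions, not code from this PR:

```go
package influxdb // package name assumed for illustration

import "time"

// DefaultFlushInterval mirrors the constant added in this diff.
const DefaultFlushInterval = 100 * time.Millisecond

// Config is a hypothetical sketch of exposing the flush interval as a
// configuration option rather than a hard-coded constant.
type Config struct {
	// FlushInterval is how often buffered raw points are flushed to the
	// broker. A zero value falls back to DefaultFlushInterval.
	FlushInterval time.Duration
}

func (c *Config) flushInterval() time.Duration {
	if c.FlushInterval == 0 {
		return DefaultFlushInterval
	}
	return c.FlushInterval
}
```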

@otoolep
Contributor

otoolep commented Feb 11, 2015

Perhaps an alternative using channels would work, and we could remove the use of server locks.

WriteSeries would create a buffer, shove all the points into the buffer (well, pointers to the points, perhaps), and also tack on a "done" channel to the buffer object. WriteSeries then sends the buffer down another channel, specifically for the Flusher, and blocks, waiting on the "done" channel for a single message to be received.

The Flusher then performs whatever batching it wants and sends a new message back over the "done" channel when that batch has been processed, the message containing "OK" or "ERROR" for each point in the batch. I know that a batch could contain a mix of non-raw and raw series data, but handling that would live entirely within the Flusher; WriteSeries wouldn't care. Additionally, the Flusher could be more complex and process multiple buffers at a time received over its channel, combining the buffers into an even bigger batch and signalling "done" on each buffer when the larger batch has completed.

If we want completely decoupled flow, then WriteSeries doesn't even bother to block on its "done" channel.

Obviously I may be missing some requirement, but I like this because it means a) no goroutines in WriteSeries and b) no server-level locking (that I can think of).
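A minimal sketch of the channel-based flow described above, with all names (Point, writeBuffer, Flusher, process) being illustrative rather than existing code:

```go
package batching

// Point stands in for a parsed data point.
type Point struct {
	Name   string
	Fields map[string]interface{}
}

// writeBuffer carries one request's points plus a "done" channel that the
// caller blocks on for per-point results (nil meaning "OK").
type writeBuffer struct {
	points []*Point
	done   chan []error
}

// Flusher owns the input channel; only its goroutine touches batches.
type Flusher struct {
	in chan *writeBuffer
}

// WriteSeries shoves the points into a buffer, hands it to the Flusher,
// and blocks until the batch containing it has been processed.
func (f *Flusher) WriteSeries(points []*Point) []error {
	buf := &writeBuffer{points: points, done: make(chan []error, 1)}
	f.in <- buf
	return <-buf.done
}

// run combines incoming buffers into larger batches, processes them, and
// then signals "done" on each buffer with its per-point results.
func (f *Flusher) run(process func([]*Point) []error) {
	for buf := range f.in {
		buffers := []*writeBuffer{buf}
		// Drain anything else already queued so it joins the same batch.
	drain:
		for {
			select {
			case b := <-f.in:
				buffers = append(buffers, b)
			default:
				break drain
			}
		}

		// Build the combined batch, process it once, then fan the results
		// back out to each originating buffer.
		var batch []*Point
		for _, b := range buffers {
			batch = append(batch, b.points...)
		}
		results := process(batch)

		i := 0
		for _, b := range buffers {
			b.done <- results[i : i+len(b.points)]
			i += len(b.points)
		}
	}
}
```

Because only the Flusher goroutine ever touches the combined batch, no server-level lock is needed and WriteSeries spawns no goroutines; a fully decoupled caller could simply skip the receive on "done".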

@benbjohnson
Contributor Author

Closing because #1644 implements batching in a different way.

otoolep deleted the batching branch on September 2, 2015.

mark-rushakoff pushed a commit that referenced this pull request on Jan 11, 2019: …ate-task-page (Tasks/fill in options in update task page)