
Add server-level write batching. #1572

Closed
wants to merge 1 commit into from
Conversation

benbjohnson
Contributor

Overview

This commit changes the Server to batch writes together -- even across multiple requests. The server then periodically sends writes together to the broker by shard in a single message.

This approach uses a lot of goroutines, but it lets every point receive the broker's returned error and index. This simpler approach was used to get basic batching working; I'll submit a separate PR to optimize batching and minimize goroutines.

Notes

Batching is only implemented for raw writes. Non-raw writes only occur once, for brand-new series, so I don't think it's worth implementing batching for them right now.
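A minimal sketch of the shape this approach takes, assuming illustrative names (pendingWrite, batcher, flushLoop, publish) that are not the actual identifiers in this commit: writes accumulate across requests, a flush loop sends them to the broker grouped by shard every DefaultFlushInterval, and each blocked writer gets back its own error and index.

```go
// Illustrative sketch only: these types and names are hypothetical and
// are not taken from the commit itself.
package batching

import (
	"sync"
	"time"
)

// pendingWrite is one point waiting to be flushed, plus a channel the
// writing goroutine blocks on to receive the broker's index or an error.
type pendingWrite struct {
	shardID uint64
	data    []byte
	done    chan writeResult
}

type writeResult struct {
	index uint64
	err   error
}

// batcher accumulates writes across requests, keyed by shard ID.
type batcher struct {
	mu      sync.Mutex
	pending map[uint64][]*pendingWrite
}

func (b *batcher) add(w *pendingWrite) {
	b.mu.Lock()
	b.pending[w.shardID] = append(b.pending[w.shardID], w)
	b.mu.Unlock()
}

// flushLoop periodically sends the accumulated writes, one broker message
// per shard, then notifies every waiting goroutine of its result.
func (b *batcher) flushLoop(interval time.Duration, publish func(shardID uint64, data [][]byte) (uint64, error)) {
	for range time.Tick(interval) {
		b.mu.Lock()
		batches := b.pending
		b.pending = make(map[uint64][]*pendingWrite)
		b.mu.Unlock()

		for shardID, writes := range batches {
			data := make([][]byte, len(writes))
			for i, w := range writes {
				data[i] = w.data
			}
			index, err := publish(shardID, data)
			// Fan the single per-shard result back out to each waiting writer.
			for _, w := range writes {
				w.done <- writeResult{index: index, err: err}
			}
		}
	}
}
```

A caller such as WriteSeries would construct a pendingWrite, add it, and block on done, which is where the one-goroutine-per-write cost mentioned above comes from.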

@@ -42,6 +42,9 @@ const (

 	// DefaultShardRetention is the length of time before a shard is dropped.
 	DefaultShardRetention = 7 * (24 * time.Hour)
+
+	// DefaultFlushInterval is the time between flushing raw point data.
+	DefaultFlushInterval = 100 * time.Millisecond
Contributor
Definitely need to get this stuff into the config. Can be a separate PR.
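A hypothetical sketch of what moving this into the config could look like; the Config struct, field name, and package name below are assumptions, not code from this PR:

```go
package influxdb // package name assumed for illustration

import "time"

// DefaultFlushInterval mirrors the constant added in this diff.
const DefaultFlushInterval = 100 * time.Millisecond

// Config is a hypothetical sketch of exposing the flush interval as a
// configuration option rather than a hard-coded constant.
type Config struct {
	// FlushInterval is how often buffered raw points are flushed to the
	// broker. A zero value falls back to DefaultFlushInterval.
	FlushInterval time.Duration
}

func (c *Config) flushInterval() time.Duration {
	if c.FlushInterval == 0 {
		return DefaultFlushInterval
	}
	return c.FlushInterval
}
```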

@otoolep
Contributor

otoolep commented Feb 11, 2015

Perhaps an alternative using channels would work, and we could remove the use of server locks.

WriteSeries would create a buffer, shove all the points into the buffer (well, pointers to the points, perhaps), and also tack on a "done" channel to the buffer object. WriteSeries then sends the buffer down another channel, specifically for the Flusher, and blocks, waiting on the "done" channel for a single message to be received.

The Flusher then performs whatever batching it wants and sends a new message back over the "done" channel when that batch has been processed, the message containing "OK" or "ERROR" for each point in the batch. I know that a batch could contain a mix of non-raw and raw series data, but handling that would live entirely within the Flusher; WriteSeries wouldn't care. Additionally, the Flusher could be more complex and process multiple buffers at a time received over its channel, combining the buffers into an even bigger batch and signalling "done" on each buffer when the larger batch has completed.

If we want completely decoupled flow, then WriteSeries doesn't even bother to block on its "done" channel.

Obviously I may be missing some requirement, but I like this because it means a) no goroutines in WriteSeries and b) no server-level locking (that I can think of).
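A minimal sketch of the channel-based flow described above, with all names (Point, writeBuffer, Flusher, process) being illustrative rather than existing code:

```go
package batching

// Point stands in for a parsed data point.
type Point struct {
	Name   string
	Fields map[string]interface{}
}

// writeBuffer carries one request's points plus a "done" channel that the
// caller blocks on for per-point results (nil meaning "OK").
type writeBuffer struct {
	points []*Point
	done   chan []error
}

// Flusher owns the input channel; only its goroutine touches batches.
type Flusher struct {
	in chan *writeBuffer
}

// WriteSeries shoves the points into a buffer, hands it to the Flusher,
// and blocks until the batch containing it has been processed.
func (f *Flusher) WriteSeries(points []*Point) []error {
	buf := &writeBuffer{points: points, done: make(chan []error, 1)}
	f.in <- buf
	return <-buf.done
}

// run combines incoming buffers into larger batches, processes them, and
// then signals "done" on each buffer with its per-point results.
func (f *Flusher) run(process func([]*Point) []error) {
	for buf := range f.in {
		buffers := []*writeBuffer{buf}
		// Drain anything else already queued so it joins the same batch.
	drain:
		for {
			select {
			case b := <-f.in:
				buffers = append(buffers, b)
			default:
				break drain
			}
		}

		// Build the combined batch, process it once, then fan the results
		// back out to each originating buffer.
		var batch []*Point
		for _, b := range buffers {
			batch = append(batch, b.points...)
		}
		results := process(batch)

		i := 0
		for _, b := range buffers {
			b.done <- results[i : i+len(b.points)]
			i += len(b.points)
		}
	}
}
```

Because only the Flusher goroutine ever touches the combined batch, no server-level lock is needed and WriteSeries spawns no goroutines; a fully decoupled caller could simply skip the receive on "done".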

@benbjohnson
Contributor Author

Closing because #1644 implements batching in a different way.

otoolep deleted the batching branch on September 2, 2015.

mark-rushakoff pushed a commit that referenced this pull request on Jan 11, 2019: …ate-task-page (Tasks/fill in options in update task page)