Write throughput/concurrency improvements #8302
Conversation
The lock shows up under write load. It only needs to be assigned once, so a read lock eliminates the contention.
Series and Measurement have their own locks, and we do not need to hold locks on the index while using those types.
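A rough sketch of the locking pattern that commit describes, with illustrative type and field names (this is not the PR's actual code): the field is read under an RLock, and the write lock is only taken for the one-time assignment.

package sketch

import "sync"

type FieldSet struct{}

type Index struct {
	mu       sync.RWMutex
	fieldSet *FieldSet // assumed: assigned once, read many times afterwards
}

func (i *Index) fields() *FieldSet {
	i.mu.RLock()
	fs := i.fieldSet
	i.mu.RUnlock()
	if fs != nil {
		return fs // common case under write load: only a read lock is taken
	}

	i.mu.Lock()
	defer i.mu.Unlock()
	if i.fieldSet == nil { // re-check after upgrading to the write lock
		i.fieldSet = &FieldSet{}
	}
	return i.fieldSet
}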
tsdb/engine/tsm1/pools.go
Outdated
	if x == nil {
		return make([]byte, size)
	}
	buf := x.([]byte)
Correct me if I'm wrong, but putting a []byte (or any slice) into a sync.Pool will allocate as the conversion from slice to interface needs to allocate space for the slice on the heap. This can be avoided by pooling *[]byte instead. For example:
// getBuf returns a buffer with length size from the buffer pool.
func getBuf(size int) *[]byte {
	x := bufPool.Get()
	if x == nil {
		b := make([]byte, size)
		return &b
	}
	buf := x.(*[]byte)
	if cap(*buf) < size {
		b := make([]byte, size)
		return &b
	}
	*buf = (*buf)[:size]
	return buf
}

// putBuf returns a buffer to the pool.
func putBuf(buf *[]byte) {
	bufPool.Put(buf)
}
A quick local benchmark shows that this avoids an allocation.
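For reference, a benchmark along those lines could look like the following sketch (placed in a _test.go file; the names are made up and this is not the benchmark referred to above). The first case allocates on every Put because the slice header must be copied into the interface value; the second stores only a pointer.

package pools_test

import (
	"sync"
	"testing"
)

func BenchmarkPoolPutByteSlice(b *testing.B) {
	var p sync.Pool
	buf := make([]byte, 1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		p.Put(buf) // []byte -> interface{}: the slice header is copied to the heap
		if v := p.Get(); v != nil {
			buf = v.([]byte)
		}
	}
}

func BenchmarkPoolPutByteSlicePtr(b *testing.B) {
	var p sync.Pool
	buf := make([]byte, 1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		p.Put(&buf) // a pointer fits directly in the interface value: no per-Put allocation
		if v := p.Get(); v != nil {
			_ = v.(*[]byte)
		}
	}
}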
@@ -121,7 +121,7 @@ func NewWAL(path string) *WAL {
 	// these options should be overriden by any options in the config
 	SegmentSize: DefaultSegmentSize,
 	closing: make(chan struct{}),
-	syncWaiters: make(chan chan error, 256),
+	syncWaiters: make(chan chan error, 1024),
(nit) Commit message says "This increases the buffer to 4096"
Fixed.
If the sync waiters channel was full, it would block sending to the channel while holding the WAL write lock. The sync goroutine would then be stuck acquiring the write lock and could not drain the channel. This increases the buffer to 1024, which would require a very high write load to fill, and returns an error if the channel is full to prevent the blocking.
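A sketch of the non-blocking send described in that commit message (queueSync and the error value are illustrative names; the WAL type and syncWaiters field appear in the diff above):

package sketch

import "errors"

var ErrSyncQueueFull = errors.New("wal sync waiters queue is full")

type WAL struct {
	syncWaiters chan chan error
}

// queueSync registers a sync waiter without blocking, so a full channel can
// never deadlock against the goroutine that drains it while the write lock is held.
func (w *WAL) queueSync() (chan error, error) {
	ch := make(chan error, 1)
	select {
	case w.syncWaiters <- ch:
		return ch, nil
	default:
		return nil, ErrSyncQueueFull // fail fast instead of blocking
	}
}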
Added
LGTM 👍
Under high write load, the sync goroutine would start up and end very frequently. Starting a new goroutine so frequently adds a small amount of latency, which causes writes to take longer and sometimes time out. This changes the goroutine to loop until there are no more waiters, which reduces the churn and latency.
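Roughly, the looping behavior described there looks like this sketch (illustrative names and a stubbed sync, not the PR's exact code):

package sketch

type WAL struct {
	syncWaiters chan chan error
}

// sync would fsync the active segment; stubbed here.
func (w *WAL) sync() error { return nil }

// serviceSyncWaiters loops until the waiter queue is empty, so one goroutine
// handles a burst of waiters instead of a new goroutine being started per sync.
func (w *WAL) serviceSyncWaiters() {
	for {
		select {
		case ch := <-w.syncWaiters:
			ch <- w.sync()
		default:
			return // nothing left to service; exit until the next burst
		}
	}
}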
The inmem index would call CreateSeriesIfNotExist for each series, which takes and releases an RLock to see if a series exists. Under high write load, the lock shows up in profiles quite a bit. This adds a filtering step that obtains a single RLock, checks all the series, and returns the non-existent series to continue through the slow path.
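A simplified sketch of that filtering step, with stand-in types and field names rather than the actual inmem index code:

package sketch

import "sync"

type Index struct {
	mu     sync.RWMutex
	series map[string]struct{}
}

// filterExisting checks a whole batch of series keys under one RLock and
// returns only the keys that still need to go through the slow create path.
func (i *Index) filterExisting(keys []string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()

	var missing []string
	for _, k := range keys {
		if _, ok := i.series[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}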
When many TSM files are being compacted, the buffers can add up fairly quickly.
The current bytes.Pool will hold onto byte slices indefinitely. Large writes can cause the pool to hold onto very large buffers over time. Testing with sync.Pool shows similar performance now, so using a sync.Pool will allow these buffers to be GC'd when necessary.
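The trade-off can be illustrated with a simplified sketch (assuming the existing bytes.Pool is a bounded, channel-backed pool; this is not the actual pkg/pool code):

package sketch

import "sync"

// A channel-backed pool keeps every buffer it holds alive for the life of the
// process, so a few very large buffers stay resident forever.
type channelPool struct{ ch chan []byte }

func (p *channelPool) put(b []byte) {
	select {
	case p.ch <- b: // buffer stays pinned until it is handed out again
	default: // pool full: drop
	}
}

// A sync.Pool entry, by contrast, can be discarded by the garbage collector
// between uses, so oversized buffers are eventually reclaimed.
var bufPool sync.Pool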
Under high write load, the check for each series was done sequentially, which caused a lot of CPU time to be spent acquiring and releasing the RLock on LogFile. This switches the code to check multiple series at once under an RLock, similar to the change for inmem.
There was contention on the write lock which only needs to be acquired when checking to see if the log file should be rolled over.
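One way to narrow that critical section, sketched with illustrative names (not the PR's actual change): check the size under a read lock and take the write lock only when a rollover is actually needed.

package sketch

import "sync"

const maxLogFileSize = 5 << 20 // assumed threshold, for illustration only

// LogFile is an illustrative stand-in; only the locking pattern matters here.
type LogFile struct {
	mu   sync.RWMutex
	size int64
}

func (f *LogFile) roll() error { return nil } // stub: swap in a fresh file

// maybeRoll checks the size under a read lock and only takes the write lock
// when a rollover is required, removing contention on the common path.
func (f *LogFile) maybeRoll() error {
	f.mu.RLock()
	over := f.size >= maxLogFileSize
	f.mu.RUnlock()
	if !over {
		return nil
	}

	f.mu.Lock()
	defer f.mu.Unlock()
	if f.size < maxLogFileSize { // another writer may have already rolled it
		return nil
	}
	return f.roll()
}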
This PR has a few performance improvements and bug fixes to the write path which improve overall write throughput when there are many concurrent writers.
The graphs below show the write latency and throughput changes from 1.2.3 to this PR (based off of 1.3) using batches of 5k and 10k and varying concurrent writers from 50-200 in each run. Each of the humps is a write load of 500M values over 100k series with varying concurrency and batch sizes. The top row is 5k batches, the second row is 10k batches. The left column is 1.2 and the right column is this PR. These were run on an m4.16xlarge with 1000 IOPS SSDs using go1.8.1 and the inmem and tsi1 indexes.
For 5k batches, write throughput improved from 1.1M w/s to 2.3M w/s, and latency decreased by ~50% and stayed below 500ms after the change.
For 10k batches, write throughput improved from 1.8M w/s to 2.3M w/s and latency decreased by ~30%.
The majority of the changes focus on reducing lock contention and allocations in the hot path.
The throughput was remarkably similar between inmem and tsi. tsi on master hit write timeouts under higher concurrency immediately when the run started. With this PR, those timeouts were fixed as well.