
Add WAL sync delay #8143

Merged: 3 commits merged into master from jw-wal-sync-goroutine on Mar 16, 2017
Conversation

@jwilder jwilder commented Mar 15, 2017

This is a follow-up to address an issue introduced in #7942, where write performance dropped due to a 100ms delay added to batch up fsync calls to the WAL. This reworks it so that there is no delay by default, but multiple concurrent fsyncs are coalesced into one. It also adds a wal-fsync-delay config option that can be increased to add a delay. The previous version also added another long-running goroutine per shard; that goroutine is now removed.

This change helps high-write-load use cases, users not running on SSDs, and workloads with many small writes; see the config example below.
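
As a point of reference, here's how the option might be set in the [data] section of influxdb.conf (the 100ms value is illustrative, not a recommended default):

[data]
  # Duration a write waits before fsyncing to the WAL. A value of 0 fsyncs
  # every write. Non-zero values (e.g. 0ms-100ms) trade a small amount of
  # write latency for fewer fsyncs, which helps non-SSD disks.
  wal-fsync-delay = "100ms"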

This also avoids some of the work done to collect shard stat data. Previously, we'd stat the shard data and WAL dirs on every collection interval (10s). For cold shards, these stats don't change, and when there are many shards (1000s), those thousands of syscalls can be taxing on the system when run so frequently.

These stats are now collected only when the shard data on disk has actually changed due to a write or a compaction; a sketch of the idea follows.
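
Here is a minimal sketch of that idea (names are hypothetical, not the PR's actual code): a dirty flag, set by writes and compactions, gates the expensive size recalculation.

package sketch

import "sync"

// shardStats caches the on-disk size of a shard so the stats collector
// doesn't stat the data and wal dirs every 10s for cold shards.
type shardStats struct {
	mu        sync.Mutex
	sizeDirty bool  // set by writes and compactions
	diskBytes int64 // last computed size
}

// markDirty is called after a write or compaction changes data on disk.
func (s *shardStats) markDirty() {
	s.mu.Lock()
	s.sizeDirty = true
	s.mu.Unlock()
}

// diskSize returns the cached size, recomputing only when dirty.
func (s *shardStats) diskSize(compute func() int64) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sizeDirty {
		s.diskBytes = compute() // walks the shard dirs; expensive
		s.sizeDirty = false
	}
	return s.diskBytes
}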

I tested this on an m4.xlarge with an sc1 volume (HDD, 250 IOPS) using various workloads.

For small writes using influx-stress insert --pps 1200 -n 100000000 -b 10, the disk is 100% utilized continuously and write latency is around 5.7s.

1.2

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00    23.00    0.00   63.00     0.00   348.00    11.05     1.02   16.32    0.00   16.32  15.87 100.00

New w/ 100ms delay

Write latency improves to 1.5s and disk util stays around 2-3%.

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     1.00    0.00    8.00     0.00    36.00     9.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00    20.00    0.00   32.00     0.00   212.00    13.25     0.03    0.88    0.00    0.88   0.75   2.40

Testing with SSDs shows some improvement as well, but doesn't require the delay.

@jwilder jwilder added this to the 1.3.0 milestone Mar 15, 2017
@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from df8d957 to 78651fb Compare March 15, 2017 18:18

@mark-rushakoff mark-rushakoff left a comment

Confirmed that this change brings the slow TestShard_WritePoints_FieldConflictConcurrentQuery back to sub-second speed.

Please update (tsdb.Config).Diagnostics() to include wal-fsync-delay so that SHOW DIAGNOSTICS FOR 'config-data' includes this setting.


jwilder commented Mar 15, 2017

@mark-rushakoff Fixed in c72f69c
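
For context, the fix is plausibly along these lines (a sketch assuming the monitor/diagnostics helper and a WALFsyncDelay field on tsdb.Config; not the verbatim commit):

// Diagnostics returns the config rows shown by
// SHOW DIAGNOSTICS FOR 'config-data'.
func (c Config) Diagnostics() (*diagnostics.Diagnostics, error) {
	return diagnostics.RowFromMap(map[string]interface{}{
		"dir":             c.Dir,
		"wal-dir":         c.WALDir,
		"wal-fsync-delay": c.WALFsyncDelay, // the new setting
	}), nil
}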

// setSyncDelay sets the duration to wait before fsyncing writes. A value of 0
// causes every write to be fsync'd. This must be called before the WAL
// is opened.
func (l *WAL) setSyncDelay(d time.Duration) {
	l.syncDelay = d
}
Contributor

Maybe just set the delay field directly in NewEngine, or in NewWAL? This is an unexported method that has one line that sets an unexported field to the only parameter that's passed in and returns nothing. It's called from the constructor for Engine, but is defined on the WAL. Something just feels weird about the interaction here.

Contributor Author

I was trying to avoid adding another parameter to NewWAL, and passing in the whole Config was kind of messy too.

Contributor

I'd say just set the field directly, then. Creating an unexported "setter" method just feels wrong since it's not doing anything other than a field assignment.
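
Sketched, the suggestion would look something like this inside NewEngine (names assumed from the surrounding discussion, not verbatim code):

// Assign the unexported field directly instead of calling a setter.
w := NewWAL(walPath)
w.syncDelay = time.Duration(opt.Config.WALFsyncDelay)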

func (l *WAL) sync() {
	// If we're not the first to increment the sync counter, then another
	// goroutine is and will fsync the wal for us.
	if n := atomic.AddInt64(&l.syncCount, 1); n > 1 {
Contributor

I think a better primitive here would be atomic.CompareAndSwapInt64(&l.syncCount, 0, 1), which is essentially what's being implemented with the decrement that follows.

Contributor Author

Good idea. I'll give that a try.
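
For reference, the CAS form of the guard would read something like this (a sketch with surrounding code omitted; l.syncCount as in the snippet above):

// Only the goroutine that flips syncCount from 0 to 1 performs the
// fsync; concurrent callers skip issuing a redundant fsync (in the
// real code they still wait for the owner's fsync to complete before
// returning to the writer).
if !atomic.CompareAndSwapInt64(&l.syncCount, 0, 1) {
	return
}
defer atomic.StoreInt64(&l.syncCount, 0)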

}
}
t.Stop()
Contributor

This only needs to be in the <-l.closing case; otherwise the timer has already stopped.
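
In other words, the pattern is (a sketch, not the PR's exact code):

t := time.NewTimer(l.syncDelay)
select {
case <-t.C:
	// The timer fired, so it is already stopped; no Stop needed.
case <-l.closing:
	t.Stop() // only this path has a live timer to cancel
	return
}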

@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from c72f69c to bbb25ac Compare March 15, 2017 22:29
jwilder added 3 commits March 15, 2017 16:29
Calling DiskSize can be expensive with many shards.  Since the stats
collection runs this every 10s by default, it can be expensive and
wasteful to calculate the stats when nothing has changed. This avoids
re-calculating the shard size unless something has changed.
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous WAL fsyncing to remove the
extra goroutine and the hard-coded 100ms delay.  Writes to the WAL
still maintain the invariant that they do not return to the caller
until the write is fsync'd.

This also adds a new config option, wal-fsync-delay (default 0s),
which can be increased if a delay is desired.  This is somewhat useful
for systems with slower disks, but the current default works well as
is.
@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from bbb25ac to 27ae292 Compare March 15, 2017 22:31
@jwilder jwilder merged commit ce8d8e4 into master Mar 16, 2017
@jwilder jwilder deleted the jw-wal-sync-goroutine branch March 16, 2017 15:03