
Add WAL sync delay #8143

Merged: 3 commits merged into master from jw-wal-sync-goroutine on Mar 16, 2017
Conversation

@jwilder jwilder commented Mar 15, 2017

This is a follow-up to address an issue introduced in #7942, where write performance dropped due to a 100ms delay added to batch up fsync calls to the WAL. This reworks it so that there is no delay by default, but multiple concurrent fsyncs are coalesced into one. It also adds a wal-fsync-delay config option that can be increased to add a delay. The previous version also added another long-running goroutine per shard; that goroutine is now removed.

This change helps high-write-load use cases, users not running on SSDs, and workloads with many small writes; see the config example below.
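
As a point of reference, here's how the option might be set in the [data] section of influxdb.conf (the 100ms value is illustrative, not a recommended default):

[data]
  # Duration a write waits before fsyncing to the WAL. A value of 0 fsyncs
  # every write. Non-zero values (e.g. 0ms-100ms) trade a small amount of
  # write latency for fewer fsyncs, which helps non-SSD disks.
  wal-fsync-delay = "100ms"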

This also avoids some of the work done to collect shard stat data. Previously, we'd stat the shard data and WAL dirs on every collection interval (10s). For cold shards, these stats don't change, and when there are many shards (1000s), those thousands of syscalls can be taxing on the system when run so frequently.

These stats are now collected only when the shard data on disk has actually changed due to a write or a compaction; a sketch of the idea follows.
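
Here is a minimal sketch of that idea (names are hypothetical, not the PR's actual code): a dirty flag, set by writes and compactions, gates the expensive size recalculation.

package sketch

import "sync"

// shardStats caches the on-disk size of a shard so the stats collector
// doesn't stat the data and wal dirs every 10s for cold shards.
type shardStats struct {
	mu        sync.Mutex
	sizeDirty bool  // set by writes and compactions
	diskBytes int64 // last computed size
}

// markDirty is called after a write or compaction changes data on disk.
func (s *shardStats) markDirty() {
	s.mu.Lock()
	s.sizeDirty = true
	s.mu.Unlock()
}

// diskSize returns the cached size, recomputing only when dirty.
func (s *shardStats) diskSize(compute func() int64) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.sizeDirty {
		s.diskBytes = compute() // walks the shard dirs; expensive
		s.sizeDirty = false
	}
	return s.diskBytes
}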

I tested this on an m4.xlarge with an sc1 volume (HDD, 250 IOPS) using various workloads.

For small writes using influx-stress insert --pps 1200 -n 100000000 -b 10, the disk is 100% utilized continuously and write latency is around 5.7s.

1.2

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00    23.00    0.00   63.00     0.00   348.00    11.05     1.02   16.32    0.00   16.32  15.87 100.00

New w/ 100ms delay

Write latency improves to 1.5s and disk util stays around 2-3%.

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     1.00    0.00    8.00     0.00    36.00     9.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdf              0.00    20.00    0.00   32.00     0.00   212.00    13.25     0.03    0.88    0.00    0.88   0.75   2.40

Testing with SSDs shows some improvement as well, but doesn't require the delay.

@jwilder jwilder added this to the 1.3.0 milestone Mar 15, 2017
@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from df8d957 to 78651fb Compare March 15, 2017 18:18

@mark-rushakoff mark-rushakoff left a comment

Confirmed that this change brings the slow TestShard_WritePoints_FieldConflictConcurrentQuery back to sub-second speed.

Please update (tsdb.Config).Diagnostics() to include wal-fsync-delay so that SHOW DIAGNOSTICS FOR 'config-data' includes this setting.


jwilder commented Mar 15, 2017

@mark-rushakoff Fixed in c72f69c
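
For context, the fix is plausibly along these lines (a sketch assuming the monitor/diagnostics helper and a WALFsyncDelay field on tsdb.Config; not the verbatim commit):

// Diagnostics returns the config rows shown by
// SHOW DIAGNOSTICS FOR 'config-data'.
func (c Config) Diagnostics() (*diagnostics.Diagnostics, error) {
	return diagnostics.RowFromMap(map[string]interface{}{
		"dir":             c.Dir,
		"wal-dir":         c.WALDir,
		"wal-fsync-delay": c.WALFsyncDelay, // the new setting
	}), nil
}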

// setSyncDelay sets the duration to wait before fsyncing writes. A value of 0
// causes every write to be fsync'd. This must be called before the WAL
// is opened.
func (l *WAL) setSyncDelay(d time.Duration) {
	l.syncDelay = d
}
Contributor

Maybe just set the delay field directly in NewEngine, or in NewWAL? This is an unexported method that has one line that sets an unexported field to the only parameter that's passed in and returns nothing. It's called from the constructor for Engine, but is defined on the WAL. Something just feels weird about the interaction here.

Contributor Author

I was trying to avoid adding another parameter to NewWAL, and passing in the whole Config was kind of messy too.

Contributor

I'd say just set the field directly, then. Creating an unexported "setter" method just feels wrong since it's not doing anything other than a field assignment.
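
Sketched, the suggestion would look something like this inside NewEngine (names assumed from the surrounding discussion, not verbatim code):

// Assign the unexported field directly instead of calling a setter.
w := NewWAL(walPath)
w.syncDelay = time.Duration(opt.Config.WALFsyncDelay)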

func (l *WAL) sync() {
	// If we're not the first to increment the sync counter, then another
	// goroutine is and will fsync the wal for us.
	if n := atomic.AddInt64(&l.syncCount, 1); n > 1 {
Contributor

I think a better primitive here would be atomic.CompareAndSwapInt64(&l.syncCount, 0, 1), which is essentially what's being implemented with the decrement that follows.

Contributor Author

Good idea. I'll give that a try.
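
For reference, the CAS form of the guard would read something like this (a sketch with surrounding code omitted; l.syncCount as in the snippet above):

// Only the goroutine that flips syncCount from 0 to 1 performs the
// fsync; concurrent callers skip issuing a redundant fsync (in the
// real code they still wait for the owner's fsync to complete before
// returning to the writer).
if !atomic.CompareAndSwapInt64(&l.syncCount, 0, 1) {
	return
}
defer atomic.StoreInt64(&l.syncCount, 0)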

}
}
t.Stop()
Contributor

This only needs to be in the <-l.closing case; otherwise the timer has already stopped.
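
In other words, the pattern is (a sketch, not the PR's exact code):

t := time.NewTimer(l.syncDelay)
select {
case <-t.C:
	// The timer fired, so it is already stopped; no Stop needed.
case <-l.closing:
	t.Stop() // only this path has a live timer to cancel
	return
}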

@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from c72f69c to bbb25ac Compare March 15, 2017 22:29
jwilder added 3 commits March 15, 2017 16:29
Calling DiskSize can be expensive with many shards.  Since the stats
collection runs this every 10s by default, it can be expensive and
wasteful to calculate the stats when nothing has changed. This avoids
re-calculating the shard size unless something has changed.
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous WAL fsyncing to remove the
extra goroutine and the hard-coded 100ms delay.  Writes to the WAL
still maintain the invariant that they do not return to the caller
until the write is fsync'd.

This also adds a new config option, wal-fsync-delay (default 0s),
which can be increased if a delay is desired.  This is somewhat useful
for systems with slower disks, but the current default works well as
is.
@jwilder jwilder force-pushed the jw-wal-sync-goroutine branch from bbb25ac to 27ae292 Compare March 15, 2017 22:31
@jwilder jwilder merged commit ce8d8e4 into master Mar 16, 2017
@jwilder jwilder deleted the jw-wal-sync-goroutine branch March 16, 2017 15:03