Add WAL sync delay #8143
Conversation
Force-pushed from df8d957 to 78651fb
Confirmed that this change brings the slow TestShard_WritePoints_FieldConflictConcurrentQuery back to sub-second speed.

Please update (tsdb.Config).Diagnostics() to include wal-fsync-delay, so that SHOW DIAGNOSTICS FOR 'config-data' includes this setting.
@mark-rushakoff Fixed in c72f69c
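For reference, a minimal sketch of what that diagnostics entry might look like, assuming the map-based pattern used by the monitor/diagnostics package (the field and key names here are assumptions, not the actual contents of c72f69c):

    // Hypothetical sketch in tsdb/config.go; diagnostics is assumed to be
    // the monitor/diagnostics package.
    func (c Config) Diagnostics() (*diagnostics.Diagnostics, error) {
        return diagnostics.RowFromMap(map[string]interface{}{
            "dir":             c.Dir,
            "wal-dir":         c.WALDir,
            "wal-fsync-delay": c.WALFsyncDelay, // newly exposed setting
        }), nil
    }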
tsdb/engine/tsm1/wal.go (Outdated)
// setSyncDelay sets the duration to wait before fsync writes. A value of 0
// will cause every write to be fsync'd. This must be called before the WAL
// is opened.
func (l *WAL) setSyncDelay(d time.Duration) {
Maybe just set the delay field directly in NewEngine, or in NewWAL? This is an unexported method that has one line that sets an unexported field to the only parameter that's passed in and returns nothing. It's called from the constructor for Engine, but is defined on the WAL. Something just feels weird about the interaction here.
I was trying to avoid adding another parameter to NewWAL, and passing in the whole Config was kind of messy too.
I'd say just set the field directly, then. Creating an unexported "setter" method just feels wrong since it's not doing anything other than a field assignment.
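A sketch of that suggestion, assuming Engine and WAL live in the same tsm1 package so the unexported field is reachable (the constructor shape and field name are assumptions):

    // In NewEngine (or NewWAL): assign the unexported field directly
    // instead of routing through a one-line setter.
    w := NewWAL(walPath)
    w.syncDelay = time.Duration(cfg.WALFsyncDelay) // was: w.setSyncDelay(...)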
tsdb/engine/tsm1/wal.go (Outdated)
func (l *WAL) sync() {
	// If we're not the first to increment the sync counter, then another
	// goroutine is and will fsync the wal for us.
	if n := atomic.AddInt64(&l.syncCount, 1); n > 1 {
I think a better primitive here would be atomic.CompareAndSwapUint64(0, 1), which is essentially what is being implemented with the decrement that follows.
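A stripped-down sketch of that CAS guard, assuming syncCount becomes a uint64 (the real method would also have to make callers wait on the fsync result, which is omitted here, and syncWriters is a hypothetical helper):

    // Only the goroutine that flips syncCount from 0 to 1 performs the
    // fsync; concurrent callers return because the in-flight sync will
    // cover their writes as well.
    func (l *WAL) sync() {
        if !atomic.CompareAndSwapUint64(&l.syncCount, 0, 1) {
            return
        }
        defer atomic.StoreUint64(&l.syncCount, 0)
        l.syncWriters() // hypothetical: fsync and release waiting writers
    }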
Good idea. I'll give that a try.
tsdb/engine/tsm1/wal.go (Outdated)
}
}
t.Stop()
This only needs to be in the <-l.closing case; otherwise the timer has already stopped.
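In other words, with a select of roughly this shape (the structure and the doSync name are assumptions), Stop is only needed on the shutdown path:

    t := time.NewTimer(l.syncDelay)
    select {
    case <-t.C:
        l.doSync() // timer already fired and stopped itself; no Stop needed
    case <-l.closing:
        t.Stop() // timer may still be pending; release it before exiting
    }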
Force-pushed from c72f69c to bbb25ac
Calling DiskSize can be expensive with many shards. Since stats collection runs every 10s by default, it is wasteful to recalculate the stats when nothing has changed. This avoids re-calculating the shard size unless something has changed.
Fsyncs to the WAL can cause higher IO with lots of small writes or slower disks. This reworks the previous WAL fsyncing to remove the extra goroutine and the hard-coded 100ms delay. Writes to the WAL still maintain the invariant that they do not return to the caller until the write is fsync'd. This also adds a new config option, wal-fsync-delay (default 0s), which can be increased if a delay is desired. This is somewhat useful for systems with slower disks, but the current default works well as is.
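The new option would presumably be set in the [data] section of influxdb.conf (placement assumed from the option name):

    [data]
      # 0s (the default) fsyncs every write; a larger value coalesces
      # concurrent writes into fewer fsync calls at the cost of latency.
      wal-fsync-delay = "0s"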
Force-pushed from bbb25ac to 27ae292
This is a follow-up to address an issue introduced in #7942, where write performance was lower due to a 100ms delay added to batch up fsync calls to the WAL. This reworks it so that there is no delay by default, but multiple concurrent fsyncs are coalesced into one. It also adds a wal-fsync-delay config option that can be increased to add a delay. The previous version also added another long-running goroutine per shard; that goroutine is removed now. This change helps with high write load use cases, users not running on SSDs, and when there are lots of small writes.
This also avoids some of the work done to collect shard stat data. Previously, we'd stat the shard data and WAL dirs on every collection interval (10s). For cold shards, these stats don't change. When there are many shards (1000's), these thousands of syscalls can be taxing on the system when run so frequently. These stats are only collected now when the shard data on disk has actually changed due to a write or a compaction.
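A minimal sketch of that caching idea (the type, field names, and dirty-flag mechanism here are assumptions, not the actual implementation; imports: os, path/filepath, sync):

    // shardStats caches the last computed on-disk size and only re-walks
    // the shard and WAL directories after a write or compaction marks it
    // dirty, avoiding per-interval stat syscalls for cold shards.
    type shardStats struct {
        mu       sync.Mutex
        dirty    bool
        lastSize int64
        path     string
    }

    func (s *shardStats) MarkDirty() { // called on write/compaction
        s.mu.Lock()
        s.dirty = true
        s.mu.Unlock()
    }

    func (s *shardStats) DiskSize() (int64, error) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if !s.dirty {
            return s.lastSize, nil // cold shard: reuse cached size
        }
        var size int64
        err := filepath.Walk(s.path, func(_ string, fi os.FileInfo, err error) error {
            if err != nil {
                return err
            }
            if !fi.IsDir() {
                size += fi.Size()
            }
            return nil
        })
        if err != nil {
            return 0, err
        }
        s.lastSize, s.dirty = size, false
        return size, nil
    }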
I tested this on an m4.xlarge w/ an sc1 volume (HDD, 250 IOPS) using various workloads. For small writes using influx-stress insert --pps 1200 -n 100000000 -b 10:

1.2: the disk is 100% utilized continuously and write latency is around 5.7s.
New w/ 100ms delay: write latency improves to 1.5s and disk util stays around 2-3%.

Testing with SSDs shows some improvement as well, but doesn't require the delay.