Improved compaction scheduling #8886
Conversation
Not necessarily changes, but just a couple of queries about closing/removing files.
tsdb/engine/tsm1/writer.go
Outdated
```go
	return nil
}

if err := d.fd.Close(); err != nil {
```
So if there is a problem closing the file we're not going to clean up, but we're not going to return an error to the caller either? This scenario seems unusual enough that we shouldn't ignore it, and should at least log it.
That should be returning the err here. Will fix.
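A minimal sketch of that fix, assuming the method exists only to close and remove a temporary index file (the stub type and method name are illustrative, not the actual writer code):

```go
package tsm1

import "os"

// directIndexStub stands in for the real directIndex; only the file
// handle matters for this sketch.
type directIndexStub struct {
	fd *os.File
}

// removeTempFile returns the Close error instead of swallowing it and
// only removes the file after it has been closed cleanly.
func (d *directIndexStub) removeTempFile() error {
	if d.fd == nil {
		return nil
	}
	if err := d.fd.Close(); err != nil {
		return err // previously dropped; surface it to the caller
	}
	return os.Remove(d.fd.Name())
}
```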
tsdb/engine/tsm1/writer.go
Outdated
```go
}

if f, ok := t.wrapped.(*os.File); ok {
	f.Close()
```
In this case we're going to ignore a Close error (fine) but then still try and remove the file. This seems at odds with how the directIndex is handled.
I'll fix this up too.
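For this second case, a sketch of the more consistent handling under the same assumptions (stand-in types; t.wrapped mirrors the field in the diff above): a failed Close aborts the removal and is reported, matching the directIndex path.

```go
package tsm1

import (
	"io"
	"os"
)

// tsmWriterStub stands in for the real writer type; wrapped may or may
// not be backed by an *os.File.
type tsmWriterStub struct {
	wrapped io.Writer
}

func (t *tsmWriterStub) removeWrapped() error {
	if f, ok := t.wrapped.(*os.File); ok {
		if err := f.Close(); err != nil {
			return err // don't remove a file we failed to close
		}
		return os.Remove(f.Name())
	}
	return nil
}
```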
Doing this for the WAL reduces throughput quite a bit.
This changes the compaction scheduling to better utilize the cores that are free. Previously, each level was planned in its own goroutine and would kick off a number of compaction groups. The problem with this model was that if there were 4 groups and 3 completed quickly, planning for that level would be blocked until the last group finished. If compactions at the prior level were running more quickly, a large backlog could accumulate. Planning now happens in a single goroutine that plans each level in succession and starts as many groups as it can; when one group finishes, planning starts the next group for that level.
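Roughly, the new model looks like the sketch below. This is illustrative only, with assumed types and a made-up planner/runner API rather than the actual scheduler code: a single loop plans levels 1 through 4 in order and hands each planned group to a worker whenever a concurrency slot is free, so a slow group no longer blocks planning for its level.

```go
package tsm1

import "time"

// compactionGroup stands in for a planned set of TSM files to compact together.
type compactionGroup struct {
	level int
	files []string
}

// scheduler is an illustrative single-goroutine planner. Instead of one
// goroutine per level that blocks until all of its groups finish, one loop
// plans level by level and starts groups as concurrency slots free up.
type scheduler struct {
	plan  func(level int) []compactionGroup // plan pending groups for a level (assumed API)
	run   func(g compactionGroup)           // run a single compaction group (assumed API)
	slots chan struct{}                     // bounds the number of concurrent compactions
}

func newScheduler(maxConcurrent int, plan func(int) []compactionGroup, run func(compactionGroup)) *scheduler {
	return &scheduler{plan: plan, run: run, slots: make(chan struct{}, maxConcurrent)}
}

func (s *scheduler) loop(stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}

		started := false
		for level := 1; level <= 4; level++ {
			for _, g := range s.plan(level) {
				select {
				case s.slots <- struct{}{}: // acquire a free slot
					started = true
					go func(g compactionGroup) {
						defer func() { <-s.slots }() // release the slot when the group finishes
						s.run(g)
					}(g)
				default:
					// No free slots right now; the group will be re-planned next pass.
				}
			}
		}

		if !started {
			time.Sleep(time.Second) // nothing runnable; back off before planning again
		}
	}
}
```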
Shows up under high cardinality compactions.
This switches the thresholds that are used for writing snapshots concurrently. This scales better than the prior model.
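A sketch of the threshold idea, with made-up numbers and helper names (the real thresholds and snapshot writer API are not shown in this PR excerpt): the snapshot size decides how many writers run in parallel, and the keys are partitioned across them.

```go
package tsm1

import "sync"

// snapshotConcurrency picks how many concurrent writers to use for a cache
// snapshot. The byte threshold and the cap of 4 are assumptions for the sketch.
func snapshotConcurrency(sizeBytes int64) int {
	const threshold = 32 * 1024 * 1024 // illustrative, not the real value
	n := int(sizeBytes / threshold)
	if n < 1 {
		return 1
	}
	if n > 4 {
		return 4
	}
	return n
}

// writeSnapshotConcurrently partitions the series keys and writes each
// partition in parallel. write stands in for the real snapshot writer.
func writeSnapshotConcurrently(keys []string, sizeBytes int64, write func(part []string)) {
	n := snapshotConcurrency(sizeBytes)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		part := keys[i*len(keys)/n : (i+1)*len(keys)/n]
		wg.Add(1)
		go func(part []string) {
			defer wg.Done()
			write(part)
		}(part)
	}
	wg.Wait()
}
```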
The chunked slice is unnecessary and we can re-use k.blocks throughout the compaction.
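As a small illustration of the slice reuse (the types here are stubs, not the real iterator): truncating the existing buffer with s[:0] keeps its capacity, so no new chunked slice needs to be allocated on each pass.

```go
package tsm1

// blockStub stands in for a decoded TSM block.
type blockStub struct {
	minTime, maxTime int64
	b                []byte
}

type keyIteratorStub struct {
	blocks []blockStub
}

// refill reuses the existing backing array of k.blocks instead of building
// a separate chunked slice each iteration.
func (k *keyIteratorStub) refill(next []blockStub) {
	k.blocks = k.blocks[:0]              // keep capacity, drop the old contents
	k.blocks = append(k.blocks, next...) // no allocation once capacity is large enough
}
```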
This gives an indication as to whether compactions are backed up or not.
With higher cardinality or larger series keys, the files can roll over early which causes them to take longer to be compacted by higher levels. This causes larger disk usage and higher numbers of tsm files at times.
One shard might be able to run a compaction, but could fail due to limits being hit. This loop would continue indefinitely as the same task would continue to be rescheduled.
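A sketch of the change described here, with an assumed limiter API: when the concurrency limit blocks a shard's compaction, the task is skipped for this pass rather than being requeued immediately, which is what caused the tight loop.

```go
package tsm1

// shardStub stands in for a shard that may need compaction.
type shardStub struct {
	needsCompaction bool
}

func (s shardStub) CompactionNeeded() bool { return s.needsCompaction }

// scheduleCompactions tries each shard once per pass. tryAcquire/release
// model the concurrency limit; both are assumed helpers for this sketch.
func scheduleCompactions(shards []shardStub, tryAcquire func() bool, release func(), run func(shardStub)) {
	for _, sh := range shards {
		if !sh.CompactionNeeded() {
			continue
		}
		if !tryAcquire() {
			// Limit reached: skip for now instead of rescheduling the same
			// task in a loop; it will be retried on the next pass.
			continue
		}
		go func(sh shardStub) {
			defer release()
			run(sh)
		}(sh)
	}
}
```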
This check doesn't make sense for high cardinality data as the files typically get big and sparse very quickly. This causes a lot of extra disk space to be used which is taken up by large indexes and sparse data.
If there is a backlog of level 3 and 4 compactions, but few level 1 and 2 compactions, allow the level 3 and 4 compactions to use some of the excess capacity.
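One way to picture the capacity change (the reserve split is an assumption, not the actual ratio): higher levels normally leave some slots for level 1 and 2 work, but when those levels have nothing pending, the reserve is released.

```go
package tsm1

// highLevelLimit returns how many concurrent compactions levels 3 and 4 may
// use. The half-and-half reserve here is illustrative only.
func highLevelLimit(maxConcurrent, pendingLevel1And2 int) int {
	if pendingLevel1And2 == 0 {
		return maxConcurrent // levels 1/2 are idle, so 3/4 may use the excess
	}
	reserved := maxConcurrent / 2
	if reserved < 1 {
		reserved = 1
	}
	limit := maxConcurrent - reserved
	if limit < 1 {
		limit = 1
	}
	return limit
}
```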
Some files seem to get orphaned behind higher levels. This causes compactions to get blocked, as the lower level files will not get picked up by their level planners. This allows the full plan to identify them and pull them into its plan.
This fixes some regressions and new issues due to recent changes from #8872 and #8856.
The O_SYNC flag is no longer used for the WAL for now, since it reduced write throughput significantly. The number of concurrent compactions is now bounded and can be tuned with the max-concurrent-compactions setting. There are also two bug fixes: the temp TSM index file could be left around when running out of disk space, and the cache mem bytes stat now includes the size of the series key.
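As an illustration of the second bug fix (the value interface and function name are stand-ins for the actual cache code, not its API): the per-entry size now counts the series key bytes as well as the values.

```go
package tsm1

// valueStub stands in for a cached point value.
type valueStub interface {
	Size() int
}

// entrySizeBytes counts the series key itself in addition to the values,
// which is what the cache mem bytes stat was previously missing.
func entrySizeBytes(key []byte, values []valueStub) int64 {
	n := int64(len(key)) // the series key was previously omitted
	for _, v := range values {
		n += int64(v.Size())
	}
	return n
}
```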