Prevent excessive memory usage when dropping series #8630
Merged
Around the time 1.0 was released, a change (#7015) sped up deleting and dropping measurements by executing the deletes in parallel for all shards at once.
When TSI was merged in #7618, the series keys passed into `Shard.DeleteMeasurement` were removed and are now expanded lower down. This causes memory to blow up
when a delete spans many shards, because the set of series keys is now expanded N times (once per shard) instead of just once as before. Strictly speaking, the old version did not expand them at all, since the in-memory index slice was passed in directly. That isn't possible with TSI, though.
While running the deletes in parallel would be ideal, a number of optimizations
in the delete path make running deletes serially pretty good. This change just
limits the concurrency of the deletes, which keeps memory more stable. I think
we can still get back to running them concurrently, but that would involve a much larger interface change.
In my local tests, dropping 1M series in 75 shards went from `1m33s` to `3.9s` using `drop measurement`, but degraded from `2m37s` to `3m25s` when using `delete from cpu`. Memory usage now only uses a few hundred MB instead of ballooning to several GBs.