-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle high cardinality deletes in TSM engine #9084
Conversation
tsdb/engine/tsm1/engine.go
Outdated
for len(tempKeys) > 0 && bytes.Compare(tempKeys[0], seriesKey) < 0 { | ||
tempKeys = tempKeys[1:] | ||
// Strip off the field portion of the key | ||
seriesKey, _ := SeriesAndFieldFromCompositeKey([]byte(k)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it appears k
already a []byte
, therefore []byte(k)
is redundant
pkg/bytesutil/bytesutil.go
Outdated
for i < len(a) { | ||
// Find the next gap to remove | ||
iStart = i | ||
for ; i < len(a) && a[i] == val; i += width { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about moving the increment inside the for
statement block?
for i < len(a) && a[i] == val {
i += width
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍🏻
14a45aa
to
a27f6d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 just a nit.
return nil | ||
} else if e := itr.Next(); e != nil { | ||
mm := fs.Measurement(mname) | ||
if mm == nil || mm.HasSeries() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if a measurement is nil
, then it can't have any series. Why not just make HasSeries
check if the receiver is nil
and return false
? Then you don't need the mm == nil || mm.HasSeries()
This adds a new v4 tombstone format that extends the v3 format by allowing multiple batches of tombstones to be written without having to re-read all the existing tombstones. This uses gzip multi stream to append multiple v3 files together to create a v4 format.
This removes the in-memory tombstone buffer when writing tombstones which eliminates one source of large memory spikes during deletes.
Allows callers to use []byte and avoid a string allocation
This is a version of DeleteRange that take a func predicate to determine whether a series key should be deleted or not. This avoids the large slice allocations with higher cardinalities.
The query language min and max times are slighly different than the values used in the engine. This allows faster codes to be used when the whole time range is deleted.
If fn returned and error, the goroutines sending keys from TSM files would get blocked indefinitely and leak.
This removes the containsSeries func which ends up creating a map sized to the slice of keys passed in. This doesn't scale well to high cardinalities and creates a lot of garbage.
If you have lots of data stored locally, this test takes a while to complete since it loads it all up from the users home dir.
This filters out keys that do not exist in a TSM file to avoid writing entries that would end up being ignored when applied.
a27f6d0
to
a6e1fae
Compare
This optimizes how deletes are processed to reduce memory usage and improve efficiency.
This removes more allocations and speeds up some critical sections.
This fixes a race where writes and deletes to the same series and measurements could sometimes leave the index in an inconsistent state.
a6e1fae
to
e772cbb
Compare
The DropSeries code path ended up creating a MeasurementSeriesIterator for each dropped series, this was too expensive just to see if a series exists. This adds a HasSeries func and fixes and issue where TSI files were compacted while an iterator was still in use causing a panic.
e772cbb
to
8b18cc4
Compare
[ci skip]
Required for all non-trivial PRs
This PR has a number of changes to reduce memory usage in the TSM engine for deletes that could lead to OOMs. This required reworking how deletes are processed altogether.
The initial version of the deletes in the engine took a slice of all the series keys to be deleted and then processed those keys. This worked fine for smaller cardinality deletes, but is problematic for higher cardinalities as well as when the data set grows across many TSM files.
The main problems addressed are:
Engine.DeleteSeriesRange
needed to create several intermediate slices and maps that were at least as big as the original series keys passed in. These slices and maps are used to track the field-less series keys (passed in), to the actual series keys stored in TSM (with fields).containsSeries
which determined if any data exists for the series in the engine. If nothing existed, the entries were removed from the index as well. This was racy and memory intensive and has been reworked to use smaller, re-usable batches.The graphs below are a before and after deleting 8M series from
inmem
in a single shard with about 20 large TSM files that are not fully compacted. The first graphs shows a fairly large spike that is correlated w/ the number of series and TSM files to write tombstones. The seconds still uses a bit too much memory (still looking into), but the large spike is gone.Inmem Deleting 8M series Before/After
TSI Deleting 32M series Before/After
Master ran for 12+hrs and I eventually killed it. This change completed in 17m, but I should be able to improve that further.