
Add experimental support to force compact TSDB head when ingester in-memory series are above a threshold #5371

Merged
pracucci merged 25 commits into main from compact-tsdb-head-before-reaching-max-series-limit on Jun 30, 2023

Conversation

@pracucci pracucci (Collaborator) commented Jun 28, 2023

What this PR does

In this PR I propose adding experimental support to force-compact the per-tenant TSDB Head when the ingester's in-memory series are above a configured threshold and the ingester estimates that compacting the head up until "now - active series idle timeout" can significantly reduce the number of in-memory series.

How it works:

  • When enabled, the ingester checks whether its in-memory series are above the configured threshold every -blocks-storage.tsdb.head-compaction-interval.
  • When above the threshold, for each per-tenant TSDB the ingester compares the in-memory series against the active series. If the percentage of inactive series (relative to the total in-memory series in the ingester) is above a configured threshold, it forces the compaction of that per-tenant TSDB (see the sketch after this list).
  • While forcing the compaction, the ingester keeps accepting requests containing samples newer than "now - active series idle timeout", while it rejects requests containing older samples. Once the compaction is completed, the TSDB Head "min time" is set to "now - active series idle timeout", so the in-order head will only accept samples newer than that. If OOO ingestion is enabled, older samples can still be ingested via the OOO Head.
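
To make the trigger logic above concrete, here is a minimal Go sketch of the per-tenant decision. It is illustrative only, not the actual Mimir implementation: the tenantTSDB type, the tsdbsToForceCompact and forcedCompactionMaxTime functions, and the seriesThreshold / minReductionPercentage parameters are hypothetical names standing in for the real types and configuration options.

```go
package ingestersketch

import "time"

// tenantTSDB is a hypothetical, simplified view of a per-tenant TSDB.
type tenantTSDB struct {
	inMemorySeries uint64 // series currently held in the TSDB head
	activeSeries   uint64 // series that received samples within the idle timeout
}

// tsdbsToForceCompact returns the tenants whose head should be force-compacted,
// following the logic described above: only act when the ingester as a whole is
// above the series threshold, and only pick tenants whose estimated reduction is
// significant relative to the ingester-wide total.
func tsdbsToForceCompact(
	tsdbs map[string]tenantTSDB,
	totalInMemorySeries uint64, // sum over all tenants in this ingester
	seriesThreshold uint64, // hypothetical in-memory series threshold
	minReductionPercentage float64, // the configured minimum estimated reduction %
) []string {
	if totalInMemorySeries < seriesThreshold {
		return nil // The ingester is below the threshold: nothing to do.
	}

	var out []string
	for tenantID, db := range tsdbs {
		if db.activeSeries >= db.inMemorySeries {
			continue // No inactive series to drop.
		}
		// Estimated reduction: series in memory that are no longer active.
		estimatedReduction := db.inMemorySeries - db.activeSeries

		// Compare against the ingester-wide total, so compaction is only forced
		// when the overall benefit is significant.
		if float64(estimatedReduction)/float64(totalInMemorySeries)*100 >= minReductionPercentage {
			out = append(out, tenantID)
		}
	}
	return out
}

// forcedCompactionMaxTime returns the upper bound of the forced compaction:
// everything older than "now - active series idle timeout" gets compacted,
// so recently-active series stay in the head.
func forcedCompactionMaxTime(idleTimeout time.Duration) int64 {
	return time.Now().Add(-idleTimeout).UnixMilli()
}
```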

Notes about forced compaction:

  • I haven't added a configuration option to cap the maximum frequency of forced compactions because, in practice, it should be limited by:
    • -blocks-storage.tsdb.head-compaction-interval
    • -blocks-storage.tsdb.forced-head-compaction-min-estimated-series-reduction-percentage

Notes about TSDB:

  • Talking to @codesome, he mentioned that he didn't remember whether appending to a chunk while it's being compacted is safe (assuming we append a sample with a timestamp greater than the head compaction range). I checked the compaction logic and, from the head chunks reading perspective, it shouldn't be different from any other query (at least I haven't found any proof of the opposite yet).

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pracucci pracucci force-pushed the compact-tsdb-head-before-reaching-max-series-limit branch 3 times, most recently from 0184c4e to 953dce6 on June 28, 2023 12:13
@pstibrany pstibrany (Member) left a comment

I don't see any major blockers in the PR, changes make sense to me, and it's worth giving it a try.

pkg/ingester/user_tsdb.go (outdated, resolved)
return u.db.CompactOOOHead()
}

// nextForcedHeadCompactionRange computes the next TSDB head range to compact when a forced compaction
// is triggered. If the returned isValid is false, then the returned range should not be compacted.
func nextForcedHeadCompactionRange(blockDuration, headMinTime, headMaxTime, forcedMaxTime int64) (minTime, maxTime int64, isValid, isLast bool) {
Member

nit: isLast bool looks like an unnecessary optimization to save one call to the function, and it can be wrong (if forcedMaxTime is aligned to a block boundary). I'd drop it.

Collaborator Author

It's not an optimization. The problem is that when compacting a partial head (which is the case in this PR), without that flag we may end up in an infinite loop.

Member

Since no pushes are allowed that would modify maxTime, why would we end up in an infinite loop? Are you talking about the case when minTime == maxTime?
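
For illustration, here is a purely hypothetical sketch of block-aligned range splitting with a last-range flag (this is not the actual nextForcedHeadCompactionRange implementation); it only shows how the final range can be capped at forcedMaxTime and flagged so a caller knows when to stop iterating, even when forcedMaxTime is not aligned to a block boundary.

```go
// Illustrative only: split the head into block-duration-aligned ranges, capping
// the final range at forcedMaxTime. The isLast flag lets the caller stop
// iterating after the capped range. Assumes positive millisecond timestamps.
func nextRangeSketch(blockDuration, headMinTime, forcedMaxTime int64) (minTime, maxTime int64, isValid, isLast bool) {
	if headMinTime >= forcedMaxTime {
		return 0, 0, false, true // Nothing left to compact up to forcedMaxTime.
	}
	minTime = headMinTime
	maxTime = (minTime/blockDuration + 1) * blockDuration // next block boundary after minTime
	if maxTime >= forcedMaxTime {
		maxTime = forcedMaxTime
		isLast = true
	}
	return minTime, maxTime, true, isLast
}
```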

pkg/ingester/ingester.go (resolved)
@pracucci pracucci force-pushed the compact-tsdb-head-before-reaching-max-series-limit branch from 953dce6 to 692dc45 on June 29, 2023 08:51
@pracucci (Collaborator Author)

No good. Just got a panic in CI. Will investigate:

panic: corruption in head chunk file /tmp/TestIngester_compactBlocksToReduceInMemorySeries_ConcurrencyRun_31016170653/001/1/chunks_head/000000: head chunk file index 0 does not exist on disk

goroutine 1224 [running]:
github.com/prometheus/prometheus/tsdb.(*memSeries).chunk(0xc0021ba0d0, 0x0, {0x3d2cb40, 0xc0000da2d0}, 0xc000fd4c02?)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/head_read.go:393 +0x4f0
github.com/prometheus/prometheus/tsdb.(*headChunkReader).chunk(0xc0039ac020, {0x2e000000, {0x0, 0x0}, 0x189065c0425, 0x189065c049c, 0x0, 0x0, 0x0}, 0x0)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/head_read.go:329 +0x1c5
github.com/prometheus/prometheus/tsdb.(*headChunkReader).Chunk(0xc00102c520?, {0x2e000000, {0x0, 0x0}, 0x189065c0425, 0x189065c049c, 0x0, 0x0, 0x0})
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/head_read.go:306 +0x105
github.com/prometheus/prometheus/tsdb.(*populateWithDelGenericSeriesIterator).next(0xc000afeb40, 0x1)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/querier.go:601 +0x778
github.com/prometheus/prometheus/tsdb.(*populateWithDelChunkSeriesIterator).Next(0xc000afeb40)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/querier.go:736 +0x52
github.com/prometheus/prometheus/tsdb.DefaultBlockPopulator.PopulateBlock({}, {0x3d1eb48, 0xc000601450}, 0xc000419a40, {0x3d087a0, 0xc000601360}, {0x3d12b80, 0xc003e89200}, 0xc003f3a030?, {0x1, ...}, ...)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/compact.go:1095 +0x2166
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).write(0xc003e89680, {0xc000102360, 0x56}, {0xc001db7360?, 0x1, 0x1}, {0x3d0e200, 0x4c1a018}, {0xc00029b870, 0x1, ...})
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/compact.go:817 +0x1083
github.com/prometheus/prometheus/tsdb.(*LeveledCompactor).Write(0xc003e89680, {0xc000102360, 0x56}, {0x3d21880?, 0xc001cb6520}, 0x189065c0457, 0x189065c046f, 0x0)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/compact.go:694 +0x4e7
github.com/prometheus/prometheus/tsdb.(*DB).compactHead(0xc0006b9100, 0xc001cb6520)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/db.go:1290 +0x115
github.com/prometheus/prometheus/tsdb.(*DB).CompactHead(0xc0006b9100, 0xc001cb6520)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/tsdb/db.go:1181 +0xc5
github.com/grafana/mimir/pkg/ingester.(*userTSDB).compactHead(0xc0018be2a0, 0x6ddd00, 0x189065c046e)
	/__w/mimir/mimir/pkg/ingester/user_tsdb.go:220 +0x45f
github.com/grafana/mimir/pkg/ingester.(*Ingester).compactBlocks.func1({0x0?, 0x0?}, {0x3cfb6e0, 0x1})
	/__w/mimir/mimir/pkg/ingester/ingester.go:2470 +0x317
github.com/grafana/dskit/concurrency.ForEachUser.func1()
	/__w/mimir/mimir/vendor/github.com/grafana/dskit/concurrency/runner.go:45 +0x176
created by github.com/grafana/dskit/concurrency.ForEachUser
	/__w/mimir/mimir/vendor/github.com/grafana/dskit/concurrency/runner.go:36 +0x1e5

@pracucci (Collaborator Author)

Another panic observed locally, while investigating the previous one:

panic: expected newly cut file to have sequence:offset 1:8, got 2:8

goroutine 1295 [running]:
github.com/prometheus/prometheus/tsdb.handleChunkWriteError({0x2e20ea0?, 0xc000e698f0?})
	/Users/marco/workspace/src/github.com/grafana/mimir/vendor/github.com/prometheus/prometheus/tsdb/head_append.go:1518 +0x76
github.com/prometheus/prometheus/tsdb/chunks.(*chunkWriteQueue).processJob(0xc000b706e0, {0x1, 0x7, 0x18906b0f1ec, 0x18906b0f263, {0x2e3ca68, 0xc002ed4f00}, 0x100000008, 0x0, 0x2a4e6f0})
	/Users/marco/workspace/src/github.com/grafana/mimir/vendor/github.com/prometheus/prometheus/tsdb/chunks/chunk_write_queue.go:150 +0x8c
github.com/prometheus/prometheus/tsdb/chunks.(*chunkWriteQueue).start.func1()
	/Users/marco/workspace/src/github.com/grafana/mimir/vendor/github.com/prometheus/prometheus/tsdb/chunks/chunk_write_queue.go:138 +0xc9
created by github.com/prometheus/prometheus/tsdb/chunks.(*chunkWriteQueue).start
	/Users/marco/workspace/src/github.com/grafana/mimir/vendor/github.com/prometheus/prometheus/tsdb/chunks/chunk_write_queue.go:119 +0x6d

@pracucci (Collaborator Author)

panic: expected newly cut file to have sequence:offset 1:8, got 2:8

I did some investigation and this panic shouldn't happen in the real world. It's triggered by an edge case in the unit test, which should be fixed by prometheus/prometheus#12500.

panic: corruption in head chunk file /tmp/TestIngester_compactBlocksToReduceInMemorySeries_ConcurrencyRun_31016170653/001/1/chunks_head/000000: head chunk file index 0 does not exist on disk

This should also be fixed by prometheus/prometheus#12500 and, again, shouldn't happen in practice in the real world, but I want to investigate this panic further tomorrow to be more confident.

@pracucci pracucci force-pushed the compact-tsdb-head-before-reaching-max-series-limit branch from 3a56d65 to 0fea21d on June 30, 2023 03:48
@pracucci pracucci marked this pull request as ready for review June 30, 2023 04:09
@pracucci pracucci requested review from a team as code owners June 30, 2023 04:09
@pstibrany pstibrany (Member) left a comment

Great work! I've left some minor comments, but nothing blocking as far as I can see.

pkg/ingester/user_tsdb.go (outdated, resolved)
pkg/ingester/user_tsdb.go (outdated, resolved)
pkg/storage/tsdb/config.go (outdated, resolved)
pkg/ingester/user_tsdb_test.go (outdated, resolved)
pkg/ingester/user_tsdb_test.go (outdated, resolved)
pkg/ingester/ingester_forced_compaction_test.go (outdated, resolved)
pkg/ingester/ingester_forced_compaction_test.go (outdated, resolved)
pkg/storage/tsdb/config.go (outdated, resolved)
@osg-grafana osg-grafana (Contributor) left a comment

Unblocking with nits

cmd/mimir/config-descriptor.json (outdated, resolved)
cmd/mimir/help-all.txt.tmpl (outdated, resolved)
pkg/storage/tsdb/config.go (outdated, resolved)
pkg/ingester/ingester.go (outdated, resolved)
estimatedSeriesReduction := db.Head().NumSeries() - uint64(db.activeSeries.Active())

// Skip if the estimated series reduction is too low (there would be no big benefit).
if float64(estimatedSeriesReduction)/float64(numInitialMemorySeries)*100 < float64(i.cfg.BlocksStorageConfig.TSDB.EarlyHeadCompactionMinEstimatedSeriesReductionPercentage) {
Contributor

Shouldn't we do this?

Suggested change
if float64(estimatedSeriesReduction)/float64(numInitialMemorySeries)*100 < float64(i.cfg.BlocksStorageConfig.TSDB.EarlyHeadCompactionMinEstimatedSeriesReductionPercentage) {
if float64(estimatedSeriesReduction)/float64(db.Head().NumSeries())*100 < float64(i.cfg.BlocksStorageConfig.TSDB.EarlyHeadCompactionMinEstimatedSeriesReductionPercentage) {

Collaborator Author

The idea of this PR is to check it against the total number of series in the ingester, not against the per-tenant TSDB. Let's say you configure the percentage to 10%, a tenant has 100 series, 20 of which are inactive, while the ingester as a whole holds far more series. If you check against the per-tenant TSDB, it will trigger the compaction, but in practice it will not move the needle. Comparing against the total number of series in the ingester allows us to only trigger the early compaction when there is a significant opportunity.

Contributor

But then this setting will be quite useless in multi-tenant environments. If you have 10 tenants, each saving 50% of their series, you could save a total of 50% by compacting them, but with the current logic you won't trigger the compaction because each tenant only saves 5% of the total. Right?

Collaborator Author

The math is correct, but I think it's all about the percentage you configure. Maybe a lower % makes sense in multi-tenant environments. I think this is something we can refine over time, but I definitely want to keep a way to avoid compacting a TSDB when the overall impact is negligible.

Contributor

I think that having it this way will require tons of tuning, because the setting doesn't relate to the final effect. I have an ingester with a number of in-memory series, and I want to make sure that we compact heads when I can save a given amount of memory. I shouldn't need to update the percentage based on the number of tenants I have in order to achieve that.

IMO it's counter-intuitive.

OTOH, what's the problem with compacting 10 smaller Heads vs compacting one bigger one?

Collaborator Author

Discussed offline. I pushed a commit to include Oleg's suggestion, slightly modified to also compact all TSDBs with a high number of inactive series (even if below the configured %).

There's no strong consensus between the two of us on the right way to decide which TSDBs to compact. The gist is that it's difficult to have good arguments, because we'll learn it the hard way (by running it for some time).
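
For illustration, the compromise described above could look roughly like the following hypothetical sketch; shouldForceCompact and its parameters (in particular minInactiveSeries) are invented names, and this is not the code that was merged in this PR.

```go
// Hypothetical sketch: force-compact a tenant's TSDB either when its estimated
// series reduction is significant relative to the whole ingester, or when the
// tenant has a high absolute number of inactive series. Names and thresholds
// are illustrative only.
func shouldForceCompact(
	tenantInMemorySeries, tenantActiveSeries uint64,
	ingesterInMemorySeries uint64,
	minReductionPercentage float64, // relative threshold (percent of ingester series)
	minInactiveSeries uint64, // hypothetical absolute per-tenant threshold
) bool {
	if tenantActiveSeries >= tenantInMemorySeries {
		return false // No inactive series to drop.
	}
	estimatedReduction := tenantInMemorySeries - tenantActiveSeries

	// Significant relative to the ingester-wide total...
	if float64(estimatedReduction)/float64(ingesterInMemorySeries)*100 >= minReductionPercentage {
		return true
	}
	// ...or a large absolute number of inactive series for this tenant.
	return estimatedReduction >= minInactiveSeries
}
```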

@pracucci pracucci force-pushed the compact-tsdb-head-before-reaching-max-series-limit branch from 30db2d1 to 6ad09d2 on June 30, 2023 11:33
@pracucci pracucci enabled auto-merge (squash) June 30, 2023 14:57
@pracucci pracucci merged commit d855dff into main Jun 30, 2023
@pracucci pracucci deleted the compact-tsdb-head-before-reaching-max-series-limit branch June 30, 2023 15:29