M-map full chunks of Head from disk #6679
Conversation
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Forgot to open this as a draft PR, so I changed the title to WIP.
Force-pushed from e4d052a to caca00a (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from 42f798b to 2746f72 (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
I have removed the WIP tag now and it is ready for review. The Windows tests are failing and I could not yet find a way to get rid of the failure where the files are being removed. I will run prombench for a few hours tomorrow on this to see the benefit, and also stress test this PR.
/prombench master
Nice! Amazing work, looking good in general. Benchmarks are solid as well 💪
Some comments, but mostly minor nits.
Also, maybe it's my own preference, but this PR is too long to review properly ): I think we would get better-quality reviews if we tried to ship smaller functionalities one by one. I got definitely tired and dragged to other duties when I reached about 70% of this PR. But again, this is my side note/suggestion, maybe for the next features. (:
Thanks!
tsdb/head.go
Outdated
if s.headChunk != nil {
	chunkRef, err := chunkReadWriter.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk.chunk)
	if err != nil {
		panic(err)
panic? Can we do better?
I forgot to put a TODO here. We should be able to get rid of the panic, I will look into it.
There is a panic below too in the same function for the appender. And looking through the flow where this is called, hard fail seems to be the pattern in the ingest path if it fails. I will keep it as panic for now.
tsdb/chunks/chunks.go
Outdated
}
if err = w.dirFile.Sync(); err != nil {
	return err
if err = dirFile.Sync(); err != nil {
We don't currently have an fsync on any critical path; this could stall things for many seconds.
What would you suggest doing here?
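A minimal sketch of one possible direction, with hypothetical names (not necessarily what this PR ended up doing): keep fsync off the per-chunk write path by syncing the old segment and the directory only when a new segment file is cut, so individual chunk writes never block on a flush.

```go
package chunks

import "os"

// chunkWriter is a hypothetical writer: one open segment file plus a handle
// to the chunks directory for metadata syncs.
type chunkWriter struct {
	dirFile *os.File // open handle to the chunks directory
	curFile *os.File // current segment file being appended to
}

// writeChunk appends encoded chunk bytes to the current segment.
// Note: no fsync here, so the append path never stalls on disk flushes.
func (w *chunkWriter) writeChunk(b []byte) error {
	_, err := w.curFile.Write(b)
	return err
}

// cut finishes the current segment and starts a new one. The expensive syncs
// (old segment data plus directory metadata) happen only on this infrequent path.
func (w *chunkWriter) cut(newPath string) error {
	f, err := os.Create(newPath)
	if err != nil {
		return err
	}
	if w.curFile != nil {
		if err := w.curFile.Sync(); err != nil {
			return err
		}
		if err := w.curFile.Close(); err != nil {
			return err
		}
	}
	w.curFile = f
	return w.dirFile.Sync() // the only directory fsync on this path
}
```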
@geekodour helped me run another benchmark on this with a little less churn than the default prombench; you can find the links to the dashboards in this PR: prometheus-community#50. This gives an idea of the improvement for a saner Prometheus setup (note that prombench uses a 5s scrape interval, so the memory difference shows up earlier in both prombench instances than it would on a typical setup with a 15s scrape interval).
Benchmark results are here
Some conclusions
Some observations (graphs are from high churn benchmark):
/prombench cancel
Benchmark cancel is in progress.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
Generally looks good; some style and readability comments only. (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I left some style suggestions and responded to your questions.
I think it's slowly getting good to go (:
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from 0e9712e to c8e059b (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
The Windows test is finally fixed in eb2e6c1.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Let's try it out 💪
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
🎉
Incredible work! 🎉🎊
TL;DR description of the PR from @krasi-georgiev
When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory-mapped) to free up memory.
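A minimal sketch of that flow, using hypothetical, simplified types (the real Prometheus code differs): when the open head chunk is full, it is handed to a disk mapper, only its on-disk reference is kept in memory, and a fresh in-memory chunk is started.

```go
package head

// chunk is a simplified stand-in for an in-memory head chunk.
type chunk struct {
	minTime, maxTime int64
	data             []byte
}

// diskMapper persists a full chunk and hands back a reference (file number +
// offset) through which the chunk can later be read via mmap.
type diskMapper interface {
	WriteChunk(seriesRef uint64, mint, maxt int64, c *chunk) (uint64, error)
}

type memSeries struct {
	ref           uint64
	headChunk     *chunk   // the open, in-memory chunk being appended to
	mmappedChunks []uint64 // references to full chunks already flushed to disk
}

// cutNewHeadChunk flushes the full head chunk to disk through the mapper,
// keeps only its reference in memory, and opens a fresh in-memory chunk.
func (s *memSeries) cutNewHeadChunk(mint int64, m diskMapper) error {
	if s.headChunk != nil {
		ref, err := m.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk)
		if err != nil {
			return err
		}
		s.mmappedChunks = append(s.mmappedChunks, ref)
	}
	s.headChunk = &chunk{minTime: mint, maxTime: mint}
	return nil
}
```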
Prom startup now happens in these stages
If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption point is recovered from the existing WAL. This means that a corruption in the m-mapped files results in NO data loss.
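A sketch of that recovery step, with a hypothetical helper (file naming and ordering are simplified here): drop the corrupted segment file and everything after it, then let WAL replay rebuild the lost chunks.

```go
package head

import (
	"os"
	"sort"
)

// repairCorruptedChunkFiles removes the corrupted segment file and every file
// after it. The data those chunks held is still in the WAL, so replaying the
// WAL afterwards recovers everything past the corruption point.
func repairCorruptedChunkFiles(files []string, corruptIdx int) error {
	sort.Strings(files) // assume segment file names sort in creation order
	for _, f := range files[corruptIdx:] {
		if err := os.Remove(f); err != nil {
			return err
		}
	}
	return nil
}
```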
M-mapped chunk format: the main difference is that a chunk written for m-mapping now also includes the series reference, because there is no index for mapping series to chunks.
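A sketch of what such a self-describing record could look like; field widths here are illustrative, not the authoritative on-disk format. The key point is that the series reference travels with the chunk itself.

```go
package chunks

import (
	"encoding/binary"
	"hash/crc32"
)

// encodeChunkRecord encodes one head chunk record that carries its own series
// reference, so the chunk files can be interpreted without a separate index.
func encodeChunkRecord(seriesRef uint64, mint, maxt int64, enc byte, data []byte) []byte {
	var buf []byte
	tmp := make([]byte, 8)

	binary.BigEndian.PutUint64(tmp, seriesRef) // which series this chunk belongs to
	buf = append(buf, tmp...)
	binary.BigEndian.PutUint64(tmp, uint64(mint)) // chunk min time
	buf = append(buf, tmp...)
	binary.BigEndian.PutUint64(tmp, uint64(maxt)) // chunk max time
	buf = append(buf, tmp...)
	buf = append(buf, enc) // chunk encoding byte

	lenBuf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(lenBuf, uint64(len(data)))
	buf = append(buf, lenBuf[:n]...) // length of the chunk data
	buf = append(buf, data...)       // the chunk bytes themselves

	// Checksum over the whole record so corruption can be detected on startup.
	sum := make([]byte, 4)
	binary.BigEndian.PutUint32(sum, crc32.ChecksumIEEE(buf))
	return append(buf, sum...)
}
```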
Block chunks are accessed via the index, which stores the offsets of the chunks in the chunk files; for example, the chunks of a given series ID have offsets 200, 500, etc. in the chunk files.
In the case of m-mapped chunks, the offsets are kept in memory and accessed from there. During WAL replay, these offsets are restored by iterating over all m-mapped chunks, as described above, matching the series ID present in each chunk header with the offset of that chunk in its file. A sketch of this rebuild step follows.
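A minimal sketch of that startup rebuild, with hypothetical names: because every on-disk chunk record carries its series reference, iterating the chunk files is enough to reconstruct the series-to-chunks mapping that a block index would normally provide.

```go
package head

// mmappedChunk records where a flushed chunk lives on disk and its time range.
type mmappedChunk struct {
	ref        uint64 // encodes file number + byte offset of the chunk
	minT, maxT int64
}

// chunkIterator abstracts "iterate all chunks in all m-mapped chunk files".
type chunkIterator interface {
	IterateAllChunks(f func(seriesRef, chunkRef uint64, minT, maxT int64) error) error
}

// buildSeriesChunkIndex rebuilds the in-memory series -> chunks mapping
// before the WAL itself is replayed.
func buildSeriesChunkIndex(it chunkIterator) (map[uint64][]mmappedChunk, error) {
	idx := make(map[uint64][]mmappedChunk)
	err := it.IterateAllChunks(func(seriesRef, chunkRef uint64, minT, maxT int64) error {
		idx[seriesRef] = append(idx[seriesRef], mmappedChunk{ref: chunkRef, minT: minT, maxT: maxT})
		return nil
	})
	return idx, err
}
```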
Prombench results
WAL Replay
1h WAL replay time
30% less WAL replay time: 4m31s vs 3m36s
2h WAL replay time
20% less WAL replay time: 8m16s vs 7m
Memory During WAL Replay
High Churn
10-15% less RAM: 32GB vs 28GB
20% less RAM after compaction: 34GB vs 27GB
No Churn
20-30% less RAM: 23GB vs 18GB
40% less RAM after compaction: 32.5GB vs 20GB
Screenshots are in this comment
Prerequisite: #6830 (Merged)
Closes #6377. More info in the linked issue and the doc inside that issue :)