M-map full chunks of Head from disk #6679
Conversation
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Forgot to open this as a draft PR, so I changed the title to WIP.
Force-pushed from e4d052a to caca00a (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from 42f798b to 2746f72 (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
I have removed the WIP tag now and it is ready for review. The Windows tests are failing and I could not yet find a way to get rid of the failure where the files are being removed. I will run prombench for a few hours tomorrow on this to see the benefit, and also stress test this PR.
/prombench master
Nice! Amazing work, looking good in general. Benchmarks are solid as well 💪
Some comments, but mostly minor nits.
Also, maybe it's my own preference, but this PR is too long to review properly ): I think we would get better-quality reviews if we tried to ship smaller functionalities one by one. I got definitely tired and dragged to other duties when I reached about 70% of this PR. But again, this is my side note/suggestion, maybe for the next features. (:
Thanks!
tsdb/head.go
Outdated
if s.headChunk != nil {
	chunkRef, err := chunkReadWriter.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk.chunk)
	if err != nil {
		panic(err)
panic? Can we do better?
I forgot to put a TODO here. We should be able to get rid of the panic, I will look into it.
There is a panic below too in the same function for the appender. And looking through the flow where this is called, hard fail seems to be the pattern in the ingest path if it fails. I will keep it as panic for now.
tsdb/chunks/chunks.go
Outdated
}
if err = w.dirFile.Sync(); err != nil {
	return err
if err = dirFile.Sync(); err != nil {
We don't currently have an fsync on any critical path; this could stall things for many seconds.
What would you suggest doing here?
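A minimal sketch of one possible direction, with hypothetical names (not necessarily what this PR ended up doing): keep fsync off the per-chunk write path by syncing the old segment and the directory only when a new segment file is cut, so individual chunk writes never block on a flush.

```go
package chunks

import "os"

// chunkWriter is a hypothetical writer: one open segment file plus a handle
// to the chunks directory for metadata syncs.
type chunkWriter struct {
	dirFile *os.File // open handle to the chunks directory
	curFile *os.File // current segment file being appended to
}

// writeChunk appends encoded chunk bytes to the current segment.
// Note: no fsync here, so the append path never stalls on disk flushes.
func (w *chunkWriter) writeChunk(b []byte) error {
	_, err := w.curFile.Write(b)
	return err
}

// cut finishes the current segment and starts a new one. The expensive syncs
// (old segment data plus directory metadata) happen only on this infrequent path.
func (w *chunkWriter) cut(newPath string) error {
	f, err := os.Create(newPath)
	if err != nil {
		return err
	}
	if w.curFile != nil {
		if err := w.curFile.Sync(); err != nil {
			return err
		}
		if err := w.curFile.Close(); err != nil {
			return err
		}
	}
	w.curFile = f
	return w.dirFile.Sync() // the only directory fsync on this path
}
```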
@geekodour helped me run another benchmark on this with a little less churn than the default prombench; you can find the links to the dashboards in this PR: prometheus-community#50. This gives an idea of the improvement for a saner Prometheus setup (note that prombench uses a 5s scrape interval, so the memory difference shows up earlier in both prombench instances than it would on a typical setup with a 15s scrape interval).
Benchmark results are here
Some conclusions
Some observations (graphs are from high churn benchmark):
/prombench cancel
Benchmark cancel is in progress.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
Generally looks good; some style and readability comments only. (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I left some style suggestions and responded to your questions.
I think it's slowly getting good to go (:
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Force-pushed from 0e9712e to c8e059b (compare)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
The Windows test is finally fixed in eb2e6c1.
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
Let's try it out 💪
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
🎉
Incredible work! 🎉🎊
TL;DR description of the PR from @krasi-georgiev
When appending to the head and a chunk is full, it is flushed to disk and m-mapped (memory-mapped) to free up memory.
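A minimal sketch of that flow, using hypothetical, simplified types (the real Prometheus code differs): when the open head chunk is full, it is handed to a disk mapper, only its on-disk reference is kept in memory, and a fresh in-memory chunk is started.

```go
package head

// chunk is a simplified stand-in for an in-memory head chunk.
type chunk struct {
	minTime, maxTime int64
	data             []byte
}

// diskMapper persists a full chunk and hands back a reference (file number +
// offset) through which the chunk can later be read via mmap.
type diskMapper interface {
	WriteChunk(seriesRef uint64, mint, maxt int64, c *chunk) (uint64, error)
}

type memSeries struct {
	ref           uint64
	headChunk     *chunk   // the open, in-memory chunk being appended to
	mmappedChunks []uint64 // references to full chunks already flushed to disk
}

// cutNewHeadChunk flushes the full head chunk to disk through the mapper,
// keeps only its reference in memory, and opens a fresh in-memory chunk.
func (s *memSeries) cutNewHeadChunk(mint int64, m diskMapper) error {
	if s.headChunk != nil {
		ref, err := m.WriteChunk(s.ref, s.headChunk.minTime, s.headChunk.maxTime, s.headChunk)
		if err != nil {
			return err
		}
		s.mmappedChunks = append(s.mmappedChunks, ref)
	}
	s.headChunk = &chunk{minTime: mint, maxTime: mint}
	return nil
}
```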
Prom startup now happens in these stages
If a head chunk is corrupted, the corrupted one and all chunks after it are deleted, and the data after the corruption point is recovered from the existing WAL. This means that a corruption in the m-mapped files results in NO data loss.
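A sketch of that recovery step, with a hypothetical helper (file naming and ordering are simplified here): drop the corrupted segment file and everything after it, then let WAL replay rebuild the lost chunks.

```go
package head

import (
	"os"
	"sort"
)

// repairCorruptedChunkFiles removes the corrupted segment file and every file
// after it. The data those chunks held is still in the WAL, so replaying the
// WAL afterwards recovers everything past the corruption point.
func repairCorruptedChunkFiles(files []string, corruptIdx int) error {
	sort.Strings(files) // assume segment file names sort in creation order
	for _, f := range files[corruptIdx:] {
		if err := os.Remove(f); err != nil {
			return err
		}
	}
	return nil
}
```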
M-mapped chunk format: the main difference is that a chunk written for m-mapping now also includes the series reference, because there is no index for mapping series to chunks.
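A sketch of what such a self-describing record could look like; field widths here are illustrative, not the authoritative on-disk format. The key point is that the series reference travels with the chunk itself.

```go
package chunks

import (
	"encoding/binary"
	"hash/crc32"
)

// encodeChunkRecord encodes one head chunk record that carries its own series
// reference, so the chunk files can be interpreted without a separate index.
func encodeChunkRecord(seriesRef uint64, mint, maxt int64, enc byte, data []byte) []byte {
	var buf []byte
	tmp := make([]byte, 8)

	binary.BigEndian.PutUint64(tmp, seriesRef) // which series this chunk belongs to
	buf = append(buf, tmp...)
	binary.BigEndian.PutUint64(tmp, uint64(mint)) // chunk min time
	buf = append(buf, tmp...)
	binary.BigEndian.PutUint64(tmp, uint64(maxt)) // chunk max time
	buf = append(buf, tmp...)
	buf = append(buf, enc) // chunk encoding byte

	lenBuf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(lenBuf, uint64(len(data)))
	buf = append(buf, lenBuf[:n]...) // length of the chunk data
	buf = append(buf, data...)       // the chunk bytes themselves

	// Checksum over the whole record so corruption can be detected on startup.
	sum := make([]byte, 4)
	binary.BigEndian.PutUint32(sum, crc32.ChecksumIEEE(buf))
	return append(buf, sum...)
}
```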
Block chunks are accessed via the index, which stores the offsets of the chunks in the chunk files; for example, the chunks of a given series ID have offsets 200, 500, etc. in the chunk files.
In the case of m-mapped chunks, the offsets are kept in memory and accessed from there. During WAL replay, these offsets are restored by iterating over all m-mapped chunks, as described above, matching the series ID present in each chunk header with the offset of that chunk in its file. A sketch of this rebuild step follows.
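A minimal sketch of that startup rebuild, with hypothetical names: because every on-disk chunk record carries its series reference, iterating the chunk files is enough to reconstruct the series-to-chunks mapping that a block index would normally provide.

```go
package head

// mmappedChunk records where a flushed chunk lives on disk and its time range.
type mmappedChunk struct {
	ref        uint64 // encodes file number + byte offset of the chunk
	minT, maxT int64
}

// chunkIterator abstracts "iterate all chunks in all m-mapped chunk files".
type chunkIterator interface {
	IterateAllChunks(f func(seriesRef, chunkRef uint64, minT, maxT int64) error) error
}

// buildSeriesChunkIndex rebuilds the in-memory series -> chunks mapping
// before the WAL itself is replayed.
func buildSeriesChunkIndex(it chunkIterator) (map[uint64][]mmappedChunk, error) {
	idx := make(map[uint64][]mmappedChunk)
	err := it.IterateAllChunks(func(seriesRef, chunkRef uint64, minT, maxT int64) error {
		idx[seriesRef] = append(idx[seriesRef], mmappedChunk{ref: chunkRef, minT: minT, maxT: maxT})
		return nil
	})
	return idx, err
}
```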
Prombench results
WAL Replay
1h WAL replay time
30% less WAL replay time: 4m31s vs 3m36s
2h WAL replay time
20% less WAL replay time: 8m16s vs 7m
Memory During WAL Replay
High Churn
10-15% less RAM: 32GB vs 28GB
20% less RAM after compaction: 34GB vs 27GB
No Churn
20-30% less RAM: 23GB vs 18GB
40% less RAM after compaction: 32.5GB vs 20GB
Screenshots are in this comment
Prerequisite: #6830 (Merged)
Closes #6377. More info in the linked issue and the doc inside that issue :)