-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ingestion of out-of-order native histogram samples #542
base: main
Are you sure you want to change the base?
Conversation
86a2683
to
a560cb3
Compare
b02a0e9
to
fab5b45
Compare
3376ccd
to
9d66f27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done a first pass to the ingest part. Tomorrow I'll do first pass to reads.
Overall looks great nice work @fionaliao and @carrieedwards
There is some code duplication, specially in the Commit() bits but I think its an issue with how the code was before, its hard to abstract it so we have less duplication at this point. It would be a big effort so we have to live with it.
Name: "prometheus_tsdb_head_out_of_order_samples_appended_total", | ||
Help: "Total number of appended out of order samples.", | ||
}), | ||
}, []string{"type"}), | ||
outOfBoundSamples: prometheus.NewCounterVec(prometheus.CounterOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we make outOfOrderSamplesAppended
a countervec i think outOfBoundSamples
should be a countervec too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outOfBoundSamples
is already a countervec. Previously it only tracked oob floats though, this PR starts tracking oob histograms as well
tsdb/head_append.go
Outdated
@@ -581,10 +629,16 @@ func (a *headAppender) AppendHistogram(ref storage.SeriesRef, lset labels.Labels | |||
return 0, storage.ErrNativeHistogramsDisabled | |||
} | |||
|
|||
if t < a.minValidTime { | |||
// For OOO inserts, this restriction is irrelevant and will be checked later once we confirm the histogram sample is an in-order append. | |||
// If OOO inserts are disabled, we may as well as check this as early as we can and avoid more work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this comment but I'd rephrase it to say something along the lines of "fail fast if ooo is disabled"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not surprised you like this comment, this is copied from one you added to Append() when doing the inital OOO implementation 😆
Updated comment in ccbf860
tsdb/head_append.go
Outdated
a.head.metrics.outOfBoundSamples.WithLabelValues(sampleMetricTypeHistogram).Inc() | ||
return 0, storage.ErrOutOfBounds | ||
} | ||
// If OOO is enabled, but OOO native histogram ingestion is disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like above "also fail fast if ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comment in ccbf860
if err := s.appendableHistogram(t, h); err != nil { | ||
s.Unlock() | ||
if errors.Is(err, storage.ErrOutOfOrderSample) { | ||
// TODO(codesome): If we definitely know at this point that the sample is ooo, then optimise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this todo still apply here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - it's not been optimised yet. Currently, OOO samples are always appended to the WAL.
tsdb/head_append.go
Outdated
handleChunkWriteError(err) | ||
return nil | ||
} | ||
chunkRefs := make([]chunks.ChunkDiskMapperRef, 0, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the list of chunk refs also be len(chks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, updated in ccbf860
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com> Add unit test for observing out of bound histograms Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> TestOOOWALWrite: make generic and add integer histogram case Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Make TestOOOWalWrite pass with integer native histograms First version of the code. WIP. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Fix lint error Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Do not use 0 reference to indicate no mmap reference Now we can use nil, since we're returning a slice. Fixes broken TestOOOCompactionWithDisabledWriteLog Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Fix handling case of recording zero chunk written to disk. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Implement float histograms into WBL at append This is the dumb copy paste implementation to get tests to pass. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Test_ChunkQuerier_OOOQuery: add integer and float histograms Test with integer and float histograms Counter only. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Have a single OOOChunk.ToEncodedChunk function Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Make simple query work OOOHeadChunk yields a single encoded chunk. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Add TODO Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Update tsdb/ooo_head.go Use common extractor function for testcases Probably replicating some code here, this can't be the first place for this. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Fix updating appender when opening new chunk due to counterreset Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Fix writing chunk times to disk when multiple chunks from ooo head Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Allow reading multiple chunks from the tsdb ooo head Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Fix linter Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Update docstring wording Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Follow up merge from main Adopt #12352 improvement. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
- Extend TestOOODisabled with histograms - Add histogram test cases for Test_Querier_OOOQuery - Add histogram test cases for TestOOOCompaction - Add histogram test cases for TestOOOCompactionWithNormalCompaction - Add histogram test cases for TestOOOCompactionWithDisabledWriteLog - Add histogram test cases for TestOOOCompactionFailure
…dIterable - Fix encoding - Handle chunk valueType in boundedIterator - Reduce code duplication - Updated boundedChunk test with native histogram testcases - Fix expected samples for boundedChunk test - Use AtT() in boundedChunk seek
- Add histogram test cases - Add TestOOOAppendWithNoSeries test - Extend TestOOOMmapReplay with histograms
Also added a test case. Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Extend TestDiskFillingUpAfterDisablingOOO Extend TestPanicOnApplyConfig Extend TestNoGapAfterRestartWithOOO and TestWblReplayAfterOOODisableAndRestart Extend TestOutOfOrderRuntimeConfig Extend TestOOOMmapCorruption Extend TestWBLAndMmapReplay Extend TestOOOAppendAndQuery Extend TestOOOQueryAfterRestartWithSnapshotAndRemovedWBL
Add histogram test cases to OOO head insertion tests Fix linting Add native histogram testcases for OOO head reads
Fix OOO WAL write test Fix WBL replay test Fix test
* Add counter reset test for mmapped OOO chunks Fixed a couple of bugs that were discovered while writing the test: - mint for ooo chunk needed to be set when a new chunk was created - first chunk was using the wrong previous appender Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> * nolint:staticcheck --------- Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
- Allow counter reset header to be compared - Add timestamp value check - Add timestamp to error message Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Refactor sample comparison util and use in head_test.go Linting
* Add test for ooo histogram compaction Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add engine tests for OOO native histograms with counter resets * Expand test cases and refactor * Add in-order and OOO native histogram tests for increase function * Add in-order and OOO native histogram tests for resets function * Clean up tests * Fix linting * Test refactor and adding comments * Test cleanup * Fix lint errors * Fix case where in-order samples were ingested instead --------- Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
* Remove unused GetBytes() function This is unnecessary after the ChunkOrIterable() refactoring (as oooMergedChunk doesn't need to implement the chunk interface anymore). * Move err to last parameter of appendFunc To make it more consistent with Go conventions. * Move expOOOMmappedChunks defintion into test Only used by TestOOOHistogramCounterResetHeaders so moving it within that test. * Extract sample reading to a util function
2073b36
to
cb8776d
Compare
@@ -50,6 +50,25 @@ func New(t testutil.T) *TestStorage { | |||
return &TestStorage{DB: db, exemplarStorage: es, dir: dir} | |||
} | |||
|
|||
func NewTestStorageWithOpts(t testutil.T, opts *tsdb.Options) *TestStorage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the function New
above could just use this if we didn't set OutOfOrderCapMax here?
This PR adds initial support for out-of-order ingestion of native histograms. It also includes unit tests for ingestion, compaction, and querying of OOO native histogram samples and chunks.
Relevant issue: #11220