Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ingestion of out-of-order native histogram samples #542

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

carrieedwards
Copy link
Contributor

This PR adds initial support for out-of-order ingestion of native histograms. It also includes unit tests for ingestion, compaction, and querying of OOO native histogram samples and chunks.

Relevant issue: #11220

@fionaliao fionaliao force-pushed the cedwards/ooo-native-histograms branch from 86a2683 to a560cb3 Compare November 29, 2023 19:21
@carrieedwards carrieedwards force-pushed the cedwards/ooo-native-histograms branch 4 times, most recently from b02a0e9 to fab5b45 Compare January 9, 2024 20:35
@fionaliao fionaliao force-pushed the cedwards/ooo-native-histograms branch 2 times, most recently from 3376ccd to 9d66f27 Compare January 23, 2024 18:11
tsdb/head_read.go Outdated Show resolved Hide resolved
Copy link
Member

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a first pass to the ingest part. Tomorrow I'll do first pass to reads.

Overall looks great nice work @fionaliao and @carrieedwards

There is some code duplication, specially in the Commit() bits but I think its an issue with how the code was before, its hard to abstract it so we have less duplication at this point. It would be a big effort so we have to live with it.

Name: "prometheus_tsdb_head_out_of_order_samples_appended_total",
Help: "Total number of appended out of order samples.",
}),
}, []string{"type"}),
outOfBoundSamples: prometheus.NewCounterVec(prometheus.CounterOpts{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make outOfOrderSamplesAppended a countervec i think outOfBoundSamples should be a countervec too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outOfBoundSamples is already a countervec. Previously it only tracked oob floats though, this PR starts tracking oob histograms as well

@@ -581,10 +629,16 @@ func (a *headAppender) AppendHistogram(ref storage.SeriesRef, lset labels.Labels
return 0, storage.ErrNativeHistogramsDisabled
}

if t < a.minValidTime {
// For OOO inserts, this restriction is irrelevant and will be checked later once we confirm the histogram sample is an in-order append.
// If OOO inserts are disabled, we may as well as check this as early as we can and avoid more work.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this comment but I'd rephrase it to say something along the lines of "fail fast if ooo is disabled"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not surprised you like this comment, this is copied from one you added to Append() when doing the inital OOO implementation 😆

Updated comment in ccbf860

a.head.metrics.outOfBoundSamples.WithLabelValues(sampleMetricTypeHistogram).Inc()
return 0, storage.ErrOutOfBounds
}
// If OOO is enabled, but OOO native histogram ingestion is disabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like above "also fail fast if ..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment in ccbf860

if err := s.appendableHistogram(t, h); err != nil {
s.Unlock()
if errors.Is(err, storage.ErrOutOfOrderSample) {
// TODO(codesome): If we definitely know at this point that the sample is ooo, then optimise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this todo still apply here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it's not been optimised yet. Currently, OOO samples are always appended to the WAL.

handleChunkWriteError(err)
return nil
}
chunkRefs := make([]chunks.ChunkDiskMapperRef, 0, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the list of chunk refs also be len(chks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated in ccbf860

zenador and others added 21 commits February 5, 2024 15:55
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Add unit test for observing out of bound histograms

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

TestOOOWALWrite: make generic and add integer histogram case

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Make TestOOOWalWrite pass with integer native histograms

First version of the code. WIP.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Fix lint error

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Do not use 0 reference to indicate no mmap reference

Now we can use nil, since we're returning a slice.
Fixes broken TestOOOCompactionWithDisabledWriteLog

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Fix handling case of recording zero chunk written to disk.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Implement float histograms into WBL at append

This is the dumb copy paste implementation to get tests to pass.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Test_ChunkQuerier_OOOQuery: add integer and float histograms

Test with integer and float histograms
Counter only.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Have a single OOOChunk.ToEncodedChunk function

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Make simple query work

OOOHeadChunk yields a single encoded chunk.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Add TODO

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Update tsdb/ooo_head.go

Use common extractor function for testcases

Probably replicating some code here, this can't be the first place for this.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Fix updating appender when opening new chunk due to counterreset

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Fix writing chunk times to disk when multiple chunks from ooo head

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Allow reading multiple chunks from the tsdb ooo head

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Fix linter

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Update docstring wording

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

Follow up merge from main

Adopt #12352 improvement.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
- Extend TestOOODisabled with histograms
- Add histogram test cases for Test_Querier_OOOQuery
- Add histogram test cases for TestOOOCompaction
- Add histogram test cases for TestOOOCompactionWithNormalCompaction
- Add histogram test cases for TestOOOCompactionWithDisabledWriteLog
- Add histogram test cases for TestOOOCompactionFailure
…dIterable

- Fix encoding
- Handle chunk valueType in boundedIterator
- Reduce code duplication
- Updated boundedChunk test with native histogram testcases
- Fix expected samples for boundedChunk test
- Use AtT() in boundedChunk seek
- Add histogram test cases
- Add TestOOOAppendWithNoSeries test
- Extend TestOOOMmapReplay with histograms
Also added a test case.

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Extend TestDiskFillingUpAfterDisablingOOO

Extend TestPanicOnApplyConfig

Extend TestNoGapAfterRestartWithOOO and TestWblReplayAfterOOODisableAndRestart

Extend TestOutOfOrderRuntimeConfig

Extend TestOOOMmapCorruption

Extend TestWBLAndMmapReplay

Extend TestOOOAppendAndQuery

Extend TestOOOQueryAfterRestartWithSnapshotAndRemovedWBL
Add histogram test cases to OOO head insertion tests

Fix linting

Add native histogram testcases for OOO head reads
Fix OOO WAL write test

Fix WBL replay test

Fix test
* Add counter reset test for mmapped OOO chunks

Fixed a couple of bugs that were discovered while writing the test:
- mint for ooo chunk needed to be set when a new chunk was created
- first chunk was using the wrong previous appender

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* nolint:staticcheck

---------

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
- Allow counter reset header to be compared
- Add timestamp value check
- Add timestamp to error message

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Refactor sample comparison util and use in head_test.go

Linting
* Add test for ooo histogram compaction

Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Add engine tests for OOO native histograms with counter resets

* Expand test cases and refactor

* Add in-order and OOO native histogram tests for increase function

* Add in-order and OOO native histogram tests for resets function

* Clean up tests

* Fix linting

* Test refactor and adding comments

* Test cleanup

* Fix lint errors

* Fix case where in-order samples were ingested instead

---------

Co-authored-by: Fiona Liao <fiona.y.liao@gmail.com>
carrieedwards and others added 6 commits February 5, 2024 15:55
* Remove unused GetBytes() function

This is unnecessary after the ChunkOrIterable() refactoring (as oooMergedChunk doesn't need to implement the chunk interface anymore).

* Move err to last parameter of appendFunc

To make it more consistent with Go conventions.

* Move expOOOMmappedChunks defintion into test

Only used by TestOOOHistogramCounterResetHeaders so moving it within that test.

* Extract sample reading to a util function
@fionaliao fionaliao force-pushed the cedwards/ooo-native-histograms branch from 2073b36 to cb8776d Compare February 5, 2024 16:09
@@ -50,6 +50,25 @@ func New(t testutil.T) *TestStorage {
return &TestStorage{DB: db, exemplarStorage: es, dir: dir}
}

func NewTestStorageWithOpts(t testutil.T, opts *tsdb.Options) *TestStorage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the function New above could just use this if we didn't set OutOfOrderCapMax here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants