Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support out of order ingestion for native histograms #11220

Closed
beorn7 opened this issue Aug 25, 2022 · 9 comments
Closed

Support out of order ingestion for native histograms #11220

beorn7 opened this issue Aug 25, 2022 · 9 comments

Comments

@beorn7
Copy link
Member

beorn7 commented Aug 25, 2022

This feature will be released soon for normal float samples, and it would be really weird if we did not support it for histogram samples.

Update 14th June 2023: @krajorama and @jesusvazquez are starting to work on this issue.

@krajorama
Copy link
Member

Overall task list from discussion:
Scope: everything is in the tsdb package: db.go, head_append.go, ooo_*.

Write path: start from headAppender.Append function
Need ooo_head.go OOOChunk.InsertHistogram, OOOChunk.ToHistogramChunk, modify writing ooo head to disk to account for different encodings.
Need head.go insertHistogram - keep it simple for now, no handling of counter reset, just use the 32 sample OOOchunks.

@jesusvazquez notes:
ooo_head.go

  • Creeate new InsertHistogram() method
  • Create new ToHistogramFloatHistogram() ToHistogram()

head_append.go

  • Modify AppendHistogram() with ooo logic
  • Modify appendableHistogram() with ooo logic
  • Create insertHistogram()
  • Modify mmapCurrentOOOHeadChunk() with new chunk encoding logic. The method currently works with ooo chunks as if they always were XOR chunks.

@beorn7
Copy link
Member Author

beorn7 commented Jul 12, 2023

Also assigning to @zenador and @krajorama to mark all the collaborators on this issue.

@zenador
Copy link
Contributor

zenador commented Jul 12, 2023

Testing to see if this grants permission to assign

@krajorama
Copy link
Member

From IRL discussion: we've decided that we do not need to support the of mixing different sample types (float/integer histogram/float histogram) in the OOO headchunk. This should make testing a bit simpler.

@krajorama
Copy link
Member

From IRL discussion:
@carrieedwards @fionaliao @jesusvazquez (cc @zenador )

Immediate goals in priority order for the code:

  • add native histograms to existing unit test happy paths
  • add tests around counter reset/multiple chunk handling
  • tests for float <-> native histogram series transitions

@krajorama
Copy link
Member

krajorama commented Aug 23, 2023

Counter reset between chunks test case discussion:

It is theoretically possible to write such sequence of histogram samples:

  • append and commit a recent histogram sample to initialize the normal in order head/WAL
  • append and commit opts.OutOfOrderCapMax number of out of order histogram samples without counter reset
  • append and commit 2 x opts.OutOfOrderCapMax more histograms that are out of order, don't have counter reset between them and later then the previous out of order samples AND the first sample in this batch has lower (bucket) counters than the last sample in the previous batch

When we query the samples back, something should detect that now the samples have a counter reset - I'm not sure if this should happen on chunk / chunk iterator level or simply somewhere in the PromQL engine (cc @beorn7 )

@beorn7
Copy link
Member Author

beorn7 commented Aug 23, 2023

The PromQL engine simply looks at the CounterResetHint as returned from the query to TSDB. If you want the PromQL engine to do the (expensive) traditional counter reset detection, just set the CounterResetHint in the returned histogram to UnknownCounterReset.

@beorn7
Copy link
Member Author

beorn7 commented Aug 23, 2023

Generally, it would be cool if we never end up with (counter) histogram chunks that have counter resets in the middle. (We have ruled that out so far, assuming that a histogram will change its bucket layout a lot after a reset so that handling that all in the same chunk is more expensive than cutting a new chunk. I'm not completely sure if it is even technically possible to encode a chunk with a counter reset in the middle. We definitely have created a lot of code around the invariant that there is never a counter reset within the same chunk. So keeping it that way would definitely be much much preferred.)

But if all of the above is just about returning histogram samples from a state where things haven't been consolidated into one chunk, and they come from overlapping chunks or buffers, you could indeed simply return a CounterResetHint as UnknownCounterReset whenever you want PromQL to detect the counter reset itself.

@beorn7
Copy link
Member Author

beorn7 commented Sep 24, 2024

Closing this as the code is merged, but note that this is a huge undertaking and it is quite likely that we will find bugs in the implementation. But those should be tracked in specific issues.

Thanks everyone for all the work here.

@beorn7 beorn7 closed this as completed Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

7 participants