Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeseries: reuse raw data used to unmarshal Timeseries when marshaling them again #5137

Merged
merged 20 commits into from
Jun 7, 2023

Conversation

pstibrany
Copy link
Member

@pstibrany pstibrany commented Jun 2, 2023

What this PR does

This PR adds an optimization to PreallocTimeseries: When PreallocTimeseries.Unmarshal(...) is called, we keep passed bytes buffer, and return it again for PreallocTimeseries.Marshal* methods.

This is useful in distributor, which receives write request with many Timeseries in it. Distributor then recombines timeseries into new write requests for ingesters, and marshals them again. Distributor usually (!) doesn't need to modify timeseries in any way, so it could reuse original marshalled version of the timeseries, instead of creating new one.

There are some cases when distributor does in fact modify the timeseries:

  • When using HA deduplication, distributor removes replica label
  • Distributor also removes empty labels, and makes sure that labels are sorted
  • Distributor can optionally drop some labels or run relabelling process

In these cases, we simply "clear" the marshalled version of timeseries, and let marshalling run again.

Is keeping the original raw unmarshalled buffer safe? Can the buffer be modified? Yes, but Timeseries already relies on that buffer being unchanged during the lifecycle of write request, because LabelAdapter type used inside Timeseries also creates its strings by using the same buffer.

In our test, doing this optimisation can save ~25% of CPU time when marshalling write request (when not using HA deduplication, label dropping or relabelling). How much it really helps depends on whether any of above-mentioned features are used.

Due to a risk that we have missed some place where this optimization is not applicable, it's possible to disable it for now by using -timeseries-unmarshal-caching-optimization-enabled=false.

Checklist

  • Tests updated
  • [na] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pstibrany pstibrany requested a review from a team as a code owner June 2, 2023 07:11
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem reasonable to me with a few comments, thanks! Are there any benchmark results we can share for this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this needs to be an integration test. The existing unit tests seem to make similar assertions without needing to start consul/minio/ingester/etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing unit tests make assertions on intermediate objects or messages, not on final bytes sent to ingester. I was looking at the test from different perspective — In a way I don’t care what happens in distributor, as long as ingester gets the correct data. That’s why I opted for integration test — it seemed to express that idea in a cleaner way.

If you feel strongly about this, I can try to write a unit test for this instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary reason for integration test was that when using unit test I could only mock ingester client to get not-yet marshaled WriteRequest. But I wanted to include marshaling in the test too.

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
@flxbk
Copy link
Contributor

flxbk commented Jun 5, 2023

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@pstibrany
Copy link
Member Author

pstibrany commented Jun 6, 2023

Q: shall we keep this functionality behind a CLI option until it's proven to work fine?

A: Added CLI option -timeseries-unmarshal-caching-optimization-enabled.

@pracucci pracucci self-requested a review June 6, 2023 13:03
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty interesting optimization. The code changes are clean, but unfortunately it's tricky to find if any usage we make of the Timeseries is safe or not. I agree the integration test gives an higher confidence than unit testing, but I would love to see an integration test with high number of concurrent writes too.

pkg/mimirpb/timeseries.go Outdated Show resolved Hide resolved

func (p *PreallocTimeseries) MarshalTo(dAtA []byte) (int, error) {
if p.unmarshalData != nil {
return p.MarshalToSizedBuffer(dAtA)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarshalToSizedBuffer() expects the input slice to be sized, while MarshalTo() doesn't. Here we call MarshalToSizedBuffer() assuming it's sized but it may be not. I think we should resize it here, before calling MarshalToSizedBuffer().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only called p.MarshalToSizedBuffer to reuse the same copying logic. I'll just copy that code instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkg/mimirpb/timeseries_test.go Outdated Show resolved Hide resolved
pkg/mimirpb/timeseries_test.go Outdated Show resolved Hide resolved
pkg/mimirpb/timeseries.go Show resolved Hide resolved
pkg/mimirpb/timeseries.go Outdated Show resolved Hide resolved
pkg/mimirpb/timeseries.go Outdated Show resolved Hide resolved
}

// clearUnmarshalData removes cached unmarshalled version of the message.
func (p *PreallocTimeseries) clearUnmarshalData() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distributor.validateSeries() can manipulate ts.Exemplars. I think we need to clear it in that case too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have this case covered by the integration test too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distributor.validateSeries() can manipulate ts.Exemplars. I think we need to clear it in that case too.

Good catch, thank you!

Copy link
Member Author

@pstibrany pstibrany Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added handling of exemplars and changed integration test to cover those cases.

@pstibrany pstibrany requested a review from a team as a code owner June 7, 2023 07:50
pstibrany and others added 19 commits June 7, 2023 09:53
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
…imeseries.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
…thods.

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Signed-off-by: Peter Štibraný <pstibrany@gmail.com>
@pstibrany pstibrany merged commit bbbe120 into main Jun 7, 2023
@pstibrany pstibrany deleted the cache-unmarshal-buffer branch June 7, 2023 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants