Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store-gateway: reduce overhead in fetching postings from cache #4978

Merged
merged 20 commits into from
May 12, 2023

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented May 11, 2023

This PR significantly reduces additional data structures when fetching postings from the cache. These can have comparable memory footprint to the actual posting lists when the lists themselves are small (lists in the index cache are diff-varint-encoded and snappy compressed).

Context

A simplified version of the current interface for fetching postings from the cache is below

FetchMultiPostings([]labels.Label) map[labels.Label][]byte

The existing RemoteIndexCache.FetchMultiPostings creates some auxiliary data structures in order to do its job:

  • a slice of string keys that it passes to the dskit cache implementation
  • a map from an input label.Label to a string key
  • a map from an input label.Label to a []byte that it got from the cache; this map is built with the help of the map above and the map of results that the cache returns

What this PR does

Result Iterator

This PR removes the two maps and changes the interface of the cache. The new interface returns an iterator where the order of elements is the same as the order of requested labels.Label keys.

FetchMultiPostings([]labels.Label) Iterator[[]byte]

By doing this we can avoid having to build maps, which is faster and needs less memory. The change is non-invasive because the consumer of FetchMultiPostings already iterates the result in the same order as requested keys.

Cache keys

This PR does another small, but impactful change: it avoids hashing each labels.Label key if the length of the key is shorter than the hash. Instead of hashing it base64-encodes the label itself.

Benchmarks

Details
goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/storegateway/indexcache
                                                    │ iterate-map-in-benchmark-307c9928f.txt │ Adjust-benchmark-for-changed-interface-7edc58e56.txt │
                                                    │                 sec/op                 │            sec/op              vs base               │
RemoteIndexCache_FetchMultiPostings/short_labels-10                             10.358m ± 3%                     2.659m ± 1%  -74.32% (p=0.002 n=6)
RemoteIndexCache_FetchMultiPostings/long_labels-10                               23.49m ± 7%                     19.04m ± 6%  -18.94% (p=0.002 n=6)
geomean                                                                          15.60m                          7.116m       -54.38%

                                                    │ iterate-map-in-benchmark-307c9928f.txt │ Adjust-benchmark-for-changed-interface-7edc58e56.txt │
                                                    │                  B/op                  │             B/op               vs base               │
RemoteIndexCache_FetchMultiPostings/short_labels-10                             6.286Mi ± 0%                    3.183Mi ± 0%  -49.36% (p=0.002 n=6)
RemoteIndexCache_FetchMultiPostings/long_labels-10                              6.285Mi ± 0%                    3.339Mi ± 0%  -46.88% (p=0.002 n=6)
geomean                                                                         6.286Mi                         3.260Mi       -48.13%

                                                    │ iterate-map-in-benchmark-307c9928f.txt │ Adjust-benchmark-for-changed-interface-7edc58e56.txt │
                                                    │               allocs/op                │           allocs/op             vs base              │
RemoteIndexCache_FetchMultiPostings/short_labels-10                              20.27k ± 0%                      20.25k ± 0%  -0.09% (p=0.002 n=6)
RemoteIndexCache_FetchMultiPostings/long_labels-10                               20.26k ± 0%                      20.26k ± 0%  -0.03% (p=0.002 n=6)
geomean                                                                          20.27k                           20.25k       -0.06%

Checklist

  • Tests updated
  • [n/a] Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@pracucci pracucci self-requested a review May 11, 2023 09:28
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, LGTM!

pkg/storegateway/indexcache/remote.go Outdated Show resolved Hide resolved
Co-authored-by: Marco Pracucci <marco@pracucci.com>
@dimitarvdimitrov dimitarvdimitrov merged commit 6c64aed into main May 12, 2023
@dimitarvdimitrov dimitarvdimitrov deleted the dimitar/st-gw/postigns-fetching-reduce-optimize branch May 12, 2023 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants