-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use unsafe labels when reading the index header #3397
Use unsafe labels when reading the index header #3397
Conversation
When we're reading the index header, we're creating new labels.Label struct by allocating new strings all the time, but in most of the cases we just want to compare whether the label name has changed or not. For that purpose, we can yolo the underlying buffer, avoiding that allocation. There are two cases when we need to unyolo: - When we create the entry in the offsets map (that reference will be stored) - When we store the label value. Benchmark results are pretty promising: goos: linux goarch: amd64 pkg: github.com/grafana/mimir/pkg/storegateway/indexheader cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz BenchmarkBinaryReader_LargerBlock/benchmark 1018 1178001 ns/op 157075 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1074 1215507 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1002 1233107 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1051 1124481 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1132 1111790 ns/op 157075 B/op 1323 allocs/op PASS ok github.com/grafana/mimir/pkg/storegateway/indexheader 65.679s Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@@ -947,30 +950,67 @@ func (b realByteSlice) Sub(start, end int) index.ByteSlice { | |||
// found entry. If f returns an error it stops decoding and returns the received error. | |||
// TODO: use the upstreamed version from Prometheus tsdb/index package when available. | |||
// TODO: see https://github.com/prometheus/prometheus/pull/11535 | |||
func readPostingsOffsetTable(bs index.ByteSlice, off uint64, f func(labels.Label, uint64, int) error) error { | |||
func readOffsetTable[T any](bs index.ByteSlice, off uint64, read func(d *encoding.Decbuf) (T, error), f func(T, uint64, int) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched back to the generic version of this which is the same as one I'm upstreaming in prometheus/prometheus#11535
This is a trickier one, but I think the results are worth it. |
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement! Just one stylistic concern.
|
||
// yoloPostingsOffsetTableReader reads the postings table returning a yoloed label, which can be used for comparison | ||
// without allocating a new string (since in most of the cases, we just want to see if the label name has changed). | ||
func yoloPostingsOffsetTableReader(d *encoding.Decbuf) (yoloLabel, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is all new code, can we use a different prefix than "yolo"? Maybe something like "mapped", "unsafe", "shared", etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we also rename yoloLabel
to unsafeLabel
and its name()
and value()
functions to safeName()
and safeValue()
to clarify they make a copy and the returned string is safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0a6f787
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
TL;DR: don't try to be too clever, compiler already is. In a previous PR #3397 I changed the labels reading to use an unsafe string backed by the original buffer slice, and then copying those strings manually when needed to store them. It happens that those optimizations already exist in the compiler, and looking up `map[string([]byte{...})]` does not allocate as string. Actually, I was surprised because benchmarks look even better, with even less allocations than when using the unsafe string. name old time/op new time/op delta BinaryReader_LargerBlock/benchmark 1.13ms ± 2% 1.05ms ± 2% -7.02% (p=0.008 n=5+5) name old alloc/op new alloc/op delta BinaryReader_LargerBlock/benchmark 157kB ± 0% 127kB ± 0% -19.22% (p=0.000 n=5+4) name old allocs/op new allocs/op delta BinaryReader_LargerBlock/benchmark 1.32k ± 0% 0.69k ± 0% -47.54% (p=0.008 n=5+5) Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Note that this is being superseded by #3436 |
* Use []byte when reading postings offset table TL;DR: don't try to be too clever, compiler already is. In a previous PR #3397 I changed the labels reading to use an unsafe string backed by the original buffer slice, and then copying those strings manually when needed to store them. It happens that those optimizations already exist in the compiler, and looking up `map[string([]byte{...})]` does not allocate as string. Actually, I was surprised because benchmarks look even better, with even less allocations than when using the unsafe string. name old time/op new time/op delta BinaryReader_LargerBlock/benchmark 1.13ms ± 2% 1.05ms ± 2% -7.02% (p=0.008 n=5+5) name old alloc/op new alloc/op delta BinaryReader_LargerBlock/benchmark 157kB ± 0% 127kB ± 0% -19.22% (p=0.000 n=5+4) name old allocs/op new allocs/op delta BinaryReader_LargerBlock/benchmark 1.32k ± 0% 0.69k ± 0% -47.54% (p=0.008 n=5+5) Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use yoloed labels when reading the index header When we're reading the index header, we're creating new labels.Label struct by allocating new strings all the time, but in most of the cases we just want to compare whether the label name has changed or not. For that purpose, we can yolo the underlying buffer, avoiding that allocation. There are two cases when we need to unyolo: - When we create the entry in the offsets map (that reference will be stored) - When we store the label value. Benchmark results are pretty promising: goos: linux goarch: amd64 pkg: github.com/grafana/mimir/pkg/storegateway/indexheader cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz BenchmarkBinaryReader_LargerBlock/benchmark 1018 1178001 ns/op 157075 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1074 1215507 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1002 1233107 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1051 1124481 ns/op 157074 B/op 1323 allocs/op BenchmarkBinaryReader_LargerBlock/benchmark 1132 1111790 ns/op 157075 B/op 1323 allocs/op PASS ok github.com/grafana/mimir/pkg/storegateway/indexheader 65.679s Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * make license Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Fix dumb bug Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * s/yolo/unsafe/ Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Use []byte when reading postings offset table TL;DR: don't try to be too clever, compiler already is. In a previous PR grafana#3397 I changed the labels reading to use an unsafe string backed by the original buffer slice, and then copying those strings manually when needed to store them. It happens that those optimizations already exist in the compiler, and looking up `map[string([]byte{...})]` does not allocate as string. Actually, I was surprised because benchmarks look even better, with even less allocations than when using the unsafe string. name old time/op new time/op delta BinaryReader_LargerBlock/benchmark 1.13ms ± 2% 1.05ms ± 2% -7.02% (p=0.008 n=5+5) name old alloc/op new alloc/op delta BinaryReader_LargerBlock/benchmark 157kB ± 0% 127kB ± 0% -19.22% (p=0.000 n=5+4) name old allocs/op new allocs/op delta BinaryReader_LargerBlock/benchmark 1.32k ± 0% 0.69k ± 0% -47.54% (p=0.008 n=5+5) Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
What this PR does
When we're reading the index header, we're creating new labels.Label struct by allocating new strings all the time, but in most of the cases we just want to compare whether the label name has changed or not.
For that purpose, we can yolo the underlying buffer, avoiding that allocation.
There are two cases when we need to unyolo:
Benchmark results are quite promising:
Which issue(s) this PR fixes or relates to
Follow up on #3393
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]