Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use unsafe labels when reading the index header #3397

Merged
merged 5 commits into from
Nov 10, 2022

Conversation

colega
Copy link
Contributor

@colega colega commented Nov 7, 2022

What this PR does

When we're reading the index header, we're creating new labels.Label struct by allocating new strings all the time, but in most of the cases we just want to compare whether the label name has changed or not.

For that purpose, we can yolo the underlying buffer, avoiding that allocation.

There are two cases when we need to unyolo:

  • When we create the entry in the offsets map (that reference will be stored)
  • When we store the label value.

Benchmark results are quite promising:

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    1.95ms ± 9%    1.17ms ± 5%  -39.99%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark    1.06MB ± 0%    0.16MB ± 0%  -85.16%  (p=0.000 n=4+5)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     20.1k ± 0%      1.3k ± 0%  -93.42%  (p=0.008 n=5+5)

Which issue(s) this PR fixes or relates to

Follow up on #3393

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

When we're reading the index header, we're creating new labels.Label
struct by allocating new strings all the time, but in most of the cases
we just want to compare whether the label name has changed or not.

For that purpose, we can yolo the underlying buffer, avoiding that
allocation.

There are two cases when we need to unyolo:
- When we create the entry in the offsets map (that reference will be
stored)
- When we store the label value.

Benchmark results are pretty promising:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/storegateway/indexheader
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
BenchmarkBinaryReader_LargerBlock/benchmark         	    1018	   1178001 ns/op	  157075 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1074	   1215507 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1002	   1233107 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1051	   1124481 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1132	   1111790 ns/op	  157075 B/op	    1323 allocs/op
PASS
ok  	github.com/grafana/mimir/pkg/storegateway/indexheader	65.679s

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega requested a review from a team as a code owner November 7, 2022 15:28
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@@ -947,30 +950,67 @@ func (b realByteSlice) Sub(start, end int) index.ByteSlice {
// found entry. If f returns an error it stops decoding and returns the received error.
// TODO: use the upstreamed version from Prometheus tsdb/index package when available.
// TODO: see https://github.com/prometheus/prometheus/pull/11535
func readPostingsOffsetTable(bs index.ByteSlice, off uint64, f func(labels.Label, uint64, int) error) error {
func readOffsetTable[T any](bs index.ByteSlice, off uint64, read func(d *encoding.Decbuf) (T, error), f func(T, uint64, int) error) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched back to the generic version of this which is the same as one I'm upstreaming in prometheus/prometheus#11535

@colega
Copy link
Contributor Author

colega commented Nov 7, 2022

This is a trickier one, but I think the results are worth it.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega marked this pull request as draft November 7, 2022 15:55
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega marked this pull request as ready for review November 7, 2022 16:04
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement! Just one stylistic concern.


// yoloPostingsOffsetTableReader reads the postings table returning a yoloed label, which can be used for comparison
// without allocating a new string (since in most of the cases, we just want to see if the label name has changed).
func yoloPostingsOffsetTableReader(d *encoding.Decbuf) (yoloLabel, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is all new code, can we use a different prefix than "yolo"? Maybe something like "mapped", "unsafe", "shared", etc.?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we also rename yoloLabel to unsafeLabel and its name() and value() functions to safeName() and safeValue() to clarify they make a copy and the returned string is safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0a6f787

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega changed the title Use yoloed labels when reading the index header Use unsafe labels when reading the index header Nov 8, 2022
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@colega colega merged commit 49b6eba into main Nov 10, 2022
@colega colega deleted the use-yoloed-labels-for-reading-postings-offset-table branch November 10, 2022 12:24
colega added a commit that referenced this pull request Nov 11, 2022
TL;DR: don't try to be too clever, compiler already is.

In a previous PR #3397 I changed
the labels reading to use an unsafe string backed by the original buffer
slice, and then copying those strings manually when needed to store
them.

It happens that those optimizations already exist in the compiler, and
looking up `map[string([]byte{...})]` does not allocate as string.

Actually, I was surprised because benchmarks look even better, with even
less allocations than when using the unsafe string.

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    1.13ms ± 2%    1.05ms ± 2%   -7.02%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark     157kB ± 0%     127kB ± 0%  -19.22%  (p=0.000 n=5+4)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     1.32k ± 0%     0.69k ± 0%  -47.54%  (p=0.008 n=5+5)

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega
Copy link
Contributor Author

colega commented Nov 11, 2022

Note that this is being superseded by #3436

colega added a commit that referenced this pull request Nov 11, 2022
* Use []byte when reading postings offset table

TL;DR: don't try to be too clever, compiler already is.

In a previous PR #3397 I changed
the labels reading to use an unsafe string backed by the original buffer
slice, and then copying those strings manually when needed to store
them.

It happens that those optimizations already exist in the compiler, and
looking up `map[string([]byte{...})]` does not allocate as string.

Actually, I was surprised because benchmarks look even better, with even
less allocations than when using the unsafe string.

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    1.13ms ± 2%    1.05ms ± 2%   -7.02%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark     157kB ± 0%     127kB ± 0%  -19.22%  (p=0.000 n=5+4)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     1.32k ± 0%     0.69k ± 0%  -47.54%  (p=0.008 n=5+5)

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
* Use yoloed labels when reading the index header

When we're reading the index header, we're creating new labels.Label
struct by allocating new strings all the time, but in most of the cases
we just want to compare whether the label name has changed or not.

For that purpose, we can yolo the underlying buffer, avoiding that
allocation.

There are two cases when we need to unyolo:
- When we create the entry in the offsets map (that reference will be
stored)
- When we store the label value.

Benchmark results are pretty promising:

goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/storegateway/indexheader
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
BenchmarkBinaryReader_LargerBlock/benchmark         	    1018	   1178001 ns/op	  157075 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1074	   1215507 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1002	   1233107 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1051	   1124481 ns/op	  157074 B/op	    1323 allocs/op
BenchmarkBinaryReader_LargerBlock/benchmark         	    1132	   1111790 ns/op	  157075 B/op	    1323 allocs/op
PASS
ok  	github.com/grafana/mimir/pkg/storegateway/indexheader	65.679s

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* make license

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Fix dumb bug

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* s/yolo/unsafe/

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
masonmei pushed a commit to udmire/mimir that referenced this pull request Dec 16, 2022
* Use []byte when reading postings offset table

TL;DR: don't try to be too clever, compiler already is.

In a previous PR grafana#3397 I changed
the labels reading to use an unsafe string backed by the original buffer
slice, and then copying those strings manually when needed to store
them.

It happens that those optimizations already exist in the compiler, and
looking up `map[string([]byte{...})]` does not allocate as string.

Actually, I was surprised because benchmarks look even better, with even
less allocations than when using the unsafe string.

name                                old time/op    new time/op    delta
BinaryReader_LargerBlock/benchmark    1.13ms ± 2%    1.05ms ± 2%   -7.02%  (p=0.008 n=5+5)

name                                old alloc/op   new alloc/op   delta
BinaryReader_LargerBlock/benchmark     157kB ± 0%     127kB ± 0%  -19.22%  (p=0.000 n=5+4)

name                                old allocs/op  new allocs/op  delta
BinaryReader_LargerBlock/benchmark     1.32k ± 0%     0.69k ± 0%  -47.54%  (p=0.008 n=5+5)

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Update CHANGELOG.md

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants