-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use specific types when reading offset table #3393
Conversation
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Instead of reading a generic-ish []string, we can read a generic type which would be specifically labels.Label. This avoid allocating a slice that escapes to the heap, making it both faster and more efficient in terms of memory management. name old time/op new time/op delta BinaryReader 42.7µs ± 3% 33.9µs ± 8% -20.63% (p=0.008 n=5+5) name old alloc/op new alloc/op delta BinaryReader 41.7kB ± 0% 35.3kB ± 0% -15.31% (p=0.000 n=5+4) name old allocs/op new allocs/op delta BinaryReader 623 ± 0% 423 ± 0% -32.10% (p=0.008 n=5+5) Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
I initially assumed that we can just check for label emptiness, but we can't as there's a special entry in the postings offsets table that is an empty name/value label that corresponds to the "all postings" entry. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seem reasonable to me but I'm not a TSDB expert
// ReadOffsetTable reads an offset table and at the given position calls f for each | ||
// found entry. If f returns an error it stops decoding and returns the received error. | ||
// TODO: use the upstreamed version from Prometheus tsdb/index package when available. | ||
func readOffsetTable[T any](bs index.ByteSlice, off uint64, read func(d *encoding.Decbuf) (T, error), f func(T, uint64, int) error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Does this need to be generic? It's always called with labels.Label
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I'm upstreaming the change to Prometheus I want to keep code exactly the same to just make a drop-in replacement later.
But meh, maybe you're right, it will be very easy to replace it anyway. Removing in 8633e9b
We'll bring them back once we start using the Prometheus version. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
@@ -533,84 +534,79 @@ func newFileBinaryReader(path string, postingOffsetsInMemSampling int, cfg Binar | |||
return nil, errors.Wrap(err, "read symbols") | |||
} | |||
|
|||
var lastKey []string | |||
var lastLbl labels.Label | |||
var lastSet bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit: lastSet := false
(it's just more explicit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an internal fight for me, being more explicit vs being consistent with the var declaration above.
Changed in 1d2cfb6
prevRng = index.Range{Start: int64(off + postingLengthFieldSize)} | ||
return nil | ||
}); err != nil { | ||
return nil, errors.Wrap(err, "read postings table") | ||
} | ||
if lastKey != nil { | ||
if lastLbl.Name != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be: if lastSet
? (Although it doesn't change the logic, unless someone writes postings for empty label name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I'll change it.
Empty label name is a valid case for "all postings" entry, but if it's last, it would be just a noop here anyway.
Applied in 30b5533
r.postings[lastKey[0]].lastValOffset = int64(off - crc32.Size) | ||
lastKey = nil | ||
r.postings[lastLbl.Name].lastValOffset = int64(off - crc32.Size) | ||
lastLbl = labels.Label{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not lastSet = false
? (Or: why is this even needed? We overwrite lastSet
and lastLbl
shortly anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if not strictly needed, doing lastSet = false
here may better explain the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding yet another useless line, I just removed this lastLbl = labels.Label{}
line.
Once you have a last, you can't unhave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Please can you ACK that you're okay with this change so I can merge?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (modulo Peter's comments)
r.postings[lastKey[0]].lastValOffset = int64(off - crc32.Size) | ||
lastKey = nil | ||
r.postings[lastLbl.Name].lastValOffset = int64(off - crc32.Size) | ||
lastLbl = labels.Label{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if not strictly needed, doing lastSet = false
here may better explain the intention.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Add BenchmarkBinaryReader_LargerBlock Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Use specific types when reading offset table Instead of reading a generic-ish []string, we can read a generic type which would be specifically labels.Label. This avoid allocating a slice that escapes to the heap, making it both faster and more efficient in terms of memory management. name old time/op new time/op delta BinaryReader 42.7µs ± 3% 33.9µs ± 8% -20.63% (p=0.008 n=5+5) name old alloc/op new alloc/op delta BinaryReader 41.7kB ± 0% 35.3kB ± 0% -15.31% (p=0.000 n=5+4) name old allocs/op new allocs/op delta BinaryReader 623 ± 0% 423 ± 0% -32.10% (p=0.008 n=5+5) Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Update CHANGELOG.md Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Use a specific flag to signal that lastLbl was set I initially assumed that we can just check for label emptiness, but we can't as there's a special entry in the postings offsets table that is an empty name/value label that corresponds to the "all postings" entry. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Remove generics as we don't need them now We'll bring them back once we start using the Prometheus version. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Apply rewiew comments Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> * Check for lastSet Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com> Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
What this PR does
Modified the
index.ReadOffsetTable
function to use a specific type instead of a slice, that escapes to heap (for every label value in the postings offset table).I will upstream this change to Prometheus, and then use the upstreamed version again when it's available, but wanted to bring it here first, especially because we already have benchmarks for this.
I also added a benchmark with a larger block size:
Which issue(s) this PR fixes or relates to
None, just was surfing the code.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]