-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of TSI Bloom Filter #8857
Conversation
/cc @pauldix regarding destructive nature of this PR on existing TSI indexes. |
@e-dard if they happen to still have old TSI files, will it show an error in the logs and ignore them? |
I noticed
You get a little bump in perf:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, but LGTM. Nice speedup!
pkg/bloom/bloom.go
Outdated
return murmur3.New128() | ||
}), | ||
} | ||
return &Filter{k: k, b: make([]byte, m/8), mask: m - 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of m/8
, you can use m >> 3
which is a little more efficient.
@jwilder @benbjohnson see be45245 for a Attempting to open an incompatible index results in @rbetts FYI - I think it's important we make a reminder somewhere to add something to the release notes for whatever version ships TSI. We'll need to describe the index conversion steps for any users who have been running TSI on the nightlies prior to this PR change. |
tsdb/index/tsi1/index.go
Outdated
} | ||
return nil | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually think of Err
as the error type instead of the validation function. Can you do something like:
// ErrIncompatibleVersion is returned when attempting to read from an
// incompatible tsi1 manifest file.
var ErrIncompatibleVersion = errors.New("incompatible tsi1 index MANIFEST")
// ErrIncompatibleVersion returns an IncompatibleVersionError if the provided
// manifest is incompatible with the current tsi1 package.
func (m *Manifest) Validate() error {
// If we don't have an explicit version in the manifest file then we know
// it's not compatible with the latest tsi1 Index.
if m.Version != Version {
return ErrIncompatibleVersion
}
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I should add that error types should end in ...Error
and error values should begin with Err...
but I know what you mean, and in this case an explicit type was overkill and I never checked it anywhere.
This commit replaces the previous hashing algorithm used by the pkg.Filter with one based on xxhash. Further, taking from the hashing literature, we can represent k hashes with only two hash function, where previously Filter was using four. Further, unlike `murmur3`, `xxhash` is allocation-free, so allocations have dramatically reduced when inserting and checking for hashes.
This commit adds a basic TSI versioning scheme, by adding a Version field to an index's MANIFEST file. Existing TSI indexes will not have this field present in their MANIFEST files, and thus will be deemed incomatible with the current version. Users with existing TSI indexes will be able to remove them, and convert the resulting inmem indexes to the current version of a TSI index using the influx_inspect tooling.
Thank you very much for adding this improvement, I experienced a multi-gigabyte drop in RSS memory due to the change from murmur to xxhash! (Previously I inspected influx with pprof and recognized that the major portion of heap memory was allocated to 'objects' from the bloom pkg) As you stated, I dropped the old indexes and recreated them and installed 1.3.7rc0 which includes this PR here :) |
Required for all non-trivial PRs
This PR changes the hashing regime currently used in the TSI Bloom Filter (
murmur3
) toxxhash
, and also, using the linear combination of two hashes trick, reduces the number of hashes generated perContains
/Insert
call from two to one.Further, allocations have been eradicated since
xxhash
is allocation-free. This allows us to remove the existing pool of hashers.The overall improvement in performance is about 5x .
Note
Since this changes the hashing functions used to lookup the (possible) existence of series in TSI index files, any existing TSI files are no longer going to work. Anyone who has existing TSI indexes will need to remove them (which will put the shard back on the
inmem
index), and then recreate a TSI index using theinflux_inspect inmem2tsi
tool.Benchmarks