Improve TSM1 cache performance #7633

Merged: 9 commits, Dec 14, 2016

Conversation

@e-dard (Contributor) commented on Nov 16, 2016

Required for all non-trivial PRs
  • Rebased/mergable
  • Tests pass
  • CHANGELOG.md updated

This PR optimises the TSM1 cache, improving write performance under load by up to 70% compared with 1.1.

It works by using a simple (crude, really) hash ring, which allows any series key to be consistently mapped to a partition in the ring. Each partition contains a map of series keys to TSM1 entries, and manages safe access to that map. Entries are distributed uniformly across a small number of partitions (currently fixed at 128).

Since each partition manages access to the entries that map to it, contention over the entire cache is minimised, and is restricted to cases where keys that map to the same partition need to operate on the cache at the same time.
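To illustrate the partitioning idea, here is a minimal, self-contained sketch (the type and function names, the FNV hash, and the float64 value type are placeholders for illustration, not the PR's actual code): each series key hashes to exactly one partition, and an operation on a key only takes that partition's lock.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partitions mirrors the PR's choice of a fixed 128-way ring; the name is illustrative.
const partitions = 128

// partition guards its slice of the key space behind its own mutex.
type partition struct {
	mu    sync.RWMutex
	store map[string][]float64
}

// ring consistently maps every series key to exactly one partition.
type ring struct {
	partitions [partitions]*partition
}

func newRing() *ring {
	r := &ring{}
	for i := range r.partitions {
		r.partitions[i] = &partition{store: make(map[string][]float64)}
	}
	return r
}

// getPartition hashes the key, so the same key always lands on the same partition.
func (r *ring) getPartition(key string) *partition {
	h := fnv.New32a()
	h.Write([]byte(key))
	return r.partitions[h.Sum32()%partitions]
}

// write locks only the partition the key maps to; writes to keys on other
// partitions never contend for this lock.
func (r *ring) write(key string, v float64) {
	p := r.getPartition(key)
	p.mu.Lock()
	defer p.mu.Unlock()
	p.store[key] = append(p.store[key], v)
}

func main() {
	r := newRing()
	r.write("cpu,host=server01", 0.64)
	r.write("cpu,host=server02", 0.71)
	fmt.Println(len(r.getPartition("cpu,host=server01").store))
}
```

Because the key-to-partition mapping is deterministic, two writers only contend when their keys happen to land on the same partition.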

Further, the cache is currently re-allocated and rebuilt after a snapshot is taken, which is expensive. This PR adds logic that tracks, as hints, how entries within the cache are currently allocated, and uses those hints to pre-allocate the cache after a snapshot, so that subsequent additions of entries do not result in new allocations within the cache data structure.
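A minimal sketch of the hint mechanism, using illustrative names rather than the PR's actual API: at snapshot time record each key's value count, then use it to pre-size the recreated entry so appends after the snapshot do not grow slices from zero.

```go
package cache

// entry holds the values buffered for one series key.
type entry struct {
	values []float64
}

// store keeps the live entries plus per-key size hints recorded at snapshot time.
type store struct {
	entries map[string]*entry
	hints   map[string]int // key -> number of values the key held at the last snapshot
}

func newStore() *store {
	return &store{
		entries: make(map[string]*entry),
		hints:   make(map[string]int),
	}
}

// snapshot hands back the current entries, records their sizes as hints,
// and replaces the entry map with an empty one pre-sized to the old length.
func (s *store) snapshot() map[string]*entry {
	old := s.entries
	s.hints = make(map[string]int, len(old))
	for k, e := range old {
		s.hints[k] = len(e.values)
	}
	s.entries = make(map[string]*entry, len(old))
	return old
}

// add recreates a missing entry with capacity taken from the hint, so the
// same workload arriving after a snapshot does not trigger new slice allocations.
func (s *store) add(key string, v float64) {
	e, ok := s.entries[key]
	if !ok {
		e = &entry{values: make([]float64, 0, s.hints[key])}
		s.entries[key] = e
	}
	e.values = append(e.values, v)
}
```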

Performance

I've been mainly testing performance on a c4.8xlarge AWS instance, firing a combined 1.8 M writes/sec at the instance using influx-stress. Specifically:

# host 1
$ influx-stress insert -n 30000000 --pps 900000 --host http://influxdb:8086

# host 2
$ influx-stress insert -n 30000000 --pps 900000 --host http://influxdb:8086

In total 600M points are written.

  • Write Throughput InfluxDB 1.1 RC1: 1,082,621 writes/sec
  • Write Throughput er-cache: 1,682,816 writes/sec

This is a typical run, where the points have a single value. The case where points have multiple values shows a similar difference.

@e-dard e-dard added this to the 1.2.0 milestone Nov 16, 2016
@e-dard changed the title from "[WIPImprove TSM1 cache performance" to "[WIP] Improve TSM1 cache performance" on Nov 16, 2016
@e-dard force-pushed the er-cache branch 3 times, most recently from eff1549 to 6aa2456 on November 17, 2016
}

wg.Wait()
close(res)
A contributor commented:
If this Wait and close are in their own goroutine, you can start looping over res sooner. Because res is buffered, it looks like there's no risk of deadlock.
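For reference, the suggested pattern looks roughly like this as a generic, self-contained sketch (not the PR's code): since the results channel is buffered and only closed once every worker has finished, the Wait/close pair can run in its own goroutine and the consumer can begin draining immediately.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	jobs := []int{1, 2, 3, 4}
	res := make(chan int, len(jobs)) // buffered, so workers never block on send

	var wg sync.WaitGroup
	for _, j := range jobs {
		wg.Add(1)
		go func(j int) {
			defer wg.Done()
			res <- j * j
		}(j)
	}

	// Waiting and closing in a separate goroutine lets the loop below
	// start consuming results before all workers have finished.
	go func() {
		wg.Wait()
		close(res)
	}()

	for r := range res {
		fmt.Println(r)
	}
}
```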

@e-dard force-pushed the er-cache branch 8 times, most recently from a68fc86 to 344e576 on November 29, 2016
@e-dard changed the title from "[WIP] Improve TSM1 cache performance" to "Improve TSM1 cache performance" on Dec 2, 2016
@jwilder (Contributor) left a comment:
One small nit, but looks great!

// {1, 2, 4, 8, 16, 32, 64, 128, 256}.
//
func newring(n int) (*ring, error) {
if n <= 0 || n > 256 {
A contributor commented:
Move 256 to a const since it's repeated a few times in this func?
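The suggested change amounts to something like the following sketch; the const name, the stand-in ring type, and the error message are assumptions for illustration, not the code that was merged.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPartitions bounds the ring size; hoisting the literal 256 into a
// named const avoids repeating it throughout the constructor.
const maxPartitions = 256

// ring is a stand-in for the real type; only the validation is of interest here.
type ring struct {
	partitions []struct{}
}

func newring(n int) (*ring, error) {
	if n <= 0 || n > maxPartitions {
		return nil, errors.New("partitions must be greater than 0 and at most 256")
	}
	return &ring{partitions: make([]struct{}, n)}, nil
}

func main() {
	if _, err := newring(512); err != nil {
		fmt.Println(err) // 512 exceeds maxPartitions
	}
}
```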

A contributor commented:
Also, we should probably test with a higher partition count under high load to see if that has any benefit.


p.mu.RLock()
for k, e := range p.store {
if err := f(k, e); err != nil {
A contributor commented:
When I got a lot of timeouts under high load, this was one of the spots where many goroutines were busy. The issue was that each entry was being sorted, which was expensive, and there were millions of them, which I think caused CPU starvation for the waiting write HTTP goroutines.

A contributor commented:
This is huge, thank you very much for adding this!

Commit message:

Currently, whenever a snapshot occurs the Cache is reset, and so many allocations are repeated, as the same type of data is re-added to the Cache.

This commit allows the stores to keep track of the number of values within an entry, and to use that size as a hint when the same entry needs to be recreated after a snapshot.

To avoid hints persisting over a long period of time, they are deleted after every snapshot and rebuilt using only the most recent entries.