Track cache, WAL, filestore stats within tsm1 engine #5758
I didn't run tests locally with -race enabled, so I'll eventually fix those and amend this PR. Happy to implement any other feedback along the way. |
1babc83 to 602043e
Complementing and extending the changes in #5758.

Add 2 level statistics:
* snapshotCount
* cacheAgeMs

Add 2 counter statistics:
* cachedBytes
* WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate.

cacheAgeMs can be used to gauge the level of write activity into the cache.

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates.

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.

The ratio of the difference between the first and last WAL compaction time over the interval length is an estimate of the percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: add cache throughput related statistics.
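The rate calculations described in this commit message can be sketched roughly as follows. This is only an illustration of the arithmetic, not code from the PR: the `sample` struct, its field names, and both helper functions are hypothetical, and it assumes the counters are sampled together with a wall-clock timestamp in milliseconds.

```go
package main

import "fmt"

// sample is a hypothetical snapshot of two counter statistics taken at a
// known time. The field names mirror the statistic names in the commit
// message; the struct itself is an assumption made for illustration.
type sample struct {
	UnixMs              int64 // wall-clock time of the sample, in milliseconds
	CachedBytes         int64 // cumulative bytes written into the cache
	WALCompactionTimeMs int64 // cumulative time spent in WAL compaction
}

// cacheThroughput returns the cache write rate in bytes per second between
// two samples, computed from the difference of the cachedBytes counters.
func cacheThroughput(first, last sample) float64 {
	elapsedMs := last.UnixMs - first.UnixMs
	if elapsedMs <= 0 {
		return 0 // no interval to measure over
	}
	return float64(last.CachedBytes-first.CachedBytes) * 1000 / float64(elapsedMs)
}

// compactionTimeFraction returns the share of the sampling interval that was
// spent in WAL compaction (0.25 means a quarter of the interval).
func compactionTimeFraction(first, last sample) float64 {
	elapsedMs := last.UnixMs - first.UnixMs
	if elapsedMs <= 0 {
		return 0
	}
	return float64(last.WALCompactionTimeMs-first.WALCompactionTimeMs) / float64(elapsedMs)
}

func main() {
	first := sample{UnixMs: 0, CachedBytes: 1000, WALCompactionTimeMs: 200}
	last := sample{UnixMs: 10000, CachedBytes: 51000, WALCompactionTimeMs: 2700}

	fmt.Println("cache throughput (bytes/s):", cacheThroughput(first, last))
	fmt.Println("compaction time fraction:", compactionTimeFraction(first, last))
}
```

Because both statistics are counters, only differences between samples are meaningful; a single sample says nothing about the rate.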
The intent of this change is to ensure that all statistic fields of the resulting tsm1_cache measurement are initialized when the cache is created. That way, any consumer of those measurements doesn't have to deal with the null case. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: ensure all statistics are initialised on cache creation.
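One way to get this effect with the standard `expvar` package is to add zero to every key up front, which creates each entry with a value of 0. The sketch below is an assumption about the approach, not the PR's actual code; `statKeys` and `newStatMap` are hypothetical names, and the key list simply reuses the statistic names mentioned earlier in this PR.

```go
package main

import (
	"expvar"
	"fmt"
)

// statKeys lists the statistics that should always be present.
// The list is illustrative; the real set of keys may differ.
var statKeys = []string{
	"cachedBytes",
	"cacheAgeMs",
	"snapshotCount",
	"WALCompactionTimeMs",
}

// newStatMap returns an unpublished expvar.Map in which every key in
// statKeys already exists with the value 0, so readers never see a
// missing field.
func newStatMap() *expvar.Map {
	m := new(expvar.Map).Init()
	for _, k := range statKeys {
		m.Add(k, 0) // Add creates the entry if it does not exist yet
	}
	return m
}

func main() {
	m := newStatMap()
	fmt.Println(m.String())
}
```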
Since we are not locking but relying on atomic arithmetic, use Add rather than Set. Will also result in slightly less garbage being created. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
tsm: cache: during writes, update the memSize statistic outside the lock
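A minimal sketch of the pattern this commit describes, assuming the statistics live in an `expvar.Map`: the mutex protects only the cache contents, and the counter is updated afterwards with `Add`, which is safe for concurrent use. The `cache` type, `newCache`, `write`, and the `"memSize"` key string are hypothetical stand-ins, not the engine's real types.

```go
package main

import (
	"expvar"
	"fmt"
	"sync"
)

// cache is a simplified stand-in for the engine cache.
type cache struct {
	mu    sync.Mutex
	store map[string][]byte
	stats *expvar.Map
}

func newCache() *cache {
	return &cache{
		store: make(map[string][]byte),
		stats: new(expvar.Map).Init(),
	}
}

// write appends value under key while holding the lock, then updates the
// size statistic after releasing it. Add applies a delta atomically, so
// concurrent writers cannot overwrite each other's updates the way a
// read-modify-Set sequence outside the lock could. Set would also need a
// freshly allocated expvar.Int on every call.
func (c *cache) write(key string, value []byte) {
	c.mu.Lock()
	c.store[key] = append(c.store[key], value...)
	c.mu.Unlock()

	c.stats.Add("memSize", int64(len(value)))
}

func main() {
	c := newCache()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.write("cpu", []byte("0123456789"))
		}()
	}
	wg.Wait()
	fmt.Println("memSize:", c.stats.Get("memSize"))
}
```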
@mark-rushakoff - I wonder if we want to do something about suppressing stats from the snapshot caches? For example... I'll submit a PR with my suggestion about how to do this. |
…ler constructor The intent of this change is to avoid writing caches created for snapshot cache instances into the tsm1_cache measurement. We can do this by avoiding use of the NewCache constructor. All other methods are only intended to be called on the engine cache, never on a snapshot. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
My suggestion for this is found in #5778. |
@mark-rushakoff There doesn't seem to be a way to stop publishing statistics, so cache statistics live on even after the shard has been closed or the database that contains them has been deleted. Is there some way to stop a statistics map being published when it is no longer in use? |
@jonseymour we're using the expvar package directly, which does not expose a way to remove or unpublish a stat. Internally there's been a conversation or two about this but we haven't discussed it in depth yet. For now, I think it's fine to have the "dead" stats around, although I'd certainly like to address it before the 0.11 release. |
@jwilder This PR should be ready for review now. |
@mark-rushakoff is it ok for me to raise an issue regarding cleaning up "idle" statistics keys? |
Sure, feel free to open a separate issue. Thanks. |
Needs a changelog update. 👍 otherwise. |
This PR adds stats to track disk usage for the tsm1 FileStore and WAL, and disk+memory for the Cache. The stats are tracked per-engine, not per-file.
During manual testing, the stats seem to be consistent with file sizes on disk, inspected out-of-band from the influxd process.