[feature proposal]: add an idle cache monitor #5771

jonseymour · 2016-02-21T08:18:40Z

Currently, influx can run out of memory when backfilling sparse historical data (ref: #5685).

This occurs if the amount of historical data written to a given shard is less than 25MB (so it doesn't trigger the flushing threshold) and historical data is being written into a large number of shards. Currently there is no limit that prevents exhaustion of the available RAM, given enough sparsely distributed historical data.

The problem can currently be worked around by adjusting either cache-snapshot-write-cold-duration or cache-snapshot-memory-size for the duration of the backfill. However, this configuration change requires a server restart, and one may not realise it is needed until the server has already crashed from running out of memory.

This proposal aims to add a dynamic tuning mechanism which avoids this issue.

First, we add two configuration parameters:

max-idle-cache-bytes

This is the maximum total amount of memory that may be consumed by all idle caches.

idle-cache-age-ms

This is the cacheAgeMs value (defined in 6697c72) at which a cache is considered idle.

When deciding whether to compact a shard, the cache compactor will compare the cache size against a dynamically determined parameter, idleLimitBytes, calculated as max-idle-cache-bytes / (# of idle caches).

If the cache size exceeds idleLimitBytes, then the cache is eligible for compaction.

An idle cache that is empty is excluded from (# of idle caches), so that we only focus on compacting caches that are actually consuming memory.
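The compaction rule above can be sketched in Go, the language InfluxDB is written in. This is a minimal, hypothetical illustration, not code from the tsm1 engine: `cacheInfo`, `isIdle`, `idleLimitBytes` and `eligibleForCompaction` are invented names, a cache is assumed to be idle once its cacheAgeMs reaches idle-cache-age-ms, and the idleLimitBytes check is assumed to apply only to idle caches.

```go
package main

import "fmt"

// cacheInfo is a hypothetical per-shard cache summary used only for this sketch.
type cacheInfo struct {
	SizeBytes  uint64 // bytes currently held by the shard's cache
	CacheAgeMs int64  // the cache's cacheAgeMs value
}

// isIdle reports whether a cache has reached idle-cache-age-ms (assumed semantics).
func isIdle(c cacheInfo, idleCacheAgeMs int64) bool {
	return c.CacheAgeMs >= idleCacheAgeMs
}

// idleLimitBytes returns max-idle-cache-bytes divided by the number of
// non-empty idle caches. Empty idle caches are not counted. ok is false
// when no cache is both idle and non-empty.
func idleLimitBytes(caches []cacheInfo, maxIdleCacheBytes uint64, idleCacheAgeMs int64) (limit uint64, ok bool) {
	var n uint64
	for _, c := range caches {
		if c.SizeBytes > 0 && isIdle(c, idleCacheAgeMs) {
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return maxIdleCacheBytes / n, true
}

// eligibleForCompaction returns the indices of idle caches whose size
// exceeds the current idleLimitBytes.
func eligibleForCompaction(caches []cacheInfo, maxIdleCacheBytes uint64, idleCacheAgeMs int64) []int {
	limit, ok := idleLimitBytes(caches, maxIdleCacheBytes, idleCacheAgeMs)
	if !ok {
		return nil
	}
	var eligible []int
	for i, c := range caches {
		if isIdle(c, idleCacheAgeMs) && c.SizeBytes > limit {
			eligible = append(eligible, i)
		}
	}
	return eligible
}

func main() {
	const mb = 1 << 20
	const idleAgeMs = int64(60 * 60 * 1000) // example idle-cache-age-ms: one hour

	caches := []cacheInfo{{SizeBytes: 50 * mb, CacheAgeMs: 1000}} // index 0: actively written
	for i := 0; i < 5; i++ {
		// indices 1-5: idle backfill caches, each below the 25MB snapshot size
		caches = append(caches, cacheInfo{SizeBytes: 22 * mb, CacheAgeMs: 2 * idleAgeMs})
	}
	// index 6: empty idle cache, excluded from the idle-cache count
	caches = append(caches, cacheInfo{SizeBytes: 0, CacheAgeMs: 2 * idleAgeMs})

	limit, _ := idleLimitBytes(caches, 100*mb, idleAgeMs)
	fmt.Println(limit, eligibleForCompaction(caches, 100*mb, idleAgeMs))
}
```

With max-idle-cache-bytes at 100MB and five non-empty idle caches, this prints `20971520 [1 2 3 4 5]`: each idle cache gets a 20MB budget, so all five 22MB caches become eligible, the empty idle cache is not counted, and the actively written cache is not subject to the idle budget.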

If max-idle-cache-bytes is set to 100MB, existing behaviour will be largely unchanged: with four or fewer idle caches, max-idle-cache-bytes / (# of idle caches) is at least 25MB, so cache-snapshot-memory-size will usually be reached first. However, if backfilling leaves more than four idle caches, the budget for each idle cache shrinks below 25MB, so these caches will eventually be compacted and removed from memory.
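The arithmetic behind the 100MB example, taking the 25MB flushing threshold mentioned earlier as cache-snapshot-memory-size and writing n for the number of non-empty idle caches (n = 8 is only an illustrative value):

```latex
\text{idleLimitBytes} = \frac{100\,\text{MB}}{n}, \qquad
n \le 4 \;\Rightarrow\; \text{idleLimitBytes} \ge 25\,\text{MB}, \qquad
n = 8 \;\Rightarrow\; \text{idleLimitBytes} = 12.5\,\text{MB}
```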

We might also add statistics that expose the current idleLimitBytes for each cache.

The net result of these changes would be a tight upper bound on the total memory consumed by caches that have been idle for longer than idle-cache-age-ms.
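A rough sketch of why the bound holds, assuming every idle cache above idleLimitBytes is snapshotted at the next compaction check, and ignoring memory held while a snapshot is being written. With M = max-idle-cache-bytes, n non-empty idle caches and s_i the size of idle cache i, every idle cache that stays in memory satisfies s_i ≤ M/n, so:

```latex
\sum_{i=1}^{n} s_i \;\le\; n \cdot \frac{M}{n} \;=\; M
```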

If there are no objections, @jwilder, I will start work on this.


julienvienne · 2016-09-14T07:00:18Z

+1

nuoluo · 2017-10-19T02:59:38Z

+1
We are having the problem where one shard by itself exceeds the maximum cache size while the others are empty.
I suspect InfluxDB can't free the cache because of the empty caches, and because that cache is maxed out no more data can come in, so it gets stuck in this loop.

stale · 2019-07-23T21:33:40Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2019-07-30T22:29:46Z

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.

e-dard added the kind/feature-request label Nov 30, 2016

dgnorton added the 1.x label Jan 7, 2019

stale bot added the wontfix label Jul 23, 2019

stale bot closed this as completed Jul 30, 2019
