Currently InfluxDB can run out of memory when backfilling sparse historical data (ref: #5685).
This occurs when the amount of historical data in any given shard is less than 25MB (so the flushing threshold is never triggered) and historical data is being written into a large number of shards. There is currently no limit that prevents sparsely distributed historical data from exhausting the available RAM.
The problem can currently be worked around by adjusting either cache-snapshot-write-cold-duration or cache-snapshot-memory-size for the duration of the backfill. However, this configuration change requires a server restart, and one may not be aware that it is needed until the server has already crashed after running out of memory.
This proposal aims to add a dynamic tuning mechanism which avoids this issue.
First, we add two configuration parameters:
max-idle-cache-bytes
This is the maximum total amount of memory that all idle caches together may consume.
idle-cache-age-ms
This is the maximum cacheAgeMs (defined in 6697c72) a cache can reach before it is considered idle.
When deciding whether to compact a shard, the cache compactor will compare the cache size against a dynamically determined parameter, idleLimitBytes, which is calculated as max-idle-cache-bytes / (# of idle-caches).
If the cache size exceeds idleLimitBytes, then the cache is eligible for compacting.
An otherwise idle cache which is empty is excluded from (# of idle-caches) so that we only focus on compacting caches that are consuming at least some memory.
If max-idle-cache-bytes is set to 100MB, then existing behaviour will be largely unchanged: with 4 or fewer idle caches the per-cache budget is at least 100MB / 4 = 25MB, so most of the time cache-snapshot-memory-size will be reached before max-idle-cache-bytes / (# of idle-caches) is reached. However, if backfilling produces more than 4 idle caches, the budget for each idle cache drops below 25MB, and these caches will eventually be compacted and removed from memory.
We might also add statistics that reveal what the current idleLimitBytes is for each cache.
The net result of these changes would be a tight upper bound on the total memory consumed by caches that have been idle for longer than idle-cache-age-ms.
If there are no objections, @jwilder, I will start work on this.
+1
We are having the problem where one shard by itself exceeds the maximum cache size while others are empty.
I suspect InfluxDB can't free the cache because of the empty caches, and because the cache is maxed out no more data can come in, so it is stuck in this dead loop.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.