[0.10.0] Unbounded memory usage when inserting data #5685
What kind of time period is covered by the timestamps in the total data set? If the timestamps in the data set cover a wide period of time (say several weeks, months or years), then influx will be creating multiple caches and WAL logs, one for each shard that covers the time periods of interest, so this might be causing the memory requirements to be larger than you might otherwise expect. If this case applies, then you can try to reduce the size of each cache (use the cache-max-memory-size parameter inside the [data] section of the configuration), which will encourage the WAL to start compacting earlier. This should reduce the total memory consumed by influx during your initial load. If you then start to consume data contemporaneously, you can revert to the default configuration, which might be better optimised for processing data in real time.
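As a rough illustration of that suggestion, here is a minimal sketch of the [data] section of influxdb.conf with a reduced cache ceiling; the value shown is only an example, not a recommended setting:

```toml
[data]
  # Lower the per-shard cache ceiling so the WAL snapshots/compacts sooner
  # during a bulk historical load; value is in bytes (~500 MB here, illustrative).
  cache-max-memory-size = 524288000
```

Once the backfill is finished, the line can be removed so the default applies again for real-time ingest.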
I have the same problem. Before, InfluxDB ran in 2 GB; now I need 16 GB, and I am only writing data. If I give it less than 16 GB I get a 500 HTTP code, and even restarting influxdb doesn't fix it. I think there is something wrong with 0.10. I will try your recommendation tomorrow, but I would like to know whether this is a bug or normal behaviour; it is a huge difference.
I don't speak for influx, and my comments may only apply to the hypothetical case where a large amount of relatively sparsely distributed historical data is being bulk loaded; they don't apply to cases of memory issues where data is being loaded in real time, or to cases where the historical data is dense enough that most writes are concentrated in a handful of active shards.
Ok.
How big are the .wal files in the database directory, and how many are there?
Thanks for the suggestion @jonseymour. The data I'm loading covers a period of about 3 years, so it definitely qualifies. I managed to finish loading the data before I saw your comment by periodically restarting influxdb whenever the memory usage grew too large, so I didn't have a chance to see whether your suggestion would have helped.
There are 17 WAL files totalling around 120 MB; the biggest is about 6 MB.
@SaganBolliger personally, I think there is a case for dividing a total cache budget across the active shards so that the system can dynamically adjust to different load scenarios.
@easyrasta based on this it seems unlikely that your memory issues are related to the caches of a large number of active shards. It might be worth raising a separate issue detailing your case so that you can solicit feedback about your particular problem from the influx support team.
Thanks @jonseymour, I did it.
@SaganBolliger Another setting you could adjust is
@SaganBolliger The size parameter that will increase flushing frequency if its value is decreased is actually
It would be great if someone wrote a blog post on this kind of tuning. Isn't 1h too high a value for the default?
It would be great if influx did not need tuning; I did not have this problem on 0.9.x.
It's quite possible the value is too high and the default should be lower. If you are able to test with lower values, that would be useful data to share.
@francisdb - I am sure this is on the influx roadmap somewhere - there comes a time when you just have to ship. I also have some ideas about how this might be done. First things first, though. It would be really useful, I think, if the shard stats that are published to the _internal database were extended with the following metrics:
This would make it much easier to reason about where the big memory usage is and whether there are any problems (for example, snapshots > 1) in the compaction path. It would help confirm issues such as those caused by backfilling, as hypothesised in this issue. Having such stats would also allow the before-and-after benefits of any later change aimed at dynamically optimising caching behaviour to be measured quantitatively. I am quite keen to have a crack at adding such stats. @jwilder, are you happy for me to propose changes in this area? If influxdata staff are already working on such changes, let me know and I'll find some other things to do :-)
@jonseymour #5499 is still open. Any stats and diagnostics would be useful, and no one is working on that currently.
Just hit this myself, with about 640 series. Even if I massively delay my writes (1000 per block with a 10 s delay between writes) I run out of memory on a VM with 16 GB of RAM. I have about 3 years of data for 10 devices that I'm inserting into one measurement. I'm just about to try messing with the config options, but taking cache-snapshot-write-cold-duration down to "1m" didn't help.
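For anyone following along, a minimal sketch of where that setting lives in the [data] section of the configuration, assuming the TSM engine config discussed in this thread; the value shown is just the aggressive one mentioned above, not a recommendation:

```toml
[data]
  # Snapshot a shard's in-memory cache to a TSM file once the shard has
  # received no writes for this duration; "1m" is the value tried above
  # (the default discussed earlier in the thread was 1h).
  cache-snapshot-write-cold-duration = "1m"
```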
We have seen this when uploading data that caused a lot of new shards to be created - for example, you have data for January 2016 to April 2016 and you do a batch upload of sparse historic data going all the way back to January 2010.
Also hit this when importing 2 years of data... I was only able to work around it by restarting the database whenever the memory limit was hit.
Closing this because the write path has been changed significantly since 0.10. Please open a new issue if you're running into problems on a current release. @joshughes, if you had to restart v1.1 after writing many new series, you likely ran into #7832, which is fixed in 1.2rc1 but not yet backported to the 1.1 release.
I recently upgraded from 0.9.0 to 0.10.0 and since then have been running into memory issues that I'm guessing are related to the new TSM engine. My dataset consists of 6 series, each with 4 fields, and about 75M records per series. I'm not using tags. I'm inserting this data into influxdb using the Python Pandas client, in chronological order, in chunks of approximately 4k records at a rate of about 6 chunks per second (one chunk per series per second). Previously, when I tried doing this with 0.9.0, the memory usage would stay moderate, never going over around 2 GB, but with 0.10.0 memory usage seems to increase linearly with the amount of data inserted. After around a third of the data, the memory usage hits 15 GB, at which point I'm forced to kill the process. I've had this same issue running on OSX El Cap and Ubuntu 14.04.
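For context, a rough sketch of the kind of chunked write loop described above, using the influxdb-python DataFrameClient; the connection details, measurement name, and chunk size here are illustrative assumptions, not taken from the reporter's setup:

```python
import pandas as pd
from influxdb import DataFrameClient  # influxdb-python package

# Hypothetical connection details, for illustration only.
client = DataFrameClient(host="localhost", port=8086, database="mydb")

def write_in_chunks(df: pd.DataFrame, measurement: str, chunk_size: int = 4000) -> None:
    """Write a time-indexed DataFrame in chronological chunks of ~4k records,
    roughly mirroring the batching described in the issue."""
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        client.write_points(chunk, measurement)

# Usage sketch: one DataFrame per series, each with 4 value columns and a
# DatetimeIndex, written in chronological order.
# write_in_chunks(series_df, "series_1")
```

Note that, per the discussion above, throttling or batching the writes did not by itself bound memory during a backfill; the per-shard cache settings appear to be the relevant knob.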