[2.1] Heavy memory footprint regression from 1.8 inmem to 2.1 #22936
Do you see the same when upgrading to 2.0.9?
Yes, the behavior seems to be the same as 2.1.1: slow upgrade, massive memory usage. Did you suspect that 2.0.9 had better performance for a specific reason? Note that we also had very similar issues with 1.8.x and the tsi1 index. I think the issue is really that we hit a performance edge case here, probably related to the massive number of files that are created.
We noticed the high (unlimited) memory usage as well after the upgrade to 2.1.1, but I guess it was a coincidence then, as we're only starting to pour data into it.
FYI: We also tried the recommendations (specifically …).
@sahib there was a change present in 2.0.9 and forward (including 2.1.1) that improved TSI memory consumption: #22334. You may also be hitting #23085 - do you have any scrapers configured? Finally, @sahib @jo-me, would either of you be able to share profiles? You can collect a full set of profiles with something like …
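The exact command above is cut off in this page and can't be recovered; as a hedged sketch, InfluxDB 2.x exposes Go pprof profiles over an HTTP endpoint, so collecting them likely resembled the following (the endpoint and 30s CPU window follow the 2.x docs; host and port are assumptions for a default local install):

```shell
# Build the profile-collection command; printed rather than executed here,
# since it requires a running influxd instance. Adjust INFLUX_URL as needed.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"
profile_cmd="curl -o profiles.tar.gz \"$INFLUX_URL/debug/pprof/all?cpu=30s\""
echo "$profile_cmd"
```

Running the printed command against the live instance produces a `profiles.tar.gz` containing heap, CPU, and goroutine profiles suitable for attaching to an issue.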
@lesam Thanks for your response 🙏
I just checked the git log, and the commit mentioned in that PR was already in the version I tested with. Or do you mean that this change might be the issue?
No, we did not have any scrapers configured. Regarding the profiles: sorry, can't help with that anymore. We switched to Timescale a few weeks ago and don't have any InfluxDB instances running anymore. I really hope @jo-me can be of more service here than me. In hindsight, the thing that felt strangest was the millions of files produced in the database directory.
No way to identify or reproduce this issue, and it may have been a duplicate of #23085, so closing.
Hello again,
I'm in the process of upgrading my 1.8 database to 2.1.1. The background is that, as a company, we hope to get rid of the issues we had when upgrading to the tsi1 index in 1.8.
The upgrade works fine on my test setup and on a machine with a small amount of
data. Once deployed to a staging environment, though, which has roughly the same
amount of data as our prod environments, the problems start to show up. We're
talking about roughly 30G of data (measured by directory size), so rather
a medium-sized workload for Influx.
The first problem is that the upgrade is abysmally slow. It takes about 2 hours until
it does anything - I had to increase `INFLUXD_INIT_PING_ATTEMPTS` to 1000000 so
that the automated docker upgrade doesn't kill the process. Before that point it
writes only minimal amounts of data to disk, then it suddenly starts to write
large quantities of data within a few minutes. During the upgrade it wrote roughly
30G of data, which seems reasonable. The upgrade part is not the real issue
here, I was just a bit surprised by the performance. Still, it makes the
conversion harder, especially if you have more than one instance. The whole upgrade
took roughly 4 hours.
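For reference, a hedged sketch of the automated docker upgrade invocation with the raised ping budget (`DOCKER_INFLUXDB_INIT_MODE=upgrade` and `INFLUXD_INIT_PING_ATTEMPTS` are read by the official image's entrypoint; the image tag and volume paths below are placeholders):

```shell
# Compose the upgrade command; printed rather than run, since it requires
# docker and real 1.x/2.x data volumes on the host.
upgrade_cmd='docker run -d \
  -e DOCKER_INFLUXDB_INIT_MODE=upgrade \
  -e INFLUXD_INIT_PING_ATTEMPTS=1000000 \
  -v /path/to/v1-data:/var/lib/influxdb \
  -v /path/to/v2-data:/var/lib/influxdb2 \
  influxdb:2.1.1'
echo "$upgrade_cmd"
```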
The second (and actual) problem is that after starting `influxd` it consumes
insane amounts of memory. 1.8 happily ran with 8GB of memory usage (even though
we had `inmem` as index). 2.1 seemed to eat RAM endlessly once it started: I was
on a 32GB machine and still had to add 10GB of swap, otherwise the
startup would fail. From the logs one could see that it was re-indexing data,
during which `influxd` was also not taking any queries. Swapping might have
slowed things down, but it still took roughly 10 hours until `influxd`
started accepting queries. Also, compared to 1.8's `inmem` index, the memory
usage is at least 5 times as high in our case. And that's on a machine that
does not see a lot of traffic. One issue we wanted to solve is the slow startup
time of Influx 1.8 (due to rebuilding the `inmem` index). This also seems to
have gotten worse, since every start of Influx seems to still rebuild the index.
Similar tickets (for 1.8 though, the first one opened by myself):
Any ideas on how to debug this? These performance characteristics do not really make it
possible to upgrade to 2.1 any time soon. Is there something obvious (like some config option)
that I'm missing?
Steps to reproduce:
(and the README of the docker image)
Expected behavior:
Actual behavior:
See description above.
Environment info:
Linux 5.4.0-1045-aws aarch64
InfluxDB 2.1.1 (git: 657e1839de) build_date: 2021-11-09T03:03:48Z
We use the `*-alpine` variant of the docker images.
Series cardinality (via `SHOW SERIES CARDINALITY`) does not seem that high...
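As a hedged sketch, the cardinality check can be run against a 2.x instance through the 1.x-compatibility `/query` endpoint (the token, database name, and URL below are placeholders):

```shell
# Compose the cardinality query; printed rather than executed, since it
# needs a running instance and a valid API token.
card_cmd='curl -G "http://localhost:8086/query" \
  --header "Authorization: Token $INFLUX_TOKEN" \
  --data-urlencode "db=mydb" \
  --data-urlencode "q=SHOW SERIES CARDINALITY"'
echo "$card_cmd"
```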
Config:
Completely standard from what I can see:
Logs:
These logs are visible after `influxd` starts:
Once restarted, one can see that this command takes quite a bit of time before starting Influx.
The user is already set correctly, but it seems that the directory contains over one million files.
Just counting them via `find` took a few minutes - might that be the issue?
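A minimal sketch of that count, for anyone wanting to compare numbers; in practice point `DATA_DIR` at the real engine directory (e.g. `/var/lib/influxdb2/engine`, which is an assumption about the layout). Here a scratch directory with two files stands in for demonstration:

```shell
# Create a scratch directory with two stand-in files, then count regular
# files under it the same way one would against the real data directory.
DATA_DIR=$(mktemp -d)
touch "$DATA_DIR/a.tsm" "$DATA_DIR/b.tsi"
file_count=$(find "$DATA_DIR" -type f | wc -l)
echo "files under $DATA_DIR: $file_count"
```

A count in the millions on the real directory would match the file explosion described above.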