TSI engine consumes significantly more memory than TSM engine #11830
Thanks for the very comprehensive ticket! As you have 30 databases you will have at least 30 indexes, and more if you have more than one shard per index. Sometimes, depending on the cardinality, your index data will never flush from the log files to the file-backed TSI files because it doesn't reach the default size limit needed to trigger a flush. That limit is controlled by the max index log file size setting in the configuration. My initial thought here is that your high heap usage is because you have lots of log files sitting on the heap and there isn't enough cardinality in them to flush them to TSI files. Could you try reducing that limit?
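For reference, a minimal sketch of where this limit lives in a 1.7-era influxdb.conf. The exact key name (`max-index-log-file-size` under `[data]`) and the defaults shown here are from memory, so double-check them against your own config file:

```toml
[data]
  # Index type used for new shards: "inmem" (the 1.x default) or "tsi1".
  index-version = "tsi1"

  # Threshold at which a TSI log file is compacted into an immutable,
  # memory-mapped index file. Lower values flush sooner and keep less
  # index data on the heap, at the cost of more frequent compactions.
  # The default is believed to be "1m".
  max-index-log-file-size = "64k"
```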
Hey @e-dard - thanks for getting back to me so quickly! Before we reverted our nodes back to the TSM engine, we tried tweaking that setting too. Across five identical TSI nodes with the same data, we tried a range of values for it.

After restarting the nodes and letting them run throughout the evening, all of them climbed back up to the same memory usage they were at before. From what I remember, the heap profiles also didn't look much different, but unfortunately I didn't save those. Since we didn't see any noticeable difference in memory usage between a log file size of "1m" and "64k", we didn't try setting it any lower... but maybe we should have kept going?
I have the same issue with InfluxDB 1.6.2 and 1.6.6 using the tsi1 index: the influxdb process has almost 120 GB of memory reserved and gets OOM-killed every week.
I have the same issue with InfluxDB v1.5.2. Is there any news on this?
Hi, same issue here. When I run InfluxDB with the tsi1 engine, after having built all the indexes, memory consumption is far higher. Then I stop InfluxDB and change the configuration to use inmem. After I start InfluxDB and all the inmem indexing has been calculated, it consumes around 4.5 GB of memory. The overall startup time of InfluxDB is not especially improved with tsi1 either.
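A hedged sketch of the switch-back comparison described above; the service name and paths assume a stock Linux package install:

```bash
# Sketch of the tsi1 -> inmem comparison; adjust service name and paths
# to your install.
sudo systemctl stop influxdb

# In /etc/influxdb/influxdb.conf, under [data], set:
#   index-version = "inmem"

sudo systemctl start influxdb

# Once the inmem index has finished building, compare resident memory
# (RSS, in KB) of the influxd process against the tsi1 run.
ps -o pid,rss,comm -C influxd
```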
Quoting myself: I found out that the cause was my InfluxDB instance having too many shards. After setting a longer shard group duration, my tsi1 memory usage decreased from 120 GB to 70 GB. I also disabled the shard precreation option, because data points with future timestamps could otherwise cause too many shards to be created.
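For anyone applying the same fix, a hedged sketch of the two changes described above. The database and retention policy names are placeholders, and ALTER RETENTION POLICY only affects shard groups created after the change:

```sql
-- Move from daily to weekly shard groups; existing shards keep their
-- original duration and only age out as they expire.
ALTER RETENTION POLICY "autogen" ON "mydb" SHARD DURATION 7d
```

and, in influxdb.conf, disabling shard precreation so future-dated points don't fan out into extra shards:

```toml
[shard-precreation]
  enabled = false
```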
@andreykaipov this might be the world's longest wait for a reply... but I was reviewing this ticket and noticed something: when you change the max log file size it only affects new shards. Therefore, in order to see the impact on current shards, you need to rebuild your TSI indexes with the buildtsi tool (which accepts an optional max-log-file-size).
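A hedged sketch of that rebuild, assuming stock package paths: influxd should be stopped first and the command run as the influxdb user so file ownership stays correct. The 65536 value mirrors the "64k" setting discussed above; check `influx_inspect buildtsi -h` for the exact flag names on your version:

```bash
# Stop the daemon before touching index files.
sudo systemctl stop influxdb

# Rebuild TSI indexes for all shards with a smaller max log file size (bytes).
sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir  /var/lib/influxdb/wal \
  -max-log-file-size 65536

sudo systemctl start influxdb
```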
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@e-dard I must also apologize for taking a while to get back with an update! 😅 At the time of this ticket, I totally missed that the buildtsi tool had an option for that. We figured the setting in the config applied to existing shards too. Passing a lower max-log-file-size value to the buildtsi tool when converting from TSM to TSI did indeed cause InfluxDB to start up with lower memory usage, but things were still running rather hot.

So next we also went with @seanlook's suggestion of increasing our shard group durations. By default we had daily shards, so as an experiment we converted half the nodes in one cluster to use weekly shards. I wish we had tried this sooner. As shard counts dropped over the next few months, so did memory and load average on the TSI nodes, eventually leveling out. Unfortunately it's hard to say at exactly what shard counts we saw the improvement; we just looked at our dashboards one day and noticed the gradual change! We also noticed that the TSI nodes with fewer shards haven't had any significant or unusual buffered writes over these few months, and they start up a lot quicker too (which makes sense given there are fewer indices to open).

So yeah, that's the story. As we slowly convert our other datacenters and environments to use TSI and weekly shards, I'll try to report back the results. I'm hopeful. Thank you to everybody on this thread. We learned a lot and we really appreciate it. :-) I'm okay with closing out this issue. I hope the operational knowledge helps and addresses the issue for everybody else too.
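To make the "shard counts dropped over the next few months" observation measurable, one hedged way to track it from the influx CLI (SHOW SHARDS prints one row per shard, so the line count is only a rough proxy):

```bash
# Rough count of shards across all databases; the total includes a few
# lines of per-database headers and blank lines from the CLI output.
influx -execute 'SHOW SHARDS' | wc -l
```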
I hit the same issue with indexes. The only way to start my InfluxDB instance (about 90 GB on disk) was to delete the indexes and recreate them with the buildtsi tool.

With both index versions the startup process takes about 7 minutes =/ I'm running 1.7.9 on an Ubuntu server (on DigitalOcean) with 8 CPUs and 16 GB of RAM. Cheers
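A sketch of the "delete and rebuild" procedure described here, with the usual caveats: influxd must be stopped, the per-shard `index` directory layout and paths are assumptions for a stock 1.7 install, and deleting the wrong directories is destructive, so verify the paths first:

```bash
sudo systemctl stop influxdb

# tsi1 keeps each shard's index in an "index" directory under the shard path:
#   /var/lib/influxdb/data/<db>/<rp>/<shard_id>/index
# Remove them so buildtsi regenerates every index from the TSM/WAL data.
find /var/lib/influxdb/data -type d -name index -prune -exec rm -r {} +

sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir  /var/lib/influxdb/wal

sudo systemctl start influxdb
```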
Likely fixed in 2.0.9 by #22520.
Ever since switching our InfluxDB nodes to TSI-based indices, we've seen them slowly climb in memory usage until they start crashing frequently during midnight compactions.
We set up a version 1.7.2 test node with around 4 million series spread across 30 databases to try and investigate, and found TSI-based indices consuming significantly more memory. In our tests we let Influx run for a while before stopping it, converting between TSI and TSM-based indices, and restarting it.
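For context on the "4 million series across 30 databases" figure, a hedged way to check per-database series cardinality from the CLI (SHOW SERIES CARDINALITY returns an estimate; the EXACT variant is slower but precise):

```bash
# Estimated series cardinality for one database; repeat per database.
influx -database mydb -execute 'SHOW SERIES CARDINALITY'
```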
Here are the top processes for the node when TSI is enabled and all indices are TSI-based:
And here they are for the same node when TSM is enabled and all indices are TSM-based:
These heap profiles were captured by hitting the `/debug/pprof/heap` endpoint and are attached below. When aggregating at the line level, the TSI-based profile shows the following line to be responsible for a good chunk of memory: https://github.com/influxdata/influxdb/blob/v1.7.2/tsdb/index/tsi1/log_file.go#L718.
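For anyone reproducing this, a sketch of how such a profile can be grabbed and viewed at line granularity, assuming influxd's HTTP API is on the default port 8086 with pprof enabled (the default). You may need to point pprof at the influxd binary for full file/line symbolization:

```bash
# Grab a heap profile from the running influxd process.
curl -o heap.pb.gz "http://localhost:8086/debug/pprof/heap"

# Show in-use memory aggregated per source line (matches the analysis above).
go tool pprof -lines -inuse_space -top heap.pb.gz
```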
From the looks of it, this is where InfluxDB saves the tag values for a specific tag key when writing to the write-ahead log. When saving tags for high-cardinality series, allocating with `string(v)` that many times must be overloading the heap. I don't know how accurate that guess is, but how come the TSM engine doesn't have a similar issue?

TSI profile: tsi_based.pprof.influxd.alloc_objects.alloc_space.inuse_objects.inuse_space.pb.gz
TSM profile: tsm_based.pprof.influxd.alloc_objects.alloc_space.inuse_objects.inuse_space.pb.gz
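Not InfluxDB code, just a minimal Go illustration of why `string(v)` can be an allocation hot spot: converting a []byte to a string always copies the bytes, so doing it once per tag value across millions of series adds up.

```go
package main

import "fmt"

func main() {
	v := []byte("us-east-1") // e.g. a tag value decoded from a WAL entry
	s := string(v)           // allocates a new string and copies v's bytes

	v[0] = 'X'                // mutating the byte slice afterwards...
	fmt.Println(s, string(v)) // ...does not affect s, because s owns its own copy
}
```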