Show tag keys performance on 1.4.1 (tsi1) #9125
Comments
Hi @reistiago, how many retention policies do you have? Could you provide …? One of the bugs we fixed with … If you have other retention policies on your database, that could be part of the reason why it's now taking longer than it did on 1.3.7, especially if those retention policies contain the measurement.

P.S. We now have an explicit cardinality command, which is a bit easier to remember than the query for the …
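For reference, a minimal sketch of running the explicit cardinality command mentioned above from the `influx` CLI (the database name `mydb` is a placeholder, not something quoted in this thread):

```shell
# "mydb" is a placeholder database name.
# SHOW SERIES CARDINALITY (added in 1.4) returns an estimate of the
# number of series in the database.
influx -database mydb -execute 'SHOW SERIES CARDINALITY'
```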
@reistiago Thanks for the bug report. Can you run a profile while the query is running? That will help us narrow down what the issue is.
This will return a …
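The exact command requested here was not preserved, but one way to capture such a profile while the query runs (assuming a default local install listening on port 8086 with pprof enabled) is the HTTP pprof endpoint, which returns an archive of profiles:

```shell
# Bundles CPU, heap, goroutine and related profiles into a single archive.
# cpu=true asks the server to also collect a CPU profile (roughly 30 seconds)
# before responding, so run this while the slow query is executing.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
```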
Only the default (autogen):
I know about …

@benbjohnson, as I said above, we had to revert back to 1.3.7, so I can't really do that now, and it takes quite a long time (hours) to recreate the index based on our current dataset. I'll try to get it tomorrow.
@reistiago OK, we understand. We can try to reproduce it on our side. Roughly how many measurements do you have in …?
@e-dard very few, around 10.
Attached are a few pprof files taken while the query was executing; the file names should be self-explanatory. I don't know how long the query took to run, because I got busy with something else in the meantime and didn't notice when it finished, but it took longer than 23 minutes.

profiles-query-running-for-4-minutes.tag.gz

In case this is useful, looking at the output of …

The InfluxDB log while the query was running was mostly:
Using the new method to calculate the cardinality returns:
Space occupied on disk by that database's data folder: 64G.

I've kept an image of this machine, so running further tests should be quicker in case you need more information.
@reistiago do you have any idea how many tag keys you have in your database? Essentially:
You could run it on …
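The query posted here was not preserved; as a hedged sketch, the tag keys (or, on 1.4+, their cardinality) can be listed from the `influx` CLI, with `mydb` and `measurement` as placeholder names:

```shell
# Placeholder database and measurement names; substitute your own.
influx -database mydb -execute 'SHOW TAG KEYS FROM "measurement"'

# 1.4+ also has an explicit cardinality form for tag keys.
influx -database mydb -execute 'SHOW TAG KEY CARDINALITY'
```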
@reistiago Ah, I think I have reproduced it... It's the number of shards you have that's the problem. It looks like you have around 1,000 shards at the moment. I'll look into optimising the query when there are many shards.

However (and this advice isn't merely to improve the performance of this query), you should really consider an alternative retention policy unless you're dropping data a lot. Fewer shards will cause less stress on the system and will significantly reduce the size of your …

What's your current retention policy duration?
@e-dard assuming that the shard count equals the number of folders in /data/{database}/{retention-policy}, your guess about the number of shards is correct: 1,043. There are only 9 tag keys on that measurement, so you are likely correct that the problem is not there. We are using the default retention policy, so the duration is infinite. Increasing the shard size/duration will probably cause slower reads, correct?
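Under that assumption, counting the shard directories is a one-liner along these lines (the data directory defaults to /var/lib/influxdb/data on most Linux packages; `mydb` and `autogen` are placeholders):

```shell
# Each sub-directory of data/<database>/<retention policy> is one shard.
ls -d /var/lib/influxdb/data/mydb/autogen/*/ | wc -l
```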
@e-dard sorry for the question, since it's only tangentially related to the issue. If I alter the retention policy (autogen in this case) with a statement along the lines of the sketch below:

Is this retroactively applied to the shards that already exist, or must this be done at database creation?
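The statement the reporter quoted was not preserved; presumably it altered the shard group duration along these lines (placeholder database name, and a one-week shard duration purely for illustration):

```shell
# DURATION INF keeps data forever; SHARD DURATION controls how much time
# each shard group covers, and therefore how many shards accumulate.
influx -execute 'ALTER RETENTION POLICY "autogen" ON "mydb" DURATION INF SHARD DURATION 1w'
```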
@e-dard any news on this issue / PR?
@reistiago a fix has been merged for this. I believe it will be back-ported into a …

Regarding your other question: yes, …
@e-dard I still have the machine image with the problem, so I will give it a try when it's available. Thanks for the reply regarding the shards configuration.
@e-dard this ended up not being back-ported, correct? Also, looking at the 1.5.0 changelog, it's not referenced yet (in master). I'm just confused about the current status of this issue / PR.
Bug report
System info:
InfluxDB 1.4.1 with tsi1
Amazon Linux AMI release 2015.09 (centos based)
Machine type: r4.4xlarge
Description:
We have a high cardinality use case, as can be seen by:
This is mostly due to a specific measurement.
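The output backing this claim was lost in the copy above. As a sketch (not necessarily the command the reporter used), series counts can be inspected via the `_internal` monitoring database, which is enabled by default:

```shell
# Reports the most recent series count per database from the monitor stats.
influx -execute 'SELECT last("numSeries") FROM "_internal"."monitor"."database" GROUP BY "database"'
```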
On 1.3.7, running "show tag keys from measurement" executes in a few seconds.
We tried to upgrade to 1.4.1; to do so, we did the following:
Expected behavior:
Have similar performance between 1.3.7 and 1.4.1
Actual behavior:
"show tag keys from measurement" takes around 10 minutes to execute.
We tried other lower-cardinality measurements on the same database, and their performance looks similar to 1.3.7.
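For comparing the two versions, a simple way to time the statement from the CLI (again with placeholder database and measurement names) would be:

```shell
# Wall-clock timing of the statement via the shell's time builtin.
time influx -database mydb -execute 'SHOW TAG KEYS FROM "measurement"'
```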
Additional info:
We are trying to build a way to reproduce the issue that we can share, possibly a script that generates similar cardinality, but we haven't had time to finish it yet.