Key space is not cleaned up by retention policy #8819
Comments
Thanks for the report, @ahermspacketwerk. Could you just check whether this issue also occurs on the latest release?
Sorry for the late response. I was out of office, and it took some time until we had collected enough data in our system to hit the limit. Anyway, yes: the same problem occurs with InfluxDB v1.3.5 (git: HEAD 9d90010).
Thanks @ahermspacketwerk. I'm treating this as a bug and we'll work on a fix for the next release.
I'm having trouble reproducing this, and the code path followed when deleting a retention policy seems to be the correct one. Can you confirm that you're seeing lines like the following in your logs:
If you see any of those, would you be able to provide your logs? You can email them to me privately if you don't want to post them on the ticket. My email is
Hi @e-dard, I will reproduce the problem on our setup and look into the log files. Once the problem is established, we can take a closer look at the situation. We have seen weird effects, like shards that are present but whose table queries do not return any entries. Though without insider knowledge it's hard to figure out what this means. Unfortunately my previous setup had to be reinstalled, which means I will need another week to get to the point where we see something.
Just an update: I was able to reproduce this quite trivially with a 1h retention policy and the max series limit set to 1 or 2 series. Insert a couple of points, wait an hour or so, and then try to write a new series.
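For readers following along, here is a minimal sketch of that kind of reproduction, pieced together from the descriptions in this thread. The database, retention policy, measurement, and tag names are made up for illustration; the series limit is set beforehand in influxdb.conf.

```toml
# influxdb.conf (excerpt) – keep the per-database series limit tiny
[data]
  max-series-per-database = 2
```

```
-- in the influx CLI
> CREATE DATABASE ret_test
> CREATE RETENTION POLICY one_hour ON ret_test DURATION 1h REPLICATION 1 DEFAULT
> USE ret_test
> INSERT cpu,host=a value=1
> INSERT cpu,host=b value=1
-- wait for the shard to expire (1h plus up to one retention check interval),
-- then try to write a brand-new series:
> INSERT cpu,host=c value=1
```

Before the fix, that last write could still be rejected against the series limit even though the earlier data had already expired.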
Great, so you don't need my huge data example. If you need anything else, let me know.
@ahermspacketwerk OK, I think I've narrowed this down. I think it's a race inside the service that manages the labelling of expired shards and their subsequent removal. If I'm right, then I would expect the max series limit to be reset after:

  [retention]
  check-interval = "30m0s"

To clarify: when you experience this issue, I would expect that if you waited another check interval the expired shards would eventually be removed locally and the key space freed up again.
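For reference, that setting lives in the [retention] section of influxdb.conf. A minimal excerpt with the stock 1.x defaults (these are the defaults, not values taken from this report):

```toml
[retention]
  # enables the retention policy enforcement service
  enabled = true
  # how often expired shard groups are checked for and removed
  check-interval = "30m0s"
```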
As I observed this, the key space was not cleaned up for days. Our software is continuously trying to write data, but even after days there are no entries; "select * ..." no longer returns any values. If this were only a delayed clean-up, I would expect to see at least some values.
Fixes #8819.

Previously, the process of dropping expired shards according to the retention policy duration was managed by two independent goroutines in the retention policy service. This behaviour was introduced in #2776, at a time when there were both data and meta nodes in the OSS codebase. The idea was that only the leader meta node would run the meta-data deletions in the first goroutine, and all other nodes would run the local deletions in the second goroutine.

InfluxDB no longer operates in that way, so we ended up with two independent goroutines carrying out actions that really depended on each other. If the second goroutine runs before the first, it may not see the meta-data changes indicating that shards should be deleted, and it won't delete any shards locally. Shortly afterwards the first goroutine runs and removes the meta data for the shard groups. This results in a situation where it looks like the shards have gone, but in fact they remain on disk (and, importantly, their series remain in the index) until the next time the second goroutine runs. By default that's 30 minutes.

In the case where the shards to be removed would have removed the last occurrences of some series, and the database was already at its maximum series limit (or tag limit, for that matter), no further new series can be inserted.
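To illustrate the shape of the problem, here is a simplified sketch of the two layouts described above. This is not the actual InfluxDB retention service code; markExpiredShardGroups and deleteMarkedShards are placeholder names for the meta-data and local deletion steps.

```go
// Simplified sketch of the race described above – not the actual InfluxDB
// retention service; the step functions are placeholders.
package main

import (
	"fmt"
	"time"
)

// Meta-data step: mark shard groups past their retention duration as deleted.
func markExpiredShardGroups() { fmt.Println("meta: expired shard groups marked") }

// Local step: remove data and index entries for shard groups marked in meta.
func deleteMarkedShards() { fmt.Println("store: marked shards removed locally") }

// racyRetention mirrors the pre-fix layout: two independent goroutines on the
// same interval. If the local step fires before the meta step on a given tick,
// nothing is removed locally until the next tick (30 minutes by default).
func racyRetention(interval time.Duration, done <-chan struct{}) {
	run := func(step func()) {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				step()
			case <-done:
				return
			}
		}
	}
	go run(markExpiredShardGroups)
	go run(deleteMarkedShards)
}

// fixedRetention mirrors the post-fix layout: a single goroutine runs both
// steps in order, so local deletion always sees the latest meta-data changes.
func fixedRetention(interval time.Duration, done <-chan struct{}) {
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-t.C:
				markExpiredShardGroups()
				deleteMarkedShards()
			case <-done:
				return
			}
		}
	}()
}

func main() {
	done := make(chan struct{})
	fixedRetention(100*time.Millisecond, done) // short interval just for the demo
	time.Sleep(350 * time.Millisecond)
	close(done)
}
```

Running the single combined loop makes the ordering dependency explicit, which is the essence of the change described in this fix.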
Affected version: InfluxDB 1.3.1
We are trying to use InfluxDB in a memory-limited environment. Our data schema requires assigning a moderately large number (1000 to 10000) of tag values.
We have already found out that this is not optimal with regard to InfluxDB's memory usage.
Our current attempt is to limit the number of series with max-series-per-database = 10000.
So far, this works as expected: whenever we hit the limit, new tag values are dropped during import, which is what we want to achieve.
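For anyone landing here with the same setup, the limit described above is a [data] setting in influxdb.conf. A minimal excerpt using the value from this report (setting it to 0 disables the limit):

```toml
[data]
  # reject writes that would create series beyond this count (0 = unlimited)
  max-series-per-database = 10000
```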
But the key space of the tag values does not seem to be released by the retention policies. Even when the series should have been dropped, we cannot insert new entries.
As a result, the database looks empty (a select * query returns nothing), but we cannot insert new elements.
The issue can be worked around by restarting influxd; after a restart we can insert new tag values again.
Any idea how we can overcome this?