Deleting expired shards isn't thread safe #779
Comments
Sounds like this issue is related to my issue #747, and fixing it will fix that one as well. Can't wait for it! :)
@nichdiekuh yes, that could be the cause of the issues you're seeing. That said, as of rc3 InfluxDB incorrectly drops shards when it shouldn't, which is why you hit the bug in the first place. We will release rc4 today with the fix for #769 and #774. That version should not have any problem with the benchmark you provided. I ran the benchmark last night and it wrote 40% before the machine ran out of space.
That sounds promising! And btw: I've set the retention-sweep-period to "1000000m" for testing purposes, started my script, and it's still running. Unfortunately the sweep period cannot be set to anything like "30d" (influx doesn't start with other units, but that's another issue).
Thanks for your efforts on shard expiration. Is this issue related to #767?
Fixed by #866
Due to a bug fixed by #769 and c02cff2, shards were getting dropped prematurely (using the duration instead of the retention) while users were still writing to them. Although that bug is fixed, it turns out that a shard can still be deleted and closed while data is being written to it, which causes a nil pointer dereference that crashes the entire daemon. The two locations identified as having a race condition (there could be more) are https://github.com/influxdb/influxdb/blob/master/datastore/shard.go#L459 and https://github.com/influxdb/influxdb/blob/master/datastore/shard.go#L93
We should mark shards as deletable, wait until no one holds a reference to them, and only then delete them, similar to the shard cache that we have in the datastore.
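To make the proposal concrete, here is a minimal sketch of deferred deletion with reference counting. It is not the actual InfluxDB code or the fix that landed: the `Shard` type, its fields, and the `Acquire`, `Release`, `MarkForDeletion`, and `Closed` methods are all hypothetical names chosen for illustration. The idea is that a writer takes a reference before touching the shard, and the shard is only closed once it has been marked for deletion and the last reference is gone.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrShardDeleted is returned to callers that ask for a shard
// which has already been marked for deletion (hypothetical error).
var ErrShardDeleted = errors.New("shard is marked for deletion")

// Shard is a hypothetical stand-in for the datastore shard.
type Shard struct {
	mu       sync.Mutex
	refs     int  // number of in-flight users (writes, queries)
	deleting bool // set once the retention sweep wants the shard gone
	closed   bool // set once the underlying resources are released
}

// Acquire registers a user of the shard. It fails once deletion
// has been requested, so no new write can start on a dying shard.
func (s *Shard) Acquire() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.deleting {
		return ErrShardDeleted
	}
	s.refs++
	return nil
}

// Release drops one reference. If deletion is pending and this was
// the last reference, the shard is closed here.
func (s *Shard) Release() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs--
	if s.deleting && s.refs == 0 {
		s.closeLocked()
	}
}

// MarkForDeletion flags the shard as deletable. The shard is closed
// immediately only if nobody currently holds a reference.
func (s *Shard) MarkForDeletion() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.deleting = true
	if s.refs == 0 {
		s.closeLocked()
	}
}

// closeLocked must be called with s.mu held. Real code would close
// the storage handle and remove the shard's files here.
func (s *Shard) closeLocked() {
	s.closed = true
}

// Closed reports whether the shard's resources have been released.
func (s *Shard) Closed() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.closed
}

func main() {
	s := &Shard{}
	if err := s.Acquire(); err != nil { // a write starts
		panic(err)
	}
	s.MarkForDeletion() // retention sweep runs while the write is in flight
	fmt.Println("closed while write in flight:", s.Closed())
	s.Release() // the write finishes; the shard can now go away
	fmt.Println("closed after release:", s.Closed())
	fmt.Println("acquire after deletion:", s.Acquire())
}
```

With this shape, the sweep never closes a shard out from under a writer. The cost is that every code path touching a shard has to pair `Acquire` with `Release`, typically via `defer`.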