[0.10] Series data loss under high write load #5719
@madushan1000 Does all of the data in the database disappear after the compaction? Or is there still some data retained?
@rossmcdonald It depends on the insert rate. If I keep the same insert rate for some time, then all the data is lost. But if I change the insert rate sometime before the data loss, the data written at the old insert rate is retained.
@madushan1000 That's very strange. Do you know if the same issue occurs with a single-node setup, or have you only tested on a cluster? Also, how are you distributing writes/reads? Are you writing and reading from the same instance, from different instances, or in a round-robin fashion?
@madushan1000 If you could check whether you can reproduce this using a single node, that would help narrow down where the problem might be.
So I tried a single-node test on my MacBook Pro (i5 processor, 8 GB RAM, SSD storage) with InfluxDB 0.10 installed from Homebrew commit
How can I run the script you're running to reproduce this?
I've pushed the complete code to my GitHub: https://github.com/madushan1000/influx-test. Clone it, do an npm install, and then npm start.
The cache had some incorrect logic for determining when a series needed to be deduplicated. The logic was checking for unsorted points and not considering duplicate points. This would manifest itself as many duplicate points being returned from the cache, and after a snapshot compaction run, the points would disappear, because snapshot compaction always deduplicates and sorts the points. Added a test that reproduces the issue. Fixes #5719
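To make the failure mode concrete, here is an illustrative sketch (in JavaScript, not the actual Go code from the tsm1 cache, so the names and data shapes are hypothetical) of the difference between checking only for unsorted points and also checking for duplicates:

```js
// Illustrative sketch of the bug described above; `points` is an array
// of { time, value } entries in arrival order.

// Buggy check: only flags out-of-order timestamps, so a run of duplicates
// like [t=1, t=1, t=2] is never marked for deduplication.
function needsDedupBuggy(points) {
  for (let i = 1; i < points.length; i++) {
    if (points[i].time < points[i - 1].time) return true;
  }
  return false;
}

// Corrected check: equal timestamps also trigger deduplication, so the
// cache returns the same result set that a snapshot compaction (which
// always sorts and deduplicates) will later produce.
function needsDedupFixed(points) {
  for (let i = 1; i < points.length; i++) {
    if (points[i].time <= points[i - 1].time) return true;
  }
  return false;
}

const pts = [{ time: 1, value: 1 }, { time: 1, value: 2 }, { time: 2, value: 3 }];
console.log(needsDedupBuggy(pts)); // false -- duplicates slip through the cache
console.log(needsDedupFixed(pts)); // true  -- duplicates get deduplicated
```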
@madushan1000 Thanks for pushing your repo up. I was able to reproduce it locally and I have a fix in #5751 for it. There are two things going on here that I can see:
1. Your batches can contain multiple points with the same timestamp for the same series, so later points overwrite earlier ones.
2. The cache's deduplication logic had the bug described above, so the duplicate points showed up in queries until a snapshot compaction dropped them.
For #1, you'll need to ensure the timestamps are unique for each point in the batch for a given series.
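For example, assigning each point in a batch its own timestamp avoids the overwrite. A minimal sketch using the modern `influx` npm package (v5 API); the original script used an older node-influx, and the host, database, and measurement names here are made up:

```js
const Influx = require('influx');

// Hypothetical connection details for illustration only.
const influx = new Influx.InfluxDB({ host: 'localhost', database: 'test' });

const base = Date.now();
const points = [];
for (let i = 0; i < 10000; i++) {
  points.push({
    measurement: 'test_measurement', // hypothetical measurement name
    fields: { value: Math.random() },
    timestamp: new Date(base + i),   // one ms apart: unique per point
  });
}

// With unique timestamps, no point in the batch overwrites another
// point in the same series.
influx.writePoints(points, { precision: 'ms' })
  .then(() => console.log('batch written'))
  .catch((err) => console.error('write failed:', err));
```

Alternatively, adding a tag that varies per point would also keep each point in a distinct series, at the cost of higher series cardinality.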
Glad I could be of help. |
The cache had some incorrect logic for determining when a series needed to be deduplicated. The logic was checking for unsorted points and not considering duplicate points. This would manifest itself as many duplicate points being returned from the cache, and after a snapshot compaction run, the points would disappear, because snapshot compaction always deduplicates and sorts the points. Added a test that reproduces the issue. Fixes influxdata#5719 Rebased-to-0.10.x-by: Jon Seymour <jon@wildducktheories.com>
This series re-rolls the fixes for influxdata#5719, influxdata#5699, and influxdata#5832, without any other changes from 0.11.0, onto 0.10.1 for the purpose of addressing issue influxdata#5857. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
I'm testing InfluxDB v0.10 for our production use.
I wrote this node script (https://gist.github.com/madushan1000/7d4993dc19a24a01eb84) using node-influx (with ES6 and babel-polyfill), which basically batches 10,000 (or 1,000) points into one write and iterates forever. The write rate is 14,000 to 25,000 points/s.
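The gist has the full script; the pattern it describes is roughly the following (sketched against the modern `influx` npm package, so the API details differ from the 2016-era node-influx used in the gist):

```js
const Influx = require('influx');

// Hypothetical connection details; the real script's settings are in the gist.
const influx = new Influx.InfluxDB({ host: 'localhost', database: 'test' });

// Batch 10,000 points into a single write and iterate forever. No explicit
// timestamps are set here; if several points in one batch for the same
// series end up sharing a timestamp, the later ones silently overwrite
// the earlier ones (see the discussion above).
async function writeForever() {
  for (;;) {
    const batch = [];
    for (let i = 0; i < 10000; i++) {
      batch.push({
        measurement: 'test_measurement', // hypothetical measurement name
        fields: { value: Math.random() },
      });
    }
    await influx.writePoints(batch);
  }
}

writeForever().catch((err) => console.error(err));
```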
The problem I have is that all the data in the measurement vanishes after a certain number of writes (about 10 writes using 10,000-point batches, about 2,500 writes using 1,000-point batches). I only have the default retention policy, which I created manually. Furthermore, looking at the InfluxDB logs, I noticed suspicious entries near the time of the data loss.
This setup is an InfluxDB cluster with 2 data nodes and 3 meta nodes. All the servers run Ubuntu 14.04, and I installed InfluxDB from the prebuilt packages, following the official documentation for v0.10. Both data nodes have 500 GB SSD drives formatted with EXT4 and mounted at /var/lib/influxdb.
Here is a sample config file. I first tried the output of influxd config. After it gave the same issue, I tried the one that came with the distribution. I tweaked it a little, but the issue was the same for every config I used. Only the hostnames and data.Enabled change for each node.
Is this write rate too high? Even if it is, why does the data disappear? Shouldn't some portion of the data be left?