data loss during high ingest rate #7330
Comments
If subscription writes are being dropped (which it sounds like they are in your case), you should see the
You might be able to reduce subscription write drops by adjusting your batch size while keeping an eye on that stat.
Negative:
( |
Anything further on this? Data loss is kinda a big deal.
@phemmer That is definitely not expected if the subWriteDrop count is 0. Are there any errors in the InfluxDB logs?
No errors.
I was able to reproduce this. The sends into the
While I'm not sure what the fix here is going to be, can we also make the max sizes of any chan buffers involved configurable? It would also help to have a way of reporting metrics on the sizes of those chan buffers.
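To illustrate the request above, here is a minimal sketch of a buffered channel whose capacity comes from configuration and whose fill level can be reported as a metric. The names `Config`, `BufferSize`, `newQueue`, and `stats` are hypothetical and are not InfluxDB's actual options or API; the only facts relied on are that Go's built-in `len` and `cap` work on channels.

```go
package main

import "fmt"

// Config is a hypothetical configuration struct; BufferSize is an
// assumed option name, not an actual InfluxDB setting.
type Config struct {
	BufferSize int
}

// queueStats is a hypothetical metrics snapshot for one channel buffer.
type queueStats struct {
	Len int // points currently waiting in the buffer
	Cap int // maximum number of points the buffer can hold
}

// newQueue creates the buffered channel with a configurable capacity.
func newQueue(c Config) chan []byte {
	return make(chan []byte, c.BufferSize)
}

// stats reports how full the channel is using the built-in len and cap.
func stats(q chan []byte) queueStats {
	return queueStats{Len: len(q), Cap: cap(q)}
}

func main() {
	q := newQueue(Config{BufferSize: 4})
	q <- []byte("test,foo=bar value=1i")
	q <- []byte("test,foo=bar value=2i")

	s := stats(q)
	fmt.Printf("queued=%d capacity=%d\n", s.Len, s.Cap) // queued=2 capacity=4
}
```

A buffer that sits near its capacity would be an early sign that writes are about to be dropped, before any drop counter moves.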
The fix is to increase the number of readers processing points from the channel. |
The subscriber write goroutine would drop points if the write load was higher than it could process. This could happen with just a few writers to the server. Instead, process the channel with multiple writers to avoid dropping writes so easily. This also adds some config options to control how large the channel buffer is as well as how many goroutines are started. Fixes #7330
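The idea described above can be sketched roughly as follows. This is not the code from #7407: `fanOut`, `send`, `run`, and the simulated 1 ms write delay are all hypothetical, and the sketch only assumes that points are handed to a buffered channel with a non-blocking send and are dropped when the buffer is full.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fanOut is a hypothetical stand-in for the subscriber service.
type fanOut struct {
	written int64 // points delivered by the workers
	dropped int64 // points discarded because the buffer was full
	points  chan int
	wg      sync.WaitGroup
}

// newFanOut starts `workers` goroutines that all drain the same channel.
func newFanOut(bufSize, workers int) *fanOut {
	f := &fanOut{points: make(chan int, bufSize)}
	for i := 0; i < workers; i++ {
		f.wg.Add(1)
		go func() {
			defer f.wg.Done()
			for range f.points {
				time.Sleep(time.Millisecond) // simulated slow remote write
				atomic.AddInt64(&f.written, 1)
			}
		}()
	}
	return f
}

// send never blocks the caller: if the buffer is full, the point is dropped.
func (f *fanOut) send(p int) {
	select {
	case f.points <- p:
	default:
		atomic.AddInt64(&f.dropped, 1)
	}
}

// run pushes n points as fast as possible and returns the final counters.
func run(bufSize, workers, n int) (written, dropped int64) {
	f := newFanOut(bufSize, workers)
	for i := 0; i < n; i++ {
		f.send(i)
	}
	close(f.points) // let the workers drain what is left and exit
	f.wg.Wait()
	return atomic.LoadInt64(&f.written), atomic.LoadInt64(&f.dropped)
}

func main() {
	for _, workers := range []int{1, 8} {
		w, d := run(20, workers, 200)
		fmt.Printf("workers=%d written=%d dropped=%d\n", workers, w, d)
	}
}
```

The exact counts depend on scheduling, so no output is shown. Adding workers only raises the drain rate. A burst that outpaces all workers combined will still overflow the buffer, which is presumably why the change also exposes the buffer size as a config option.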
Fixed via #7407
Bug report
System info:
influxdb 1.0
Linux & MacOS
Steps to reproduce:
influx
shell:create database test
influx
shell:create subscription "sub0" on "test"."autogen" destinations all 'http://localhost:8087'
for ((i=0; i<30; i++)); do ( for ((n=0; n<300; n++)); do echo "insert test,foo=bar value=${i}i"; done | influx -database test ) & done; wait
influx -database test
shell:select sum(value) from test
influx -database test -port 8087
shell:select sum(value) from test
Expected behavior:
130500
130500
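For reference, the expected value follows directly from the reproduction loop: 30 background jobs, where job `i` inserts `value=i` 300 times. A small sketch of that arithmetic (the function name `expectedSum` is just for illustration):

```go
package main

import "fmt"

// expectedSum mirrors the shell loop in the steps to reproduce:
// `writers` jobs numbered 0..writers-1, each inserting its own
// number `perWriter` times.
func expectedSum(writers, perWriter int) int {
	total := 0
	for i := 0; i < writers; i++ {
		total += i * perWriter
	}
	return total
}

func main() {
	want := expectedSum(30, 300)
	fmt.Println(want)         // 130500
	fmt.Println(want - 20188) // 110312, the shortfall on the subscriber side
}
```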
Actual behavior:
130488
20188
Additional info:
The first number is only sometimes off, and only by a small amount.
The second number (from the second influxdb) is always off, by a huge amount.
This is causing a problem for me as kapacitor is missing large amounts of data.