0.9.6.1 shows gaps #5214
Forgot to mention that this is our influxdb config:
We saw something very similar to this. You generally won't see error logs if you are dropping. Do you see drops in the kernel (ss)? We did when we went digging. You may want to review #5201 and/or move to a nightly from yesterday or later. Combined with #5217 and moving our WAL onto a very, very high-performing IO device, we believe we have eliminated drops.
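For anyone chasing the same symptom, one way to confirm kernel-level UDP drops is to watch the counters the kernel exposes in /proc/net/snmp: InErrors or RcvbufErrors climbing between checks means datagrams are being dropped before influxd ever reads them. A minimal sketch in Go, assuming Linux (column names vary slightly by kernel version):

```go
// Minimal sketch: print the kernel's UDP counters from /proc/net/snmp.
// Rising InErrors/RcvbufErrors between runs indicates the kernel is dropping
// datagrams, which matches the "no error logs, but gaps" symptom above.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/net/snmp") // Linux-only
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var header []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || fields[0] != "Udp:" {
			continue
		}
		if header == nil {
			header = fields // first "Udp:" line holds the column names
			continue
		}
		// second "Udp:" line holds the values; pair them with the names
		for i := 1; i < len(fields) && i < len(header); i++ {
			fmt.Printf("%-14s %s\n", header[i], fields[i])
		}
		break
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```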
I concur with @daviesalex's recommendation, especially given the sheer number of UDP endpoints you have. Out of curiosity, are you using Telegraf as your metrics engine or a custom generator?
Thanks. I'll give the nightly a try and increase our UDP payload size in the config (we already increased our UDP buffer via sysctl). As for Telegraf, no, I have a Perl script that pulls data from Ganglia and then pushes it to InfluxDB. It has been working fairly well: each Ganglia host forks into a new process, and the metrics are sent to InfluxDB with a hash load-balancing algorithm (based just on the metric name).
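The fan-out described above (hash the metric name to pick one of several UDP listeners, then write line protocol to it) looks roughly like the sketch below. This is Go rather than the Perl actually used here, and the port list, address, and metric layout are illustrative assumptions, not the real setup:

```go
// Minimal sketch of hashing a metric name onto one of several InfluxDB UDP
// listeners and sending it as line protocol. Ports and metric layout are
// illustrative assumptions only.
package main

import (
	"fmt"
	"hash/fnv"
	"log"
	"net"
	"time"
)

// Hypothetical listener ports; one [[udp]] block per port in influxdb.conf.
var udpPorts = []int{8089, 8090, 8091, 8092}

// portFor picks a port deterministically from the metric name, so a given
// series always lands on the same listener.
func portFor(metric string) int {
	h := fnv.New32a()
	h.Write([]byte(metric))
	return udpPorts[h.Sum32()%uint32(len(udpPorts))]
}

func send(metric string, value float64) error {
	addr := fmt.Sprintf("127.0.0.1:%d", portFor(metric))
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	// InfluxDB line protocol: <measurement> value=<v> <timestamp-ns>
	line := fmt.Sprintf("%s value=%g %d\n", metric, value, time.Now().UnixNano())
	_, err = conn.Write([]byte(line))
	return err
}

func main() {
	if err := send("cpu_user", 12.5); err != nil {
		log.Fatal(err)
	}
}
```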
Are you still seeing this with |
First off, I think this version has been pretty stable so far; congrats on that.
However, since we upgraded to this version we see some temporary gaps that then disappear on their own.
As a reminder, we push metrics via UDP (we have a few UDP ports and we hash our metrics across them). I looked into our injection logs and they do not show any gap in writing (the injection side has not changed in recent weeks), and the same goes for the InfluxDB logs. The closest thing to a non-query/HTTP entry is:
[retention] 2015/12/23 19:14:25 retention policy shard deletion check commencing
[retention] 2015/12/23 19:14:25 retention policy enforcement check commencing
[tsm1] 2015/12/23 19:15:21 compacted 22 tsm into 1 files in 3m6.078521546s
[tsm1] 2015/12/23 19:15:21 compacting 4 TSM files
[tsm1] 2015/12/23 19:15:42 compacted 4 tsm into 1 files in 20.213362013s
[tsm1] 2015/12/23 19:16:22 compacting 3 TSM files
[tsm1] 2015/12/23 19:16:52 compacted 3 tsm into 1 files in 30.479634201s
[tsm1] 2015/12/23 19:16:52 compacting 3 TSM files
[retention] 2015/12/23 19:44:25 retention policy enforcement check commencing
[retention] 2015/12/23 19:44:25 retention policy shard deletion check commencing
Are there any other graphs or stats you would find worthwhile? Do you have a Grafana dashboard that uses _internal that I could copy and use?