[0.9.1] 500 errors, then out of memory, then crash #3362
@nicksellen What volume of writes were you pushing in when the database crashed? Roughly how many points per second, through what protocol, and with what batch size?
@beckettsean There are 5 servers writing points every 5 seconds; this is how many each writes per batch (and thus the batch size): 219 + 1943 + 1926 + 783 + 735 = 5606 points in total (slightly higher per server than at the time of the crash, but not far off). So around 1000 points/second, and I'm using the line protocol with ordered tags. I'm pushing in about double that now; htop tells me it's writing to disk at 90 MB/s, which seems high (it's only reading 135 KB/s from the network).
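For reference, a batch of that shape is just newline-separated points in line protocol, posted as the body of a single request to the 0.9 HTTP write endpoint (`/write?db=<database>`). The measurement, tags, and values below are made up for illustration, not taken from this workload:

```
requests,host=web01,region=eu value=42 1437840000000000000
requests,host=web02,region=eu value=17 1437840000000000000
requests,host=web03,region=eu value=23 1437840000000000000
```

Writing tag keys in sorted order, as above, lets the server skip re-sorting them when it parses each point.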
(Sorry, make that ~400 KB/s of network read; I was reading the wrong figure.)
I suspect it's on its way to doing the same thing again. Here is a simplified log https://gist.github.com/nicksellen/f5f1bd2ff0f94ba70cb5 showing "<timestamp> <status code> <last 2 chars of request time>" - the interesting point is 10:38:20 204 3s, the first multi-second request. And here's a screengrab of htop. I'm going to restart it now; hopefully it comes back up OK.
It took around 5 minutes after restarting before it was ready to accept writes again.
Would 0.9.2-rc1 be worth upgrading to at this point? Or is the release close?
I suspect I should remove my continuous queries too (re: #3368).
Nick, since removing the continuous queries (per #3368) I haven't had any problems. I am now running my own downsampling queries every 15s without issue, so it would appear the problem lies entirely within the continuous query routines...
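For anyone replicating that workaround: a sketch of what such a manual 15s rollup can look like in InfluxQL, assuming `SELECT ... INTO` is available in the version being run; the measurement and target names are placeholders, not from this thread:

```sql
SELECT mean(value) INTO requests_15s FROM requests
WHERE time > now() - 30s GROUP BY time(15s)
```

Run on a 15s timer by an external scheduler, this does the same downsampling a continuous query would, without going through the CQ service.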
It looked good for a moment :) disk writes dropped from ~80 MB/s to ~3 MB/s. At 14:16:10 I deleted all the continuous queries (31 of them), the measurements they had created, and the retention policy for them at ... then the 500 errors came back less than a minute later.
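For reference, that cleanup maps to InfluxQL statements along these lines; the CQ, measurement, and retention policy names here are hypothetical, since the actual ones aren't given in the thread:

```sql
DROP CONTINUOUS QUERY cq_requests_15s ON stats
DROP MEASUREMENT requests_15s
DROP RETENTION POLICY rollups ON stats
```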
I restarted again, and after another 5-minute startup it's happy again... so far so good... and an order of magnitude better resource usage:
I managed to recreate the problem in #3517.
I'm pretty sure this is going to be fixed by #3569, which will be in the 0.9.3 release. It should be in a nightly build tonight. I'm closing for now, but if you can repro with that build, please reopen. Thanks!
InfluxDB v0.9.1 (git: 8b3219e74fcc3843a6f4901bdf00e905642b6bd6)
Ubuntu 14.04.2 LTS
Linux stats02.ec.everycity.co.uk 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
influxdb_0.9.1_amd64.deb
InfluxDB crashed and I'm not sure why. Restarting it caused it to recover, and everything is OK again now. It would be nice to know why.
I am running a default install, except for extending the continuous query recompute previous/time settings. I have 32 continuous queries (one for each measurement). I write about
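For context, those recompute settings live in the `[continuous_queries]` section of the config; the option names below match the 0.9-era sample config, but the values are placeholders rather than the ones used on this server:

```toml
[continuous_queries]
  enabled = true
  # how many previous intervals to recompute on each run
  recompute-previous-n = 2
  # never recompute intervals older than this
  recompute-no-older-than = "10m"
```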
Here's what happened:
fatal error: runtime: out of memory
The last error stack trace was this (the goroutine number looks very high...):
Here is a small sample of the latest flush log messages (after the restart) to give an idea of write volume (although this is probably only about 15-20% of the intended write volume):
Thanks :)