[0.9.5] HTTP 500 errors on insertions (after timeout) #4870
Comments
This also happens with the new storage engine, tsm1.
@caligula1989 when the system is non-responsive to writes and queries, does a restart clear the issue?
@corylanou @rossmcdonald this sounds pretty familiar, is it another occurrence of the potential deadlock?
A restart clears the issue for a couple of hours, but it'll happen again.
@caligula1989 Thank you for the extra information. Can you also try sending a SIGQUIT signal to the process the next time this happens, and then send us the resulting stack trace? You can send a SIGQUIT to InfluxDB by using the command:
And then capture the stack trace output using the command:
I can send you a link to an S3 bucket if it's too large to paste here.
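For context, a Go program such as InfluxDB, by default, prints the stack traces of all goroutines to standard error and then exits when it receives SIGQUIT, so the process has to be restarted afterwards. As a rough sketch only, assuming a POSIX system and that the process ID is already known, the signal could be sent from Python like this (the helper name `request_stack_dump` is hypothetical):

```python
import os
import signal


def request_stack_dump(pid):
    """Send SIGQUIT to the process with the given PID (POSIX only).

    A Go binary that has not overridden the default handling responds by
    writing its goroutine stack traces to stderr and exiting.
    """
    os.kill(pid, signal.SIGQUIT)
```

Where the trace ends up depends on where the service's standard error is redirected, which varies by installation.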
Believed fixed by #4913, which is available in the just-released version
I have a cluster of production servers writing web requests to a single InfluxDB node.
They're doing up to 100 requests per second - not something I'd expect to crash a node.
But after the system is up for a few hours (sometimes days), the clients start getting 500 errors. The log is littered with these messages:
[http] 2015/11/21 12:42:01 10.60.195.206 - root [21/Nov/2015:12:41:46 -0600] POST /write?db=mydb HTTP/1.1 500 32 - python-requests/2.4.3 CPython/2.7.3 Linux/3.2.0-4-amd64 84aa8e82-907f-11e5-b414-000000000000 15.000491819s
When this happens, I'm also getting timeouts on continuous queries & the following error:
I'm not batching my queries, since they're web requests and I have no real way to batch them.
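One possible way to batch even when every point comes from a separate web request is to buffer points inside the application process and write them together, since the `/write` endpoint accepts several newline-separated line-protocol points in one request body. The following is a minimal sketch under that assumption, not a recommendation from the maintainers. The class name `PointBatcher`, its parameters, and the `send` callable are all hypothetical.

```python
import threading
import time


class PointBatcher:
    """Buffers line-protocol points and passes them to `send` in batches."""

    def __init__(self, send, max_points=500, max_delay=1.0):
        self._send = send              # callable taking one newline-joined string
        self._max_points = max_points  # flush once this many points are buffered
        self._max_delay = max_delay    # or once the oldest point is this old (seconds)
        self._points = []
        self._oldest = None            # monotonic time of the oldest buffered point
        self._lock = threading.Lock()

    def add(self, line):
        """Buffer one point; send the batch if a size or age limit is reached."""
        with self._lock:
            if not self._points:
                self._oldest = time.monotonic()
            self._points.append(line)
            due = (len(self._points) >= self._max_points
                   or time.monotonic() - self._oldest >= self._max_delay)
            batch = self._take() if due else None
        if batch:
            self._send(batch)          # called outside the lock

    def flush(self):
        """Send whatever is buffered, if anything."""
        with self._lock:
            batch = self._take()
        if batch:
            self._send(batch)

    def _take(self):
        # Caller must hold the lock. Returns the joined batch or None.
        if not self._points:
            return None
        body = "\n".join(self._points)
        self._points = []
        self._oldest = None
        return body
```

Here `send` could be, for example, a function that POSTs the body to `/write?db=mydb` with `requests.post`. The age limit is only checked when `add` is called, so a real deployment would likely also call `flush()` from a timer thread and at shutdown. Points still in the buffer are lost if the process dies before they are sent.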
If useful, a typical minute looks like this:
[wal] 2015/11/21 22:17:06 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1
[wal] 2015/11/21 22:17:16 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1
[wal] 2015/11/21 22:17:26 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1
[wal] 2015/11/21 22:17:36 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1
[wal] 2015/11/21 22:17:46 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1
[wal] 2015/11/21 22:17:56 Flush due to idle. Flushing 31 series with 31 points and 1544 bytes from partition 1