Consul stops answering on port 8500 when many wait (blocking) API queries are made #8628
Comments
Hi @tantra35, Are the majority of these connections from a single client IP? If so, you may need to increase the value of See https://www.consul.io/docs/upgrading/upgrade-specific#vault-default-http_max_conns_per_client-too-low-to-run-vault-properly-1 for more info. |
@blake no, in our opinion this really is a bug
goroutines
|
At first glance this looks the same as #8504, and it is very similar to an issue we hit on Nomad (though there the problem was on the server side, not the client): hashicorp/nomad#4604 |
Thank you for the report! This does sound related to #8504. I think if these connections time out, the lack of a read timeout could cause this problem. If you are able to reproduce it, running https://www.consul.io/docs/commands/debug and sharing the debug output would be very helpful in tracking it down. |
@dnephin Hm how can we call |
The |
@dnephin OK, we can provide only https://gist.github.com/tantra35/360676d27ecc15806299efa9f581da6b. In our case this bug is 100% reproducible. |
Could this be related to #9649? |
@pySilver it seems these cases may be related, but this is not actually a bug in the connection subsystem; it is a bug in Consul's throttling subsystem. For now we don't have time to dig into it, but the workaround is simply to disable throttling:
|
I've set limits as @tantra35 recommended and there were no new errors yet. |
Overview of the Issue
After upgrading from Consul 1.5.3 to 1.6.7, we found that after some time, on agents where we make many waiting (blocking) API queries, the Consul agent stops answering on port 8500. We make about 50-100 blocking queries simultaneously (not too many). There is nothing interesting in the logs, just
EOF
errors on API calls and RPC communication to the servers. Consul doesn't crash at this point; it simply stops answering on port 8500 and doesn't consume CPU or memory (it looks like a deadlock). To demonstrate the EOF errors, we provide a short error log from the affected Consul agent; the other log messages are the same.
We need some time to reproduce this on our test stand to provide additional information. For now we have just fallen back to Consul 1.5.3 on client nodes.
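For anyone trying to reproduce the load pattern described above, here is a minimal sketch of a client that keeps many blocking queries open against one agent. It is not the reporter's actual client. The agent address, the `svc-N` service names, the choice of the health endpoint, and the helper names `blocking_url`, `watch`, and `run` are all assumptions made for illustration; only the `index`/`wait` query parameters and the `X-Consul-Index` response header come from Consul's documented blocking-query behaviour.

```python
import threading
import urllib.request
from urllib.parse import urlencode

# Assumption: a local Consul agent listening on the default HTTP port.
AGENT = "http://127.0.0.1:8500"


def blocking_url(base, service, index, wait="5m"):
    """Build a blocking-query URL for a (hypothetical) service name."""
    query = urlencode({"index": index, "wait": wait})
    return f"{base}/v1/health/service/{service}?{query}"


def watch(base, service, rounds=3):
    """Issue `rounds` consecutive blocking queries for one service."""
    index = 0  # index=0 returns immediately with the current index
    for _ in range(rounds):
        url = blocking_url(base, service, index)
        # Client timeout is set above the 5m wait to allow for server-side jitter.
        with urllib.request.urlopen(url, timeout=330) as resp:
            resp.read()
            new_index = int(resp.headers.get("X-Consul-Index", "0"))
        # If the index ever goes backwards, restart from 0.
        index = new_index if new_index >= index else 0


def run(base=AGENT, watchers=50):
    """Keep `watchers` blocking queries open at the same time."""
    threads = [
        threading.Thread(target=watch, args=(base, f"svc-{i}"))
        for i in range(watchers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    run()
```

All watchers come from the same client IP, so with 50-100 of them open at once this is the kind of load that could also run into a per-client connection limit, which is worth ruling out when comparing against the behaviour reported here.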