Consul connection error on port 8300 #14464
Comments
I have the same issue.
I ran into the same problem when deploying a Consul cluster on k8s with Helm.
Same here with Consul 1.11.4. Seen in the changelog: Not sure though that these are related
Just to be more complete on this: after seeing a lot of these errors, even a call to localhost:8500 just fails.
Upgrading to 1.12.3 did not help; same errors across the cluster of Consul servers and clients.
On my side, and contrary to @HalacliSeda, 1.12.1 also seems to have the issue, even if less frequently.
Also tried the latest, 1.13.2, with the same results :-(
Yeah, seeing these as well, intermittently. As @obourdon mentioned, this seems to be related to the timeouts/aborts that were recently added; my prior clusters don't experience these disconnects. All in all, the functionality of the clusters logging these messages isn't otherwise affected, so this seems to be due to overly aggressive timeouts. There was a recent refactor around rpc timeouts + the addition of hopefully easing the timeouts resolves these errors
@quinndiggitypolymath So, if I understand you correctly, this [WARN] message does not affect normal use of the cluster, right?
@quinndiggitypolymath many thanks for this very valuable info. However, there are cases where, after quite a while, even accessing port 8500 locally just fails, as mentioned here Furthermore, this does not seem "recent", as the list of impacted versions seems to prove. Could you please explain in more detail what you meant by Many thanks again
@Din-He, at least this particular message @obourdon, that sounds like the messages may be a symptom of another issue (or multiple issues) - consul has a lot of areas where things can be broken if not configured exactly right, and the logging could be better in some spots when debugging. Without knowing what your configuration is like, I would recommend adjusting the logging level https://developer.hashicorp.com/consul/docs/agent/config/config-files#log_level to
Hashicorp (I am not affiliated) supports the last 2 releases of consul https://support.hashicorp.com/hc/en-us/articles/360021185113-Support-Period-and-End-of-Life-EOL-Policy so I've been on
Essentially, if the limit is being hit, slightly increase that limit (test/record metrics before + after); if you have a particularly slow request, where hitting
Increased resource usage, more sockets, more memory, more load, etc; under failure modes it could have cascading effects, all those sorts of things, on top of it taking longer to know something is wrong (if the request won't actually ever succeed, failing faster would allow retries + potentially freeing up resources). As with any change, measure before and after, and refine; if it needs |
@quinndiggitypolymath thank you very much! Hahaha
@Din-He, You will need a token for the node, and a policy attached to it; your policy may look along the lines of:
but refer to the following for specifics: |
I'm also having the same issue with 1.14.2. I'm playing with the rpc_client_timeout, but no luck so far.
@quinndiggitypolymath: Can you share more about the Connect CA change that made your 1 Vault cluster : 1 Consul cluster setup stop working? I had thought that WAN federated Consul clusters could use different Vault clusters. And if they can't in your experience, that's something I'm interested in following up on. It would be preferable from a latency and resilience perspective to have a Vault cluster in the same region as the Consul cluster it acts as the Connect CA for.
Is this somehow related to issue #10603?
Seems like migrating to Consul 1.14.4 fixes this issue on my side.
Yes, I confirmed 1.14.4 fixed this warning message.
In fact, after one night of operation, it is drastically reduced but still present: it went from 100-150 occurrences/hour down to 1 or 2 every 2 to 3 hours (previously installed version was 1.14.3).
Thanks @obourdon. This does seem to be a dupe of #10603, which was just closed. Please note that this still occurs on agent startup, which is likely why you still see this issue; that is tracked in #15821. I'll go ahead and close this issue, as there is now a separate issue tracking the agent startup WARN logs.
Hello,
I use Consul 1.13.1.
I have two servers (as an example): 10.10.10.1 and 10.10.10.2.
I set up a Consul server on both.
consul.json is the same on both, apart from bind_addr:
{
  "bind_addr": "10.10.10.1",
  "client_addr": "0.0.0.0",
  "datacenter": "datacenter-01",
  "bootstrap_expect": 3,
  "data_dir": "/var/lib/consul",
  "encrypt": "",
  "disable_update_check": true,
  "server": true,
  "ui": true,
  "rejoin_after_leave": true,
  "retry_join": ["10.10.10.1","10.10.10.2","......."],
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "tokens": {
      "agent": ""
    }
  }
}
{
  "bind_addr": "10.10.10.2",
  "client_addr": "0.0.0.0",
  "datacenter": "datacenter-01",
  "bootstrap_expect": 3,
  "data_dir": "/var/lib/consul",
  "encrypt": "",
  "disable_update_check": true,
  "server": true,
  "ui": true,
  "rejoin_after_leave": true,
  "retry_join": ["10.10.10.1","10.10.10.2","......."],
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "tokens": {
      "agent": ""
    }
  }
}
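When two agent configs are supposed to match, it can help to diff the parsed JSON rather than compare them by eye. The following is only a sketch: `diff_configs` is a made-up helper, not part of Consul, and the two inline configs are trimmed-down stand-ins for the files above.

```python
import json

def diff_configs(a, b, prefix=""):
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing key."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as "acl".
            diffs.update(diff_configs(va, vb, prefix=path + "."))
        elif va != vb:
            diffs[path] = (va, vb)
    return diffs

# Hypothetical, shortened versions of the two consul.json files.
ha1 = json.loads('{"bind_addr": "10.10.10.1", "server": true, "acl": {"enabled": true}}')
ha2 = json.loads('{"bind_addr": "10.10.10.2", "server": true, "acl": {"enabled": true}}')

print(diff_configs(ha1, ha2))
```

For these two samples the only reported difference is `bind_addr`, which is what you would expect for two servers in the same datacenter.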
The output of consul members looks like this:
Node  Address          Status  Type    Build   Protocol  DC             Partition  Segment
ha1   10.10.10.1:8301  alive   server  1.13.1  2         datacenter-01  default
ha2   10.10.10.2:8301  alive   server  1.13.1  2         datacenter-01  default
But I get this error on both servers:
[WARN] agent: [core]grpc: addrConn.createTransport failed to connect to {ha1:8300 ha1.compute.internal 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 10.10.10.2:0->10.10.10.1:8300: operation was canceled". Reconnecting...
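To see which peers and which failure reasons show up in the logs, the dial details can be pulled out of each WARN line. This is a rough sketch: the regular expression is modeled only on the single line quoted above, other Consul versions may word the message differently, and `parse_dial_error` is a made-up helper name.

```python
import re

# Pattern based on the one WARN line quoted in this issue (an assumption,
# not a documented log format).
DIAL_RE = re.compile(
    r'dial tcp (?P<src>[\d.]+):(?P<src_port>\d+)->'
    r'(?P<dst>[\d.]+):(?P<dst_port>\d+): (?P<reason>[^"]+)"'
)

def parse_dial_error(line):
    """Extract source, destination and reason from a gRPC dial warning."""
    m = DIAL_RE.search(line)
    if m is None:
        return None
    return {
        "src": m["src"],
        "dst": m["dst"],
        "dst_port": int(m["dst_port"]),
        "reason": m["reason"],
    }

line = ('[WARN] agent: [core]grpc: addrConn.createTransport failed to connect to '
        '{ha1:8300 ha1.compute.internal 0 }. Err: connection error: desc = '
        '"transport: Error while dialing dial tcp 10.10.10.2:0->10.10.10.1:8300: '
        'operation was canceled". Reconnecting...')

print(parse_dial_error(line))
```

Here the reason is "operation was canceled" rather than "connection refused" or a timeout, so the dial was aborted by the caller; the message does not say the remote port was closed.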
Port 8300 is used by the Consul service on both servers. I checked the ports with telnet and there is no problem:
telnet 10.10.10.1 8300
Trying 10.10.10.1...
Connected to 10.10.10.1.
Escape character is '^]'.
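The same reachability check can be scripted so it is easy to repeat across servers. A minimal sketch, where `port_open` is a made-up helper name and the 2-second timeout is an arbitrary choice:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_open("10.10.10.1", 8300)` run on ha2 mirrors the telnet test above. Like telnet, it only shows that the TCP port accepts connections; it does not exercise the RPC traffic that produces the warning.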
I did not get this error with Consul 1.12.1. Is this a bug in Consul 1.13.1?
Thanks,
Seda