1.3.x no longer honors connect_timeout when server is unresponsive #450
Comments
The very specific 130-second wait is likely the time it takes the OS to run through the default six …
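This assumes the truncated comment refers to Linux's default of six SYN retransmissions (`net.ipv4.tcp_syn_retries = 6`) with an initial retransmission timeout of 1 second that doubles after each unanswered SYN. Under those assumptions the kernel gives up after 127 seconds, which is close to the observed 130. The Ruby snippet below is only a worked version of that arithmetic.

```ruby
# Assumed Linux defaults (not stated in the issue): initial RTO of 1 second,
# doubled after every unanswered SYN, and net.ipv4.tcp_syn_retries = 6.
INITIAL_RTO = 1
SYN_RETRIES = 6

# The kernel waits one RTO after the first SYN and after each retransmission
# before giving up: 1 + 2 + 4 + ... + 2**SYN_RETRIES seconds.
def syn_give_up_seconds(initial_rto = INITIAL_RTO, retries = SYN_RETRIES)
  (0..retries).sum { |i| initial_rto * 2**i }
end

puts syn_give_up_seconds # prints 127
```

Since iptables `DROP` silently discards the SYN packets, no RST or ICMP error ever arrives, so only this kernel-level retry budget ends the attempt when the client does not enforce `connect_timeout` itself.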
I'm getting the same behavior. Oddly, though, on a local network where ICMP host-unreachable replies come back, it times out sooner. It's still not right, but it's definitely more tolerable, for completely different reasons. I also tried the OS host binary, and it does in fact respect connect_timeout. Also, this unit test doesn't seem to check the timeout properly.
I missed the … Unfortunately I'm currently away from home, so this issue has to wait another week until I can fix it.
Thanks @larskanis!
There is a problem: my assumption in the comment above was that the timeout handling is done in … So we have to do the same in ruby-pg to make it … I have to talk with the PostgreSQL people to resolve this issue.
.. in favor of passing all hosts to libpq at once, and instead adjust connect_timeout handling roughly to how libpq handles it.

The problem is that libpq aborts connecting to multiple hosts if there's an authentication failure. If pg imitates this behaviour, it has to know whether the connection was aborted due to an authentication error or due to some other error that continues the host iteration, but the libpq API doesn't give an exact indication of that. So we cannot distinguish between an authentication error and other types of errors, other than by the error message. But there's the next problem: the error message is locale dependent, and when both client and server are running on Windows, the error message is often not delivered correctly, which is a known, long-standing PostgreSQL issue.

This commit therefore changes the execution back to handling multiple hosts similar to pg-1.3.x, but with two fixes:

1. Multiple IP addresses for one hostname are handled correctly (fixes ged#452).
2. connect_timeout is handled roughly like libpq (fixes ged#450).

It's only a rough approximation, since the timeout is not strictly per host but per single socket event, with a total timeout of connect_timeout multiplied by the number of hosts. Exact handling of connect_timeout like libpq is only possible if we connect host by host.
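One possible reading of the "rough" policy described in this commit is sketched below in Ruby: every single wait for a socket event is capped at connect_timeout, and all waits together are capped at connect_timeout times the number of hosts. The class `RoughConnectTimeout` and its `next_wait` method are hypothetical names for illustration; this is not ruby-pg's actual implementation. The clock is injectable so the behaviour can be shown without real waiting.

```ruby
# Hypothetical helper (not ruby-pg's code): models the rough connect_timeout
# policy. Each socket-event wait is capped at connect_timeout, and the sum of
# all waits is capped at connect_timeout * host_count.
class RoughConnectTimeout
  def initialize(connect_timeout, host_count,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @connect_timeout = connect_timeout
    @clock = clock
    @deadline = clock.call + connect_timeout * host_count
  end

  # Seconds to wait for the next socket event (e.g. the timeout argument of
  # IO.select), or nil once the total budget is used up.
  def next_wait
    remaining = @deadline - @clock.call
    return nil if remaining <= 0

    [@connect_timeout, remaining].min
  end
end

# Usage with a fake clock, 2 hosts and connect_timeout=5 (10 s total budget):
now = 0.0
timer = RoughConnectTimeout.new(5, 2, clock: -> { now })
p timer.next_wait # prints 5
now = 8.0
p timer.next_wait # prints 2.0
now = 10.0
p timer.next_wait # prints nil
```

A caller polling a non-blocking connection would pass `next_wait` as the timeout to `IO.select` and report a timeout once it returns nil. As the commit notes, this differs from libpq, whose documentation applies connect_timeout separately to each host name or IP address.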
Hello,
We encountered an issue where version 1.3.4 blocks for almost exactly 130 seconds before timing out if the server is unresponsive. This behavior was not present in 1.2.3. I created a docker-compose stack that demonstrates this issue, but here is some output first, before sharing the code.
For a connection string that looks like this:
postgres://main_user:@pg:5432/main_db?connect_timeout=5
We get
All this time is spent waiting on this line
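For reference, connect_timeout in the connection string above is a libpq parameter given as an integer number of seconds in the URI's query string; per the libpq documentation, leaving it unset (or zero) means waiting indefinitely. libpq parses it itself, so the Ruby sketch below only shows where the value lives. The helper name `connect_timeout_from` is hypothetical.

```ruby
require "uri"

# Connection string from the report; "pg" is the reporter's docker-compose host.
CONNINFO = "postgres://main_user:@pg:5432/main_db?connect_timeout=5"

# Hypothetical helper: returns connect_timeout in seconds, or nil when the
# parameter is absent (libpq then waits indefinitely).
def connect_timeout_from(uri_string)
  query = URI.parse(uri_string).query.to_s
  value = URI.decode_www_form(query).to_h["connect_timeout"]
  value && Integer(value)
end

p connect_timeout_from(CONNINFO) # prints 5
```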
How to reproduce
You will need Docker for that, plus the two files test.rb and docker-compose.yml.
In one terminal, run:
docker-compose up
In another terminal, run:
docker-compose exec --user root pg bash -c "apt-get update && apt-get install iptables -y"
Then render the server unresponsive by running:
docker-compose exec --privileged --user root pg iptables -A INPUT -p tcp --destination-port 5432 -j DROP
1.2.3 times out after 5 seconds, honoring the connect_timeout parameter.
1.3.4 times out after 130 seconds.
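To check the difference between versions without eyeballing it, a test could time the failing connection attempt and compare it against connect_timeout. The Ruby sketch below is a hypothetical helper, not part of ruby-pg; the 1-second slack is an arbitrary assumption. Any raised `StandardError` counts as the attempt giving up.

```ruby
# Hypothetical test helper: runs the block (e.g. a connection attempt) and
# returns the elapsed time in seconds. A raised StandardError is treated as
# the expected "could not connect" outcome.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    yield
  rescue StandardError
    # Only the time until the attempt gives up matters here.
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# True when the attempt ended within connect_timeout plus some slack,
# i.e. after roughly 5 seconds rather than ~130 for connect_timeout=5.
def honors_connect_timeout?(connect_timeout, slack: 1, &attempt)
  elapsed_seconds(&attempt) <= connect_timeout + slack
end
```

Against the iptables-blocked server, a call such as `honors_connect_timeout?(5) { PG.connect("postgres://main_user:@pg:5432/main_db?connect_timeout=5") }` would be expected to return true on 1.2.3 and false on 1.3.4, going by the timings reported above. Note that a connection that succeeds quickly also returns true, so the check is only meaningful against an unresponsive server.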