Proposal: Do not attempt recovery of network timeouts #517

Closed
jackc opened this issue Mar 9, 2019 · 1 comment
jackc commented Mar 9, 2019

Currently, pgx uses net.Conn.SetDeadline to implement context cancellation, and it attempts to recover from the resulting network timeouts when possible.

Three distinct cases are or were handled:

  1. A timeout occurs during a Write resulting in a partial write. We always consider this a fatal error and close the connection in this case as there is no way to reliably recover.
  2. A timeout occurs before or during a Write resulting in 0 bytes written. Previously, we attempted to recover from this case. However, this case still breaks a TLS connection, and worse, the breakage is not reported properly (see golang/go#29971, "crypto/tls: permanently broken tls.Conn should not return temporary net.Error"). So we now always consider this a fatal error as well.
  3. A timeout occurs during a Read. Because of the underlying message buffering it should always be safe to recover from these errors. pgx does attempt to recover from these.

Trying to recover from timeouts is a very complex and error-prone part of the code (#494, #506, ....).

I propose abandoning the effort and considering all errors on Read or Write fatal. This should have no effect on anyone using the connection pool, aside from removing a source of heisenbugs. The only impacted use case I can see is using a single connection with context cancellation and needing to keep using that connection after a cancellation. That seems to be a very unusual case, and it never could be done entirely reliably, since some timeouts were always fatal.
