Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

determine if there are situations when io.CopyN is needed to reuse underlying TCP connection #1575

Open
itsksaurabh opened this issue Jul 16, 2020 · 9 comments · Fixed by #1576

Comments

@itsksaurabh
Copy link
Contributor

itsksaurabh commented Jul 16, 2020

Currently, the Github client only closes the response body which can be seen at https://github.com/google/go-github/blob/master/github/github.go#L557 . The resp.Body.Close() makes the connection available for reuse by others using the same http.Client. It doesn’t close the underlying HTTP or TCP connection.

According to the official Go docs:
https://golang.org/pkg/net/http/#Body

// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.

My proposal to solve the issue:

  • Read until Response is complete (i.e. ioutil.ReadAll(resp.Body) or similar)
  • Call Body.Close()
@itsksaurabh itsksaurabh changed the title HTTP Client Improvent: Reuse Underlying TCP connection Reuse Underlying TCP connection Jul 16, 2020
@gmlewis
Copy link
Collaborator

gmlewis commented Jul 18, 2020

So that I can better understand, what kind of cases are you finding where the resp.Body is not fully read before being closed?

And let's say we decide that this is a good idea and want to pursue this... I'm assuming this leads to performance gains... what kind of approximate gains are you anticipating?

Finally, my biggest concern is... would this kind of change have any negative effects on existing clients using this package?
Is this strictly a win-win proposition?

Thank you, @itsksaurabh !

@itsksaurabh
Copy link
Contributor Author

itsksaurabh commented Jul 18, 2020

@gmlewis Thank you for reviewing this issue. I would like to answer your questions with the following explanation.

Thehttp.Get returns a response as soon as all the HTTP headers have been read. The body of the response has not been read yet. The resp.Body is a wrapper around the network connection to the server. When you read from it, it downloads the body of the response.

.Close() tells the system that you're done with the network connection. If you have not read the response body, the default http transport closes the connection. The transport can only re-use the connection if the body has been read because if it reused a connection with an unread body the next request made using that connection would receive the previous request's response.

If the response bodies aren't closed at all, each request will be sent on a separate TCP connection, and neither connection would ever be closed. If the bodies are closed without reading them, each request Dials and Closes its own TCP connection. But if you read and close the body, the second request will probably be sent on the same connection as the first, saving the time that would be used for the TCP and TLS handshakes on the second connection.

what kind of cases are you finding where the resp.Body is not fully read before being closed?

One of the good cases where you do not need the resp.Body is when you only need to use Head. Head doesn't require reading or closing the response body.

would this kind of change have any negative effects on existing clients using this package?
Is this strictly a win-win proposition?

Reading the Body is often more efficient than simply Close()ing if you're making more than one request - especially with TLS connections which are relatively expensive to create.

Thanks again!

@gmlewis
Copy link
Collaborator

gmlewis commented Jul 18, 2020

Thank you, @itsksaurabh - I understand that part... so now let me explain why I asked what I asked previously...

Question # 1:

what kind of cases are you finding where the resp.Body is not fully read before being closed?

I asked this because I'm wondering how often this client is closing the body without fully reading it? Is this happening a lot? Only rarely? Only specific endpoints? What made you come to the conclusion to open this issue in the first place?

Question # 2:

what kind of approximate gains are you anticipating?

Have you tested out your PR and find that you are experiencing much better performance? If so, can you give examples? How much performance increase are we talking about here?

Questions # 3 and 4:

would this kind of change have any negative effects on existing clients using this package?
Is this strictly a win-win proposition?

This is strictly a concern about RISK. We have thousands of users of this repo, and part of my responsibility is to NOT BREAK all the users. I have done that in the past, and believe me, it is no fun at all... yet, when potential performance increases can be made, we like to do that.

Rarely will users of this repo come back and say "Wow, that was awesome that you did X", but they can be quite vocal if we broke Y.

So I would like you to please take a moment and answer to the best of your ability, questions 1-4 above. Thank you!

@itsksaurabh
Copy link
Contributor Author

itsksaurabh commented Jul 20, 2020

@gmlewis Thank you for your response. I completely understand your concern. So, I will try my best to answer all of your questions.

Question # 1:

what kind of cases are you finding where the resp.Body is not fully read before being closed?
I asked this because I'm wondering how often this client is closing the body without fully reading it? Is this happening a lot? Only rarely? Only specific endpoints? What made you come to the conclusion to open this issue in the first place?

As mentioned earlier, one of the cases where you do not need the resp.Body is when you only need to use Head. Head doesn't require reading or closing the response body. For example, when a ping/pong service is made with the help of the lib. But things become worse when it comes to broken TCP connections and the system is highly stressed.
It is hard to predict how often it happens but chances increase with traffic that can lead to port bind errors or similar and can exceed port limits on the client-side. Another big concern is data corruption when as the HTTP Transport can only re-use the connection if the body has been read. If it reused a connection with an unread body the next request made using that connection would receive the previous request's response. It may happen to all endpoints.

Referring to the Go docs,

It is the caller's responsibility to close Body. The default HTTP client's Transport may not reuse HTTP/1.x "keep-alive" TCP connections if the Body is not read to completion and closed.

Also, a library user can't drain the Body themselves because Close() is called before passing the Response to the user.

Question # 2:

what kind of approximate gains are you anticipating?
Have you tested out your PR and find that you are experiencing much better performance? If so, can you give examples? How much performance increase are we talking about here?

Yes, I have tested the PR before the submission. I have done some tests and the performance is the same or sometimes better. Also, reliability increases with the approach and can reduce system-level errors and usage related to TCP.

Please have a look at RFC 2068 that explains how TCP connections affect CPU as well as memory and also load on HTTP servers.

Please have a look at the following quick tests that I have done with existing and my version of the lib.

Questions # 3 and 4:

would this kind of change have any negative effects on existing clients using this package?
Is this strictly a win-win proposition?

I believe there will be no negative effects on existing clients using this package as the overall design is the same and I have tested with it and getting the expected response as well. This approach will only drain the body in the defer func to make the TCP connection reusable resulting in lower fault/error rates, CPU, and memory usage. I think it is a win-win position that can be observed on the systems having high traffic and the problems that I have discussed above will be resolved with this approach.

References:

Thank You!

@dmitshur
Copy link
Member

dmitshur commented Aug 15, 2020

Thanks for reporting this issue @itsksaurabh. We've had a similar situation in the past, so I wanted to ask some questions to better understand why it has come up again.

In the past, PR #317 added similar code to this library to drain the any remaining body for a speed up. Upon investigation, we discovered that there were exactly 0 bytes being read by io.Copy(ioutil.Discard, resp.Body). A net/http developer spotted it and realized it was a problem that could be fixed in net/http itself, which was done in Go 1.7 (see https://golang.org/cl/21291) and then the unnecessary code was removed from this library in PR #847.

As the net/http API says, it should always be sufficient to read the entire response body and close it.

Do you know if there are reproducible cases when the response body isn't read completely by this library? If so, it would be very helpful to learn how many bytes are being left unread:

  • If it's exactly zero bytes (as it was last time) yet TCP connections are not reused, this may be a regression in net/http that would be great to report and investigate (so that all users of net/http don't have to work around it).

  • If there is 1 or more bytes being read by the io.CopyN(ioutil.Discard, resp.Body, ...) call, then it may mean there's a bug in how this library decodes the JSON response, or that GitHub API server is sending additional bytes that it probably shouldn't. If this is the case, this would be good to report to GitHub.

Have you had a chance to measure and see how many bytes are being left unread in situations where TCP connections are not reused? Or this PR based on a hypothetical situation that you believe may be possible?

I see #317 is mentioned as one of the references, but it's not clear to me if this has come up again because of a regression somewhere, or if this is addressing a different problem compared to last time.

I agree that the fix (in PR #1576) is safe, but I'm asking these questions to see if we can improve the solution further by fixing it closer to the root of the problem (i.e., if it turns out there's a problem in the standard library or in GitHub API). Another reason to understand what purpose that change is serving is because other people may look at this library as an example of what to do, and we want to be confident we're setting a good example for others.

Thanks.

@gmlewis
Copy link
Collaborator

gmlewis commented Aug 15, 2020

Thank you, @dmitshur ! Shall we reopen this issue to promote further discussion?

@dmitshur dmitshur reopened this Aug 16, 2020
@dmitshur
Copy link
Member

Sure, let's reopen it until we understand this better.

@dmitshur dmitshur changed the title Reuse Underlying TCP connection determine if there are situations when io.CopyN is needed to reuse underlying TCP connection Aug 16, 2020
@itsksaurabh
Copy link
Contributor Author

Thank you, @dmitshur !
I completely understand your concerns and would like to answer your queries in the best possible way.

As we can see the official doc says,

It is the caller's responsibility to close Body. The default HTTP client's Transport may not reuse HTTP/1.x "keep-alive" TCP connections if the Body is not read to completion and closed.

So I made the fix to make sure the TCP connection is reused from the Client's end. We already had this issue with some of our projects and this fix indeed solved some of the potential problems that may be faced with this lib.

Do you know if there are reproducible cases when the response body isn't read completely by this library?

As mentioned earlier, one of the cases where you do not need the resp.Body is when you only need to use Head. Head doesn't require reading or closing the response body. For example, when a ping/pong service is made with the help of the lib. But things become worse when it comes to broken TCP connections and the system is highly stressed.
It is hard to predict how often it happens but chances increase with traffic that can lead to port bind errors or similar and can exceed port limits on the client-side. Another big concern is data corruption when as the HTTP Transport can only re-use the connection if the body has been read. If it reused a connection with an unread body the next request made using that connection would receive the previous request's response. It may happen to all endpoints.

Also, a library user can't drain the Body themselves because Close() is called before passing the Response to the user.

Have you had a chance to measure and see how many bytes are being left unread in situations where TCP connections are not reused?

I will definitely investigate this and will get back soon. I would really appreciate any help or guidance from you for this investigation to understand the situation even better. :)

Thank you!

@dmitshur
Copy link
Member

dmitshur commented Oct 11, 2020

Thanks for answering. If I understand your reply correctly, you're suggesting there are no known reproducible situations when the response body isn't read completely by this library. That is good to know.

one of the cases where you do not need the resp.Body is when you only need to use Head. Head doesn't require reading or closing the response body.

I'm not sure I understand how this applies in this context. When making a HEAD request, the server isn't expected to include a response body, only the response headers. Also, most methods in this package specify the HTTP method internally and it is typically "GET".

I would really appreciate any help or guidance from you for this investigation to understand the situation even better.

One thing that's easy to check is to add some logging whenever the io.CopyN(ioutil.Discard, resp.Body, ...) call returns a non-zero number of bytes copied, and see if it ever happens. However, that wouldn't catch when the underlying TCP connection would not be reused without the forced drain—determining that would probably be more involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants