
adding timeout to alert.post and http_post #1462

Merged
merged 3 commits into influxdata:master from sputnik13:post_timeout
Oct 27, 2017

Conversation

sputnik13
Contributor

sputnik13 commented Jun 29, 2017

Adding timeout to alert.post and http_post nodes. Currently
http.DefaultClient is used, which blocks indefinitely when the endpoint
accepts the connection but does not respond to the HTTP request.
Switching to using a non-default http.Client with a timeout defined per
node.

Addresses #1461

Required for all non-trivial PRs
  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated

@nathanielc
Contributor

@sputnik13 This looks good, but I see one issue: the Update function does not propagate a new Timeout value to existing handlers, since the timeout is set only when the handler is created. One possible change is to use context.WithTimeout together with req.WithContext to apply a timeout per request, so that the client need not be destroyed and recreated whenever the handler config changes.
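
For reference, a minimal sketch of the per-request timeout approach suggested here, assuming a handler with url and timeout fields (the names below are illustrative, not the actual Kapacitor types):

package sketch

import (
    "context"
    "io"
    "io/ioutil"
    "net/http"
    "time"
)

// handler stands in for the alert.post / http_post handler types; only the
// fields needed for the sketch are included.
type handler struct {
    url     string
    timeout time.Duration
}

// post applies the timeout per request via a context, so a changed timeout in
// the handler config takes effect without rebuilding an http.Client.
func (h *handler) post(body io.Reader) error {
    req, err := http.NewRequest("POST", h.url, body)
    if err != nil {
        return err
    }
    if h.timeout > 0 {
        ctx, cancel := context.WithTimeout(req.Context(), h.timeout)
        defer cancel()
        req = req.WithContext(ctx)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    _, err = io.Copy(ioutil.Discard, resp.Body)
    return err
}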

@nathanielc
Contributor

@desa Could you also review this PR?

@desa
Contributor

desa commented Jul 24, 2017

On it.

@sputnik13
Contributor Author

@nathanielc got it, I'll take a look and push up an update

@@ -207,6 +210,7 @@ func (s *Service) Handler(c HandlerConfig, l *log.Logger) alert.Handler {
    endpoint: e,
    logger:   l,
    headers:  c.Headers,
    hc:       &http.Client{Timeout: c.Timeout},
Contributor

I like the idea of using context.WithTimeout on each request, but if you choose a different approach and use a custom http.Client be sure to add a transport that uses ProxyFromEnvironment.

&http.Client{
    Timeout: c.Timeout,
    Transport: &http.Transport{
        Proxy: http.ProxyFromEnvironment,
    },
}

@sputnik13
Contributor Author

@nathanielc @desa Sorry for the delay; it seems I had this just sitting on my computer. Is this what you had in mind?

http_post.go Outdated
req.Header.Set("Content-Type", "application/json")
for k, v := range n.c.Headers {
req.Header.Set(k, v)
}

// Set timeout
ctx, cancel := context.WithTimeout(req.Context(), h.timeout)
Contributor

Should this be hn.Timeout?

Contributor Author

That should have been n.timeout. Not sure how I missed that; I could have sworn it passed local tests... pushed a new change.

@desa
Contributor

desa commented Sep 7, 2017 via email

@sputnik13
Contributor Author

Looking into it.

@sputnik13 force-pushed the post_timeout branch 6 times, most recently from be65ab6 to 50f4762 on September 14, 2017 03:35
@sputnik13
Contributor Author

@desa Looks like tests are passing now; thanks for the hint on the test environment.

@desa
Contributor

desa commented Sep 14, 2017

LGTM.

    ctx, cancel := context.WithTimeout(req.Context(), h.timeout)
    defer cancel()
    req = req.WithContext(ctx)
}
Contributor

This code is repeated above as well. Since all request construction goes through https://github.com/sputnik13/kapacitor/blob/50f47625628a0937873ec60cda54c4b6d880ab0a/services/httppost/service.go#L50, why not move the logic that sets a timeout context on the request into that method?

The function will probably need a new parameter for the desired timeout since the source of the timeout varies, but that seems reasonable.

Contributor Author

I'm not sure that's necessarily better than setting the timeout locally. While there would be less code, I think clarity would suffer: the NewHTTPRequest interface would have to return a cancel function for the context, which could be misconstrued as a cancel function for the request. I wish there were a way to set the timeout on the Request without having to pass in a context object.

If you disagree and think it's clear enough I can make the change.
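
To make the trade-off concrete, here is a rough sketch of what the suggested refactor could look like; NewHTTPRequest and Endpoint come from the linked service.go, but the field names, signature, and timeout plumbing below are assumptions, not the code that was actually merged:

package sketch

import (
    "context"
    "io"
    "net/http"
    "time"
)

// Endpoint stands in for the httppost service's Endpoint type; only a URL
// field is assumed here.
type Endpoint struct {
    url string
}

// NewHTTPRequest builds the request and, when a timeout is given, attaches a
// timeout context. Note that the returned cancel function belongs to the
// context, not the request, which is the ambiguity raised above.
func (e *Endpoint) NewHTTPRequest(body io.Reader, timeout time.Duration) (*http.Request, context.CancelFunc, error) {
    req, err := http.NewRequest("POST", e.url, body)
    if err != nil {
        return nil, nil, err
    }
    var cancel context.CancelFunc = func() {} // no-op when no timeout is configured
    if timeout > 0 {
        var ctx context.Context
        ctx, cancel = context.WithTimeout(req.Context(), timeout)
        req = req.WithContext(ctx)
    }
    return req, cancel, nil
}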

@sputnik13 force-pushed the post_timeout branch 7 times, most recently from 0416cef to dba9fb9 on September 23, 2017 17:40
@sputnik13
Contributor Author

sputnik13 commented Sep 23, 2017

@nathanielc @desa I've been digging into the reason for the race condition, and it seems to be due to the DefaultTransport used by the DefaultClient using a separate goroutine to perform the actual write.

When .Do is called on the client, it ultimately ends up calling .RoundTrip on the default transport (https://github.com/golang/go/blob/master/src/net/http/transport.go#L341), which writes the request to a channel (https://github.com/golang/go/blob/master/src/net/http/transport.go#L1955) that is then handled by the writeLoop (https://github.com/golang/go/blob/master/src/net/http/transport.go#L1761) in a separate goroutine.

When a request or context is cancelled, there seems to be no feedback/feedforward to this writer. Looking through the http request code, it seems like there's an expectation that ownership of the object that provides the Body is handed off to the client (https://github.com/golang/go/blob/master/src/net/http/request.go#L779) since it will attempt to close the object. Coupled with the fact that the http lib seems to do a shallow copy of the request at every opportunity, this means that ownership of the buffer has to be relinquished when it is passed to hc.Do as part of the Request.

I understand why a Buffer.Reset() is performed before the buffer is placed back in the Pool, but attempting the Reset means the caller of hc.Do and the writeLoop can end up accessing the Buffer at the same time.
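
In outline, the racy pattern being described looks roughly like this (illustrative names, not the actual kapacitor code):

package sketch

import (
    "bytes"
    "net/http"
    "sync"
)

var pool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// postWithPooledBuffer sketches the problem: the deferred Reset/Put runs when
// this function returns, but if the request is canceled or times out, the
// transport's writeLoop goroutine may still be reading the buffer, so the
// Reset races with that read.
func postWithPooledBuffer(client *http.Client, url string, payload []byte) error {
    buf := pool.Get().(*bytes.Buffer)
    defer func() {
        buf.Reset()
        pool.Put(buf)
    }()
    buf.Write(payload)
    req, err := http.NewRequest("POST", url, buf)
    if err != nil {
        return err
    }
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    return resp.Body.Close()
}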

I'm wondering whether this means that, in order to use a pool of buffers, a new buffer pool implementation is needed in which closing a buffer returns it to the pool, rather than requiring the caller of Pool.Get to also Put it back. Since the http client closes the Body on completion (whether successful or not), that would ensure the buffer is returned once the client is done with it.
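
A minimal sketch of that idea, assuming a sync.Pool-backed pool; the ClosableBuffer and Pool names here are illustrative rather than the actual bufpool API:

package bufpool

import (
    "bytes"
    "sync"
)

// Pool hands out buffers that return themselves to the pool when closed.
type Pool struct {
    p sync.Pool
}

func New() *Pool {
    pool := &Pool{}
    pool.p.New = func() interface{} {
        return &ClosableBuffer{pool: pool}
    }
    return pool
}

// Get returns a buffer; the caller hands it to the http client as the request
// Body and does not touch it again.
func (p *Pool) Get() *ClosableBuffer {
    return p.p.Get().(*ClosableBuffer)
}

// ClosableBuffer is a bytes.Buffer whose Close resets it and puts it back in
// the pool it came from.
type ClosableBuffer struct {
    bytes.Buffer
    pool *Pool
}

// Close satisfies io.Closer, so the http client's own call to Body.Close is
// what returns the buffer to the pool once the transport is done with it.
func (b *ClosableBuffer) Close() error {
    b.Reset()
    b.pool.p.Put(b)
    return nil
}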

@sputnik13
Contributor Author

Question... is the motivation for using a buffer pool to eventually put a limit on maximum memory usage or is it to avoid GC?

@sputnik13
Contributor Author

sputnik13 commented Sep 23, 2017

This latest commit takes a quick stab at a buffer pool that provides a closable buffer and uses .Close() on the buffer to return it to the pool. This seems to resolve the race condition between kapacitor and the http client.

Adding timeout to alert.post and http_post nodes.  Currently
http.DefaultClient is used, which blocks indefinitely when the endpoint
accepts the connection but does not respond to the HTTP request.
Switching to using a non-default http.Client with a timeout defined per
node.
bufpool-provided buffers result in a race condition when used with a
consumer that expects to take ownership of the buffer (as the http
client appears to). Returning an extended version of Buffer that can be
closed, where closing the buffer performs a Reset() on the buffer and
returns it to the pool it came from.
Contributor

@nathanielc left a comment

@sputnik13 Sorry for the long back and forth on this one. I say we simplify and remove the use of the buffer pool entirely (reverting the changes to the bufpool package). Then this should be good to go! 🎉

http_post.go Outdated
@@ -160,7 +165,6 @@ func (n *HTTPPostNode) doPost(row *models.Row) int {

func (n *HTTPPostNode) postRow(row *models.Row) (*http.Response, error) {
    body := n.bp.Get()
    defer n.bp.Put(body)
Contributor

This change can leak byte buffers from the pool. That can happen when there is an error writing to the buffer followed by an early return. It isn't horrible, as the pool will just create new buffers, but it would be best if it were not possible.

Maybe we should just ditch the idea of a buffer pool here and use a new buffer each time. Then we can address the GC pressure later if it's an issue. We should leave a comment, though, noting that using a buffer pool can cause a race when the request is canceled.

Contributor Author

Good point, there are at least 3 places where the code may return without closing the buffer. It's too bad there's no way to cancel defers; that would make things a lot simpler.
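
For reference, a minimal sketch of the "new buffer each time" simplification suggested above; the function and parameter names are illustrative, not the code that was merged:

package sketch

import (
    "bytes"
    "encoding/json"
    "net/http"
)

// postJSON allocates a fresh buffer per request instead of drawing from a
// pool. NOTE: reusing pooled buffers here can race with the transport's
// writeLoop goroutine when the request or its context is canceled, because
// the transport may still be reading the body while the caller resets it.
func postJSON(client *http.Client, url string, v interface{}) (*http.Response, error) {
    var body bytes.Buffer
    if err := json.NewEncoder(&body).Encode(v); err != nil {
        return nil, err
    }
    req, err := http.NewRequest("POST", url, &body)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")
    return client.Do(req)
}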

@nathanielc merged commit 598fde6 into influxdata:master Oct 27, 2017
nathanielc added a commit that referenced this pull request Oct 27, 2017
adding timeout to alert.post and http_post
@nathanielc
Contributor

@sputnik13 Thanks!
