adding timeout to alert.post and http_post #1462
Conversation
@sputnik13 This looks good; I see one issue. The Update function does not propagate a new Timeout value to existing handlers, since the timeout is set only when the handler is created. One possible change is to use context.WithTimeout together with request.WithContext to apply a timeout per request, so that the client need not be destroyed and recreated when the handler config changes.
@desa Could you also review this PR?
On it.
@nathanielc got it, I'll take a look and push up an update
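For reference, a minimal sketch of the per-request timeout approach described above, with hypothetical type and field names (`handler`, `h.timeout`); the shared client is never rebuilt:

```go
import (
	"context"
	"net/http"
	"time"
)

type handler struct {
	client  *http.Client
	timeout time.Duration
}

// doPost applies the timeout per request via the context, so a changed
// handler config takes effect without recreating the client.
func (h *handler) doPost(req *http.Request) (*http.Response, error) {
	ctx, cancel := context.WithTimeout(req.Context(), h.timeout)
	defer cancel()
	return h.client.Do(req.WithContext(ctx))
}
```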
services/httppost/service.go
Outdated
```
@@ -207,6 +210,7 @@ func (s *Service) Handler(c HandlerConfig, l *log.Logger) alert.Handler {
 	endpoint: e,
 	logger:   l,
 	headers:  c.Headers,
+	hc:       &http.Client{Timeout: c.Timeout},
```
I like the idea of using context.WithTimeout on each request, but if you choose a different approach and use a custom http.Client, be sure to add a transport that uses ProxyFromEnvironment:

```go
&http.Client{
	Timeout: c.Timeout,
	Transport: &http.Transport{
		Proxy: http.ProxyFromEnvironment,
	},
}
```
@nathanielc @desa sorry for the delay; seems like I had this just sitting on my computer. Is this what you had in mind?
(force-pushed from 93bbe35 to c6c66ee)
http_post.go
Outdated
req.Header.Set("Content-Type", "application/json") | ||
for k, v := range n.c.Headers { | ||
req.Header.Set(k, v) | ||
} | ||
|
||
// Set timeout | ||
ctx, cancel := context.WithTimeout(req.Context(), h.timeout) |
Should this be hn.Timeout?
That should have been n.timeout; not sure how I missed that, could have sworn it passed local tests... pushed a new change.
(force-pushed from c6c66ee to ef186e8)
It should be the same suite of tests. Looks like the following tests are failing in CI:
```
--- FAIL: TestBatch_AlertStateChangesOnly (0.03s)
--- FAIL: TestBatch_AlertStateChangesOnlyExpired (0.02s)
--- FAIL: TestStream_HttpPost (0.01s)
--- FAIL: TestStream_HttpPostEndpoint (0.01s)
--- FAIL: TestStream_Alert (0.02s)
--- FAIL: TestStream_Alert_NoRecoveries (0.05s)
--- FAIL: TestStream_Alert_WithReset_0 (0.05s)
--- FAIL: TestStream_Alert_WithReset_1 (0.02s)
--- FAIL: TestStream_AlertDuration (0.03s)
--- FAIL: TestStream_AlertHTTPPost (0.02s)
--- FAIL: TestStream_AlertHTTPPostEndpoint (0.05s)
--- FAIL: TestStream_AlertSigma (0.02s)
--- FAIL: TestStream_AlertComplexWhere (0.02s)
--- FAIL: TestStream_AlertStateChangesOnly (0.04s)
--- FAIL: TestStream_AlertStateChangesOnlyExpired (0.01s)
--- FAIL: TestStream_AlertFlapping (0.04s)
FAIL
FAIL github.com/influxdata/kapacitor/integrations 4.798s
--- FAIL: TestServer_ListServiceTests (0.02s)
--- FAIL: TestServer_AlertHandlers (1.01s)
--- FAIL: TestServer_AlertHandlers/post-8 (0.09s)
FAIL
FAIL github.com/influxdata/kapacitor/server 20.777s
```
Michael Desa

> On Sep 6 2017, at 4:57 pm, Min Pae <notifications@github.com> wrote:
> it's passing tests for me locally, does the ci gate run a different set of tests than test.sh?
looking into it
(force-pushed from be65ab6 to 50f4762)
@desa looks like tests are passing now, thanks for the hint on the test environment
LGTM.
```
ctx, cancel := context.WithTimeout(req.Context(), h.timeout)
defer cancel()
req = req.WithContext(ctx)
}
```
This code is repeated above as well. Since all code to get a request goes through https://github.com/sputnik13/kapacitor/blob/50f47625628a0937873ec60cda54c4b6d880ab0a/services/httppost/service.go#L50 why not move the logic to set a Timeout context on the request into that method?
The function will probably need a new parameter for the desired timeout since the source of the timeout varies, but that seems reasonable.
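A rough sketch of that refactor, with hypothetical names (the actual NewHTTPRequest signature in the service may differ); note that it has to hand the context's cancel function back to the caller:

```go
import (
	"context"
	"io"
	"net/http"
	"time"
)

// Endpoint stands in for the httppost service endpoint (hypothetical).
type Endpoint struct {
	url string
}

// NewHTTPRequest builds the POST request and, when timeout > 0, applies it
// through the request's context. Callers must defer the returned cancel.
func (e *Endpoint) NewHTTPRequest(body io.Reader, timeout time.Duration) (*http.Request, context.CancelFunc, error) {
	req, err := http.NewRequest("POST", e.url, body)
	if err != nil {
		return nil, nil, err
	}
	cancel := context.CancelFunc(func() {}) // no-op when no timeout is configured
	if timeout > 0 {
		var ctx context.Context
		ctx, cancel = context.WithTimeout(req.Context(), timeout)
		req = req.WithContext(ctx)
	}
	return req, cancel, nil
}
```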
I'm not sure that's necessarily better than setting the timeout locally. While there would be less code, I think clarity will suffer, since the function interface for NewHTTPRequest will have to return a cancel function for the context, which could be misconstrued as a cancel function for the request. I wish there were a way to set the timeout on the Request without having to pass in a context object.
If you disagree and think it's clear enough, I can make the change.
(force-pushed from 0416cef to dba9fb9)
@nathanielc @desa I've been digging into the reason for the race condition, and it seems due to how the DefaultTransport used by the DefaultClient uses a separate goroutine to perform the actual write. When .Do is called on the client, it ultimately calls .RoundTrip on the default transport (https://github.com/golang/go/blob/master/src/net/http/transport.go#L341), which writes the request to a channel (https://github.com/golang/go/blob/master/src/net/http/transport.go#L1955), which is then handled by the writeLoop (https://github.com/golang/go/blob/master/src/net/http/transport.go#L1761) in a separate goroutine. When a request or context is cancelled, there seems to be no feedback/feedforward to this writer.

Looking through the http request code, there seems to be an expectation that ownership of the object that provides the Body is handed off to the client (https://github.com/golang/go/blob/master/src/net/http/request.go#L779), since it will attempt to close the object. Coupled with the fact that the http lib does a shallow copy of the request at every opportunity, this means that ownership of the buffer has to be relinquished when it is passed to hc.Do as part of the Request. I understand why a Buffer.Reset() is performed before it is placed back on the Pool, but attempting the Reset results in the caller of hc.Do and the writeLoop potentially accessing the Buffer at the same time.

I'm wondering whether this means that, in order to use a pool of buffers, a new buffer pool implementation is needed, such that closing the buffer returns it to the pool rather than requiring the caller of Pool.Get to also Put it back. Since the http client lib closes the Body on completion (whether successful or not), that would ensure the buffer is returned once the http client is done with it.
Question... is the motivation for using a buffer pool to eventually put a limit on maximum memory usage, or is it to avoid GC?
This latest commit has a quick stab at a buffer pool that provides a closable buffer and uses .Close() to return it to the pool. This seems to resolve the race condition between kapacitor and the http client.
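The idea, roughly (a sketch with hypothetical names, not the exact commit):

```go
package bufpool

import (
	"bytes"
	"sync"
)

// Pool hands out buffers that return themselves when closed, so a caller
// that hands the buffer off (e.g. as an http.Request Body) never calls Put.
type Pool struct {
	p sync.Pool
}

func New() *Pool {
	pl := &Pool{}
	pl.p.New = func() interface{} { return &Buffer{pool: pl} }
	return pl
}

func (pl *Pool) Get() *Buffer {
	return pl.p.Get().(*Buffer)
}

// Buffer is a bytes.Buffer that implements io.ReadCloser, so
// http.NewRequest uses it directly as the request Body.
type Buffer struct {
	bytes.Buffer
	pool *Pool
}

// Close resets the buffer and returns it to its pool. Because the http
// client closes the Body only once the transport's write goroutine is done
// with it, the Reset no longer races with that goroutine.
func (b *Buffer) Close() error {
	b.Reset()
	b.pool.p.Put(b)
	return nil
}
```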
(force-pushed from bc93767 to ae75528)
(force-pushed from ae75528 to f78f6ca)
Adding timeout to alert.post and http_post nodes. Currently http.DefaultClient is used, which blocks indefinitely when the endpoint accepts the connection but does not respond to the HTTP request. Switching to using a non-default http.Client with a timeout defined per node.
bufpool-provided buffers result in a race condition when used with a buffer consumer that expects to take ownership of the buffer (as the http client appears to). Returning an extended version of Buffer that can be closed, where closing the buffer performs a Reset() and returns it to the pool it came from.
(force-pushed from f78f6ca to bf4cc98)
@sputnik13 Sorry for the long back and forth on this one. I say we simplify and remove the use of the buffer pool entirely (reverting the changes to the bufpool package). Then this should be good to go! 🎉
http_post.go
Outdated
```
@@ -160,7 +165,6 @@ func (n *HTTPPostNode) doPost(row *models.Row) int {

func (n *HTTPPostNode) postRow(row *models.Row) (*http.Response, error) {
	body := n.bp.Get()
-	defer n.bp.Put(body)
```
This change can leak byte buffers from the pool. This is possible when there is an error writing to the buffer followed by the early return. This isn't horrible, as the pool will just create new buffers, but it would be best if it were not possible.
Maybe we should just ditch the idea of a buffer pool here and use a new buffer each time. Then we can address the GC pressure later if it's an issue. We should leave a comment, though, that using a buffer pool can cause a race when the request is canceled.
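A sketch of that simpler shape (helper name and signature hypothetical, standing in for postRow):

```go
import (
	"bytes"
	"encoding/json"
	"net/http"
)

// postJSON allocates a fresh buffer per request instead of drawing from a
// pool. NOTE: recycling the buffer through a pool can race here: when the
// request's context is canceled, the transport's write goroutine may still
// be reading the body while Reset() runs before the buffer is pooled again.
func postJSON(client *http.Client, url string, v interface{}) (*http.Response, error) {
	body := new(bytes.Buffer) // fresh per request; left to the GC
	if err := json.NewEncoder(body).Encode(v); err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", url, body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return client.Do(req)
}
```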
Good point, there are at least three places where the code may return without closing the buffer. It's too bad there's no way to cancel defers; that would make things a lot simpler.
adding timeout to alert.post and http_post
@sputnik13 Thanks!
Addresses #1461