-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsqd: clamp requeue timeout to valid range instead of dropping connection #868
Conversation
Alternatively, we could fallback to a requeue timeout of 0 when it is <0, and a requeue timeout of |
4054b65
to
2a51ac3
Compare
ready for review |
I think we should do this. It seems more desirable than not requeueing the message at all, right? |
e.g. I sincerely doubt client libraries are written in a way that would handle non-fatal REQ failure. |
Yeah you're probably right. I'll update it to do that |
2a51ac3
to
53926a7
Compare
53926a7
to
41b3b56
Compare
ready for review |
@@ -712,9 +712,10 @@ func (p *protocolV2) REQ(client *clientV2, params [][]byte) ([]byte, error) { | |||
} | |||
timeoutDuration := time.Duration(timeoutMs) * time.Millisecond | |||
|
|||
if timeoutDuration < 0 || timeoutDuration > p.ctx.nsqd.getOpts().MaxReqTimeout { | |||
return nil, protocol.NewFatalClientErr(nil, "E_INVALID", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm curious now if it should still return some indication that this happened rather than silently proceeding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have no way of telling the client about it, since we don't want to return an error. We could log it on the server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be noisy, but still seems like the best option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's as noisy as the previous behavior, where we would log the fatal error and the connection drop every time someone did this ¯\(ツ)/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mreiferson I think hanging up at IDENTIFY
and REQ
is worth thinking about some more. We actively monitor client errors, whereas early de-queues are not something we'd figure out right away. I would be happy with "nsqd
requeued your message, but not exactly as you requested" but there's no mechanism for that. Our use case is we have some last ditch effort REQ
's in the 12h
range, if nsqd
was configured incorrectly we'd see these come around again in an hour. Something to consider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@judwhite nsqd
does hang up on a bad IDENTIFY
. On REQ
, I agree this new behavior is more correct because the previous behavior would result in the intended message timing out rather than being requeued. In most cases that's probably less desirable if the actual time-to-reprocess is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the biggest issue here is that if the client has a max-in-flight > 1, any other message that was in flight would also be timed out, which can cause problems when you want exactly (or as close as possible to) 1 successful processing of each message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mreiferson @tsholmes MaxReqTimeout
is the maximum REQ
delay the client can specify, it's not related to -msg-timeout
. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, it's a separate option:
$ nsqd --help
...
-max-msg-timeout duration
maximum duration before a message will timeout (default 15m0s)
...
-max-req-timeout duration
maximum requeuing timeout for a message (default 1h0m0s)
...
-msg-timeout string
duration to wait before auto-requeing a message (default "1m0s")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mreiferson I think I see what you're saying. When REQ
failed previously nsqd
continued the message timeout and it gets requeued anyway with 0 delay (is that right?). I suppose 'it depends' if you'd rather see errors on the client or have the server override your request parameters without notification.
41b3b56
to
315096f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Is there a way to get the max-req-timeout from a client? I don't see a possibility. Is that correct? That would mean a client can't notify the user that he can't requeue / delay with such a high timeout? |
I think that's correct. Consider that currently, if nsqd restarts, the delays are forgotten, and messages are queued for delivery ASAP. So if you really need to process a message after a certain later time, you probably want an efficient or cached way to determine that in the consumer, and requeue for later delivery repeatedly until that time is reached (and keep in mind max attempts). Messy, but NSQD was not originally designed as a scheduler, so the requeue delay was originally intended for backoff (and not precise timing). |
@spruce http://127.0.0.1:4151/debug/pprof/cmdline if you need a dirty hack |
Fixes #865
This will return an error to the client when the requested requeue timeout is higher than the
MaxReqTimeout
option, instead of dropping the connection.