nsqd: clamp requeue timeout to valid range instead of dropping connection #868

tsholmes · 2017-03-24T17:46:54Z

Fixes #865

This will return an error to the client when the requested requeue timeout is higher than the MaxReqTimeout option, instead of dropping the connection.

tsholmes · 2017-03-24T17:49:37Z

Alternatively, we could fall back to a requeue timeout of 0 when it is < 0, and to a requeue timeout of MaxReqTimeout when it is too high.

tsholmes · 2017-03-24T18:06:06Z

ready for review

mreiferson · 2017-03-24T18:08:01Z

> Alternatively, we could fall back to a requeue timeout of 0 when it is < 0, and to a requeue timeout of MaxReqTimeout when it is too high.

I think we should do this. It seems more desirable than not requeueing the message at all, right?

mreiferson · 2017-03-24T18:08:37Z

e.g. I sincerely doubt client libraries are written in a way that would handle non-fatal REQ failure.

tsholmes · 2017-03-24T18:08:59Z

Yeah you're probably right. I'll update it to do that

tsholmes · 2017-03-24T21:40:45Z

ready for review

mreiferson · 2017-03-24T22:11:15Z

nsqd/protocol_v2.go

@@ -712,9 +712,10 @@ func (p *protocolV2) REQ(client *clientV2, params [][]byte) ([]byte, error) {
 	}
 	timeoutDuration := time.Duration(timeoutMs) * time.Millisecond

-	if timeoutDuration < 0 || timeoutDuration > p.ctx.nsqd.getOpts().MaxReqTimeout {
-		return nil, protocol.NewFatalClientErr(nil, "E_INVALID",


I guess I'm curious now if it should still return some indication that this happened rather than silently proceeding?

tsholmes · 2017-03-24

We have no way of telling the client about it, since we don't want to return an error. We could log it on the server?

ploxiln · 2017-03-24

That could be noisy, but still seems like the best option.

tsholmes · 2017-03-27

Yeah it's as noisy as the previous behavior, where we would log the fatal error and the connection drop every time someone did this ¯\_(ツ)_/¯

judwhite · 2017-03-29

@mreiferson I think hanging up at IDENTIFY and REQ is worth thinking about some more. We actively monitor client errors, whereas early de-queues are not something we'd figure out right away. I would be happy with "nsqd requeued your message, but not exactly as you requested", but there's no mechanism for that. Our use case: we have some last-ditch REQs in the 12h range; if nsqd were configured incorrectly, we'd see these come around again in an hour. Something to consider.

mreiferson · 2017-03-29

@judwhite nsqd does hang up on a bad IDENTIFY. On REQ, I agree this new behavior is more correct because the previous behavior would result in the intended message timing out rather than being requeued. In most cases that's probably less desirable if the actual time-to-reprocess is important.

tsholmes · 2017-03-29

I think the biggest issue here is that if the client has a max-in-flight > 1, any other message that was in flight would also be timed out, which can cause problems when you want exactly (or as close as possible to) 1 successful processing of each message.

judwhite · 2017-03-30

@mreiferson @tsholmes MaxReqTimeout is the maximum REQ delay the client can specify; it's not related to -msg-timeout. Is that right?

ploxiln · 2017-03-30

correct, it's a separate option:

$ nsqd --help
...
  -max-msg-timeout duration
        maximum duration before a message will timeout (default 15m0s)
...
  -max-req-timeout duration
        maximum requeuing timeout for a message (default 1h0m0s)
...
  -msg-timeout string
        duration to wait before auto-requeing a message (default "1m0s")

judwhite · 2017-03-30

@mreiferson I think I see what you're saying. When REQ failed previously, nsqd let the message's timeout keep running, and the message got requeued anyway with 0 delay (is that right?). I suppose it depends on whether you'd rather see errors on the client or have the server override your request parameters without notification.

mreiferson

LGTM, thanks!

spruce · 2017-07-18T10:10:38Z

Is there a way to get the max-req-timeout from a client? I don't see one. Is that correct? That would mean a client can't tell its user that a requeue / delay with such a high timeout isn't possible?

ploxiln · 2017-07-18T18:50:55Z

I think that's correct.

Consider that currently, if nsqd restarts, the delays are forgotten, and messages are queued for delivery ASAP. So if you really need to process a message after a certain later time, you probably want an efficient or cached way to determine that in the consumer, and requeue for later delivery repeatedly until that time is reached (and keep in mind max attempts). Messy, but nsqd was not originally designed as a scheduler, so the requeue delay was originally intended for backoff (and not precise timing).

judwhite · 2017-09-23T13:14:30Z

> Is there a way to get the max-req-timeout from a client?

@spruce http://127.0.0.1:4151/debug/pprof/cmdline if you need a dirty hack

tsholmes force-pushed the fix_requeue_drop_865 branch 2 times, most recently from 4054b65 to 2a51ac3 Compare March 24, 2017 18:01

tsholmes mentioned this pull request Mar 24, 2017

nsqd: connection closed on receiving a requeue timeout higher than the MaxReqTimeout #865

Closed

mreiferson changed the title ~~nsqd: Don't drop connection on out of range requeue timeout~~ nsqd: don't drop connection on out of range requeue timeout Mar 24, 2017

mreiferson added the bug label Mar 24, 2017

tsholmes force-pushed the fix_requeue_drop_865 branch from 2a51ac3 to 53926a7 Compare March 24, 2017 18:48

tsholmes changed the title ~~nsqd: don't drop connection on out of range requeue timeout~~ nsqd: clamp requeue timeout to valid range instead of dropping connection Mar 24, 2017

tsholmes force-pushed the fix_requeue_drop_865 branch from 53926a7 to 41b3b56 Compare March 24, 2017 21:37

mreiferson reviewed Mar 24, 2017

nsqd: clamp requeue timeout to range instead of dropping connection

315096f

tsholmes force-pushed the fix_requeue_drop_865 branch from 41b3b56 to 315096f Compare March 24, 2017 22:27

mreiferson approved these changes Mar 27, 2017

mreiferson merged commit b1e2262 into nsqio:master Mar 27, 2017

spruce mentioned this pull request Jul 18, 2017

Adding a defer parameter to publish dudleycarr/nsqjs#141

Merged
