Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nsqd: clamp requeue timeout to valid range instead of dropping connection #868

Merged
merged 1 commit into from
Mar 27, 2017

Conversation

tsholmes
Copy link
Contributor

Fixes #865

This will return an error to the client when the requested requeue timeout is higher than the MaxReqTimeout option, instead of dropping the connection.

@tsholmes
Copy link
Contributor Author

Alternatively, we could fallback to a requeue timeout of 0 when it is <0, and a requeue timeout of MaxReqTimeout when it is too high.

@tsholmes tsholmes force-pushed the fix_requeue_drop_865 branch 2 times, most recently from 4054b65 to 2a51ac3 Compare March 24, 2017 18:01
@tsholmes
Copy link
Contributor Author

ready for review

@mreiferson mreiferson changed the title nsqd: Don't drop connection on out of range requeue timeout nsqd: don't drop connection on out of range requeue timeout Mar 24, 2017
@mreiferson mreiferson added the bug label Mar 24, 2017
@mreiferson
Copy link
Member

Alternatively, we could fallback to a requeue timeout of 0 when it is <0, and a requeue timeout of MaxReqTimeout when it is too high.

I think we should do this. It seems more desirable than not requeueing the message at all, right?

@mreiferson
Copy link
Member

e.g. I sincerely doubt client libraries are written in a way that would handle non-fatal REQ failure.

@tsholmes
Copy link
Contributor Author

Yeah you're probably right. I'll update it to do that

@tsholmes tsholmes force-pushed the fix_requeue_drop_865 branch from 2a51ac3 to 53926a7 Compare March 24, 2017 18:48
@tsholmes tsholmes changed the title nsqd: don't drop connection on out of range requeue timeout nsqd: clamp requeue timeout to valid range instead of dropping connection Mar 24, 2017
@tsholmes tsholmes force-pushed the fix_requeue_drop_865 branch from 53926a7 to 41b3b56 Compare March 24, 2017 21:37
@tsholmes
Copy link
Contributor Author

ready for review

@@ -712,9 +712,10 @@ func (p *protocolV2) REQ(client *clientV2, params [][]byte) ([]byte, error) {
}
timeoutDuration := time.Duration(timeoutMs) * time.Millisecond

if timeoutDuration < 0 || timeoutDuration > p.ctx.nsqd.getOpts().MaxReqTimeout {
return nil, protocol.NewFatalClientErr(nil, "E_INVALID",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm curious now if it should still return some indication that this happened rather than silently proceeding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have no way of telling the client about it, since we don't want to return an error. We could log it on the server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be noisy, but still seems like the best option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's as noisy as the previous behavior, where we would log the fatal error and the connection drop every time someone did this ¯\(ツ)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mreiferson I think hanging up at IDENTIFY and REQ is worth thinking about some more. We actively monitor client errors, whereas early de-queues are not something we'd figure out right away. I would be happy with "nsqd requeued your message, but not exactly as you requested" but there's no mechanism for that. Our use case is we have some last ditch effort REQ's in the 12h range, if nsqd was configured incorrectly we'd see these come around again in an hour. Something to consider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@judwhite nsqd does hang up on a bad IDENTIFY. On REQ, I agree this new behavior is more correct because the previous behavior would result in the intended message timing out rather than being requeued. In most cases that's probably less desirable if the actual time-to-reprocess is important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest issue here is that if the client has a max-in-flight > 1, any other message that was in flight would also be timed out, which can cause problems when you want exactly (or as close as possible to) 1 successful processing of each message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mreiferson @tsholmes MaxReqTimeout is the maximum REQ delay the client can specify, it's not related to -msg-timeout. Is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, it's a separate option:

$ nsqd --help
...
  -max-msg-timeout duration
    	maximum duration before a message will timeout (default 15m0s)
...
  -max-req-timeout duration
    	maximum requeuing timeout for a message (default 1h0m0s)
...
  -msg-timeout string
    	duration to wait before auto-requeing a message (default "1m0s")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mreiferson I think I see what you're saying. When REQ failed previously nsqd continued the message timeout and it gets requeued anyway with 0 delay (is that right?). I suppose 'it depends' if you'd rather see errors on the client or have the server override your request parameters without notification.

@tsholmes tsholmes force-pushed the fix_requeue_drop_865 branch from 41b3b56 to 315096f Compare March 24, 2017 22:27
Copy link
Member

@mreiferson mreiferson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@mreiferson mreiferson merged commit b1e2262 into nsqio:master Mar 27, 2017
@spruce
Copy link

spruce commented Jul 18, 2017

Is there a way to get the max-req-timeout from a client? I don't see a possibility. Is that correct? That would mean a client can't notify the user that he can't requeue / delay with such a high timeout?

@ploxiln
Copy link
Member

ploxiln commented Jul 18, 2017

I think that's correct.

Consider that currently, if nsqd restarts, the delays are forgotten, and messages are queued for delivery ASAP. So if you really need to process a message after a certain later time, you probably want an efficient or cached way to determine that in the consumer, and requeue for later delivery repeatedly until that time is reached (and keep in mind max attempts). Messy, but NSQD was not originally designed as a scheduler, so the requeue delay was originally intended for backoff (and not precise timing).

@judwhite
Copy link
Contributor

Is there a way to get the max-req-timeout from a client?

@spruce http://127.0.0.1:4151/debug/pprof/cmdline if you need a dirty hack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants