AWS.SimpleQueueService.BatchEntryIdsNotDistinct #63

Open
benkeil opened this issue Sep 20, 2021 · 17 comments
benkeil commented Sep 20, 2021

When using non-FIFO queues, it happens quite often that I receive the same message more than once. That's fine during processing, but when Broadway wants to acknowledge the messages (which are in the same batch), it throws an error and moves the messages to the DLQ.

Could we just filter out duplicates during acknowledgment?
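
A minimal sketch of the filtering being proposed, assuming broadway_sqs exposes the SQS message ID as `message.metadata.message_id`; `DedupSketch` and `dedup_for_ack/1` are illustrative names, not broadway_sqs internals:

```elixir
defmodule DedupSketch do
  # Keep only one entry per SQS message ID before building the
  # DeleteMessageBatch request, since duplicate IDs within a single
  # batch trigger AWS.SimpleQueueService.BatchEntryIdsNotDistinct.
  def dedup_for_ack(messages) do
    Enum.uniq_by(messages, & &1.metadata.message_id)
  end
end
```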

josevalim transferred this issue from dashbitco/broadway on Sep 20, 2021
@josevalim (Member)

Hi @benkeil! I have moved this to the SQS repo. I think we can support a new ack option called :ack_duplicate. Here are the docs: https://github.com/dashbitco/broadway_sqs/blob/master/lib/broadway_sqs/producer.ex#L28

Would you like to send a PR for that?

benkeil (Author) commented Sep 21, 2021

Will do.

benkeil (Author) commented Sep 26, 2021

I'm not sure how to handle the configuration part. If we add the new type :ack_duplicate, you could set e.g. on_success: :ack and on_failure: :ack_duplicate, which makes no sense. Wouldn't it make more sense to leave the options as :ack and :noop and add a new option :ack_mode with the values :default and :duplicate?
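
To make the two shapes concrete, here is a hypothetical producer configuration for each; neither :ack_duplicate nor :ack_mode exists in broadway_sqs, and the queue URL is a placeholder:

```elixir
# Shape 1: a new ack type mixed into on_success/on_failure (hypothetical)
producer: [
  module: {BroadwaySQS.Producer,
    queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/my-queue",
    on_success: :ack,
    on_failure: :ack_duplicate}
]

# Shape 2: keep :ack | :noop and add a separate :ack_mode (hypothetical)
producer: [
  module: {BroadwaySQS.Producer,
    queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/my-queue",
    on_success: :ack,
    on_failure: :noop,
    ack_mode: :duplicate}
]
```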

@josevalim (Member)

@benkeil why do you say it makes no sense? Maybe it has no practical use, but in theory there is no problem in allowing successes not to ack duplicates while allowing failures to do so.

@josevalim (Member)

To give such an example, consider this:

  1. You want to always mark messages as acked
  2. Your pipeline checks the SQS message ID and raises if the IDs are duplicated

In this case, you want exactly on_success: :ack, on_failure: :ack_duplicate, because you know dupes will fail (and therefore should still be acked) and you expect successes to never have dupes.
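
A sketch of the pipeline described in step 2, using Broadway's handle_batch/4 callback and Broadway.Message.failed/2 (both real Broadway APIs); the duplicate check itself is illustrative:

```elixir
defmodule DupFailSketch do
  # Fail every message whose SQS message ID already appeared earlier in
  # the same batch. With the (hypothetical) on_failure: :ack_duplicate,
  # those failed duplicates would still be acked.
  def handle_batch(_batcher, messages, _batch_info, _context) do
    {messages, _seen} =
      Enum.map_reduce(messages, MapSet.new(), fn message, seen ->
        id = message.metadata.message_id

        if MapSet.member?(seen, id) do
          {Broadway.Message.failed(message, :duplicate_message_id), seen}
        else
          {message, MapSet.put(seen, id)}
        end
      end)

    messages
  end
end
```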

benkeil (Author) commented Sep 26, 2021

I don't get your point. Either you want to ack or you don't. And if you want to ack, you can't ack the same message ID twice in the same batch, so you must skip it to avoid errors. Also, from a business-logic perspective it makes no sense, because it's the same message.

@josevalim (Member)

I see your point. :ack_duplicate is a poor name. We can change the name, but I still think it should be an option that goes alongside :ack | :noop, because that's what provides the most flexibility.

It is also important to note that we can't tell it is a duplicate until we ack, so it is more like :ack_and_ignore_duplicate_errors.

benkeil (Author) commented Sep 26, 2021

You can see from the message ID at the very beginning whether it is a duplicate. Theoretically you could skip processing it altogether.
What is the purpose of being so flexible? It is not possible to ack these messages in the same batch. You could ack the failed messages separately from the successful ones, but what if you receive the message three times?

@josevalim (Member)

Oh, I see. If they are duplicated within the same batch, could the producer have filtered them too? In any case, I think having a separate option called “filter_duplicate_ids_in_batch” would suffice. Should it be false or true by default?

benkeil (Author) commented Sep 29, 2021

I think this could also happen if you misconfigured the queue with too low a visibility_timeout.
So I would suggest activating it by default, but logging an info/warning and/or adding the number of duplicates to the metrics.
What exactly happens if you filter them out on the SQS side?

@josevalim (Member)

> So I would suggest activating it by default, but logging an info/warning and/or adding the number of duplicates to the metrics.

Let's go with this approach (logging); we don't even need an option. Thanks for all the discussion so far!
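
A sketch of what that agreed behavior could look like inside the acknowledger; Logger is Elixir's standard logger, while the module and function names here are illustrative, not broadway_sqs code:

```elixir
defmodule AckFilterSketch do
  require Logger

  # Drop duplicate message IDs before the batch delete and log how many
  # were dropped, so a misconfigured visibility_timeout stays visible.
  def filter_duplicates(messages) do
    unique = Enum.uniq_by(messages, & &1.metadata.message_id)
    dropped = length(messages) - length(unique)

    if dropped > 0 do
      Logger.warning("Filtered #{dropped} duplicate SQS message(s) before ack")
    end

    unique
  end
end
```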

@patrickberkeley

For what it's worth, we too are running into this issue.

@matheuscumpian

Hello guys,

I was trying to understand this problem (this issue and #49). Just to check that I've understood the discussion: the idea here is to filter out the duplicate messages during acknowledgment and, by default, log a warning about the filtered messages. Is that it?

@josevalim (Member)

Correct! Perhaps I would make it opt-in, though, so people can decide whether changing their settings is the better fix.

@matheuscumpian

One thing that I'm thinking about: as mentioned in issue #49, the SQS docs give this advice about deleting duplicates:

> If you receive a message more than once, each time you receive it, you get a different receipt handle. You must provide the most recently received receipt handle when you request to delete the message (otherwise, the message might not be deleted).

https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-queue-message-identifiers.html

Should we be concerned about that? Because if so, I haven't found a way to get the receive timestamp of the messages. Any tips?
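
One possible workaround that avoids needing a receive timestamp: if the list of messages in a batch preserves receive order (an assumption), the last occurrence of each ID carries the most recently received receipt handle, so keeping the last occurrence satisfies the quoted advice. A sketch:

```elixir
defmodule LatestHandleSketch do
  # Assuming the batch list preserves receive order, keep the *last*
  # occurrence per message ID so its receipt handle is the most recent.
  def keep_latest(messages) do
    messages
    |> Enum.reverse()
    |> Enum.uniq_by(& &1.metadata.message_id)
    |> Enum.reverse()
  end
end
```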

@josevalim (Member)

@HeavyBR perhaps it needs to be explicitly requested in the :metadata field?
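
For reference, the broadway_sqs producer documents an :attribute_names option that requests SQS attributes with each ReceiveMessage call and appends them to the message metadata; the sketch below assumes :all is an accepted value (as in ex_aws_sqs) and that the queue URL is a placeholder:

```elixir
producer: [
  module: {BroadwaySQS.Producer,
    queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/my-queue",
    attribute_names: :all}
]
```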

ndrluis commented Apr 2, 2023

I was experiencing the AWS.SimpleQueueService.BatchEntryIdsNotDistinct error multiple times and was wondering why it was happening so frequently. After investigating, I realized that the problem could be the visibility timeout configuration being insufficient for my use case: my consumer takes around an hour to process a message.

After checking the AWS documentation, I found that the recommended solution is to increase the visibility timeout to a value greater than the time it takes for the consumer to process the message. So, I increased the visibility timeout to 90 minutes, and the error disappeared.
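
For anyone hitting the same thing: besides changing the queue default, broadway_sqs exposes a :visibility_timeout producer option (in seconds) applied per receive request. The 5400 below corresponds to the 90 minutes mentioned above; the queue URL is a placeholder:

```elixir
producer: [
  module: {BroadwaySQS.Producer,
    queue_url: "https://sqs.us-east-1.amazonaws.com/000000000000/my-queue",
    visibility_timeout: 5400}
]
```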
