Vanflow: Duplicate message delivery per distinct router path #1365
This is probably a workaround for this?
They are related, @jiridanek, but I'd hesitate to call either a workaround. The flow collector should be idempotent (fix in skupper/pull/1341), and from what I understand, vanflow messages should be delivered only once to each consumer, with some reliability, regardless of van topology (the topic raised in this issue).
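To illustrate what "idempotent" means for a collector fed by a transport that may deliver duplicates, here is a minimal sketch. It assumes, hypothetically, that each record carries an `id` and a cumulative `octets` value; these field names are illustrative only, not the vanflow record schema and not the change in skupper/pull/1341.

```python
def total_octets_naive(messages):
    """Accumulate on every message: a duplicate delivery inflates the total."""
    total = 0
    for msg in messages:
        total += msg["octets"]
    return total


def total_octets_idempotent(messages):
    """Keep the latest state per record ID, then derive totals from that state.

    Re-applying a record that was already seen leaves the state unchanged.
    """
    latest = {}
    for msg in messages:
        latest[msg["id"]] = msg  # hypothetical "id" field identifies the record
    return sum(rec["octets"] for rec in latest.values())


if __name__ == "__main__":
    records = [{"id": "f1", "octets": 100}, {"id": "f2", "octets": 50}]
    with_duplicate = records + [records[0]]
    print(total_octets_naive(with_duplicate), total_octets_idempotent(with_duplicate))
```

Idempotence hides the symptom on the consumer side; it does not remove the extra traffic, which is why the thread still treats duplicate delivery as a router problem.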
Hi, I'm interested in investigating this ticket. If there are any concerns or issues with me investigating it, please do not hesitate to let me know. Thanks, Karen
Howdy everyone. Back at my old stuff again here, this time with a cleaner reproduction. To trigger this behavior we need a few things:
Example docker-compose router network: https://github.com/c-kruse/collector-benchmark/tree/skupper-router-1365
EDIT for posterity: I've mostly been focused on vanflow since that is the context in which I discovered this issue, but it appears to be a general multicast routing issue. The reproduction above circumvents vanflow entirely and uses plain multicast routing.
Hi @Karen-Schoener, are you still interested in investigating this ticket?
@ted-ross @kgiusti @ganeshmurthy, I wanted to check in and ask for advice on which skupper-router test I should update to verify this fix... Today, locally, I updated system_tests_multicast.py to create a mesh of 3 routers. Right now, my patched test sleeps for 1 second to wait for all multicast packets to be received. Do you prefer that I add a multicast test to system_tests_router_mesh.py? |
The sender could send, say, 10 messages, each with a string in the body like this:
On the receiving end, add each message body to a Python dictionary (one dict per receiver). Check the dict to see whether a body string is already present; if it is, you have received a duplicate and the test fails. Let the test time out (the timeout defaults to 60 seconds), and when the on_timeout function is called, check that every dict has 10 keys.
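The bookkeeping described above can be sketched independently of Proton so it runs standalone. In a real system test, `on_message` and `on_timeout` would be called from the test's messaging handler callbacks. The class name `DuplicateTracker` and the `message-N` body format are hypothetical choices for illustration.

```python
class DuplicateTracker:
    """Per-receiver bookkeeping for a multicast duplicate-delivery test (sketch)."""

    def __init__(self, receiver_names, expected_count):
        self.expected_count = expected_count
        # One dict per receiver: body string -> number of times it arrived.
        self.seen = {name: {} for name in receiver_names}
        self.errors = []

    def on_message(self, receiver_name, body):
        bodies = self.seen[receiver_name]
        if body in bodies:
            # Body already present: this receiver got a duplicate.
            self.errors.append(f"{receiver_name} received duplicate {body!r}")
        bodies[body] = bodies.get(body, 0) + 1

    def on_timeout(self):
        # At timeout, every receiver must hold exactly expected_count distinct bodies.
        for name, bodies in self.seen.items():
            if len(bodies) != self.expected_count:
                self.errors.append(
                    f"{name} got {len(bodies)} of {self.expected_count} distinct messages"
                )
        return self.errors


if __name__ == "__main__":
    tracker = DuplicateTracker(["receiver-a", "receiver-b"], expected_count=10)
    for i in range(10):
        for receiver in ("receiver-a", "receiver-b"):
            tracker.on_message(receiver, f"message-{i}")
    print(tracker.on_timeout())
```

Recording errors instead of failing immediately lets a single run report every duplicate and every missing message at once.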
My preference would be to add the test to
I agree on putting the test in the … The key is getting the test to fail reliably without the fix in place. Waiting 60 seconds is just going to make the test really long. Using a marker message at the end may not work, because the receivers might receive the marker before receiving any duplicates. A short delay (1-5 seconds) is probably not a bad solution.
Having very little specific technical knowledge in this area: can we use management commands to poll and wait until the senders or receivers have no more blocked/unsettled/undelivered messages?
Say a receiver connected to Router A receives the 10 non-duplicate messages first, while the 10 duplicate messages are slightly delayed and have not yet entered Router A. Wouldn't the test then falsely conclude that everything is OK, since the blocked/unsettled/undelivered counts will be zero at that instant (before the duplicates arrive)? The problem is that we cannot know when all the messages and duplicates will have arrived; in some slow CI environments we have seen delayed message arrival. That is why I thought the default TIMEOUT (60 seconds) is enough time for the test to run.
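One possible middle ground between a fixed short delay and the full 60-second TIMEOUT is to wait until the total number of received messages (duplicates included, e.g. the sum of all per-receiver counts) has stopped changing for a quiet period, bounded by an overall timeout. This is a swapped-in technique, not an existing skupper-router test helper, and `wait_for_quiet` is a hypothetical name. As noted above, it cannot prove no duplicate is still in flight: a duplicate that arrives after the quiet period would be missed, so the quiet period is a tunable trade-off. Inside a Proton-based test, a blocking poll like this would likely need to be reshaped into a scheduled timer callback.

```python
import time


def wait_for_quiet(get_count, quiet_period=2.0, timeout=60.0, poll=0.1,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_count() until it has not changed for quiet_period seconds.

    Returns the settled count, or raises TimeoutError once timeout elapses.
    clock/sleep are injectable so the helper can be tested without real waits.
    """
    start = clock()
    last_count = get_count()
    last_change = start
    while True:
        now = clock()
        count = get_count()
        if count != last_count:
            # Something new arrived (possibly a duplicate): restart the quiet window.
            last_count, last_change = count, now
        elif now - last_change >= quiet_period:
            return count
        if now - start >= timeout:
            raise TimeoutError(f"count still changing after {timeout} s")
        sleep(poll)
```

Because `get_count` should include duplicates, a count of distinct bodies alone would not work: it does not change when a duplicate arrives.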
Fix qdr_forward_multicast_CT to check valid_origins. Fixes #1365
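To make the "one copy per distinct router path" symptom and the `valid_origins` idea concrete, here is a toy Python model, not router code. It assumes (1) each in-flight copy carries a trace of routers it has visited and is never forwarded to a router already in that trace, and (2) the `valid_origins` check is modeled as "forward to neighbor N only if this router is N's parent on the shortest-path tree rooted at the message's origin". This is one reading of the idea, not the router's actual data structures; `LINKS`, `tree_parents`, and `forward` are hypothetical.

```python
from collections import Counter, deque

# Hypothetical topology: a 3-router full mesh like the one in the reproduction.
LINKS = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}


def tree_parents(links, origin):
    """Breadth-first shortest-path tree rooted at origin.

    Returns {router: parent}; ties are broken by neighbor-list order.
    """
    parents = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nbr in links[node]:
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return parents


def forward(links, origin, receivers, check_valid_origins):
    """Count how many copies of one multicast message each receiver router delivers."""
    parents = tree_parents(links, origin)
    deliveries = Counter()
    # Each in-flight copy is (current router, trace of routers visited so far).
    pending = deque([(origin, (origin,))])
    while pending:
        router, trace = pending.popleft()
        if router in receivers:
            deliveries[router] += 1
        for nbr in links[router]:
            if nbr in trace:
                continue  # modeled trace-based loop prevention
            if check_valid_origins and parents[nbr] != router:
                continue  # this router is not nbr's parent on origin's tree
            pending.append((nbr, trace + (nbr,)))
    return deliveries


if __name__ == "__main__":
    print(forward(LINKS, "A", {"B", "C"}, check_valid_origins=False))
    print(forward(LINKS, "A", {"B", "C"}, check_valid_origins=True))
```

Without the check, trace-based loop prevention alone still lets each receiver router get one copy per simple path from the origin (in the triangle, B is reached via A-B and A-C-B). With the tree check, each router is reached exactly once.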
As discussed in the Skupper Team g chat, I am observing repeated delivery of vanflow messages. This was uncovered by an unrelated skupper collector bug. Thanks for following along @ted-ross! I suspect this issue is in the router, but I didn't know where to start trying to reproduce it without the control plane.
To reproduce: