TAPA queries NSM to re-use connection if possible #444

Merged: 2 commits, Jul 19, 2023

Conversation

zolug
Collaborator

@zolug zolug commented Jul 18, 2023

Description

TAPA uses a deterministic NSM Connection/Segment ID: it does not change when the same application pod connects to a [Conduit, Trench] tuple with the same names.

var nsName string = conduit.GetNetworkServiceNameWithProxy(c.Conduit.GetName(), c.Conduit.GetTrench().GetName(), c.Namespace)
var id string = fmt.Sprintf("%s-%s-%d", c.TargetName, nsName, 0)
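
To illustrate why the ID is stable, here is a minimal sketch, assuming a made-up name format. `networkServiceName` is a hypothetical stand-in for `conduit.GetNetworkServiceNameWithProxy` (its real output format is not shown in this PR), and the conduit, trench, namespace and target names are placeholders. Only the `"%s-%s-%d"` format string is taken from the snippet above.

```go
package main

import "fmt"

// networkServiceName is a hypothetical stand-in for
// conduit.GetNetworkServiceNameWithProxy; the real format may differ.
func networkServiceName(conduitName, trenchName, namespace string) string {
	return fmt.Sprintf("proxy.%s.%s.%s", conduitName, trenchName, namespace)
}

// connectionID mirrors the format string quoted in the description:
// target name, network service name, and a fixed index of 0.
func connectionID(targetName, nsName string) string {
	return fmt.Sprintf("%s-%s-%d", targetName, nsName, 0)
}

func main() {
	nsName := networkServiceName("conduit-a", "trench-a", "default")

	// The same inputs always produce the same ID, for example before
	// and after a TAPA container restart in the same pod.
	beforeRestart := connectionID("target-pod", nsName)
	afterRestart := connectionID("target-pod", nsName)

	fmt.Println(beforeRestart == afterRestart)
}
```

Since nothing random or time-based goes into the ID, a restarted TAPA asks NSM for a connection under exactly the ID its previous instance used.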

If the NSM connection Close() fails on the TAPA side for some reason (or the TAPA container crashes), requesting a new connection with the same Connection ID creates a second connection in NSM.
Once the "old" connection's token lifetime expires and NSM orders clean-up, that clean-up triggers a heal event in TAPA because of the shared Connection ID.
This leads to an unwanted NSM connection reconnect. More generally, having multiple NSM connections share the same TAPA connection/segment ID is problematic, since IPAM uses that ID as the key to track who owns the assigned IPs.

Fix:
TAPA now queries NSMgr to check whether it is aware of a Connection with a particular ID. If it is, TAPA re-uses that Connection to request an NSM connection towards the Conduit. This is expected to update the connection already maintained in NSM instead of creating a duplicate, so no unexpected reconnect occurs when an old token expires.

Issue link

Checklist

  • Purpose
    • Bug fix
    • New functionality
    • Documentation
    • Refactoring
    • CI
  • Test
    • Unit test
    • E2E Test
    • Tested manually
  • Introduce a breaking change
    • Yes (description required)
    • No

zolug added 2 commits July 18, 2023 16:44
Before connecting a Conduit, check whether NSM is aware of
a connection with the same ID. If it is, re-use that connection to
request the NSM connection, to avoid creating a "duplicated"
one (and thus avoid interference due to token expiration and heal).

Increase the default interfacename cache timeout to 10 minutes
to avoid issues related to, for example, a slow Close().
@zolug zolug self-assigned this Jul 18, 2023
@zolug zolug changed the title TAPA to fetch connection from NSM to re-use for connecting TAPA queries NSM to re-use connection if possible Jul 18, 2023
@zolug zolug requested a review from LionelJouin July 18, 2023 15:50
@zolug zolug merged commit 35df83c into master Jul 19, 2023
@zolug zolug deleted the ezollug-conn-mon branch July 31, 2023 09:29