[WIP] Rework route programming to solve various problems #8393

Closed
wants to merge 46 commits into from

fasaxc
Member

@fasaxc fasaxc commented Jan 5, 2024

Description

Will likely break this into smaller PRs before it's ready to merge...

  • Refactor RouteTable:

    • Key kernel routes using the same tuple as the kernel in a DeltaTracker. Aligning the keying means that conflicts can't occur at this layer.
    • Resolve in-table conflicts in userspace; only the winning route gets put in the tracker.
    • Figure out correct wildcarding of routes for deletion.
    • Use RouteReplace to overwrite routes. Means that, if we do conflict with a route that appears to be a non-Calico route, we'll overwrite it. On balance I think this is the best thing to do; the number one cause of such problems is that Felix has been reconfigured and the old route was a Calico route from before the reconfiguration. If we've been configured to use the same IP space as something else then that's already broken.
  • Resolve conflicts between different routing table views by adding metric/priority to each view; from lowest metric (most preferred) to highest:

    • Local workload routes.
    • Cross-subnet routes.
    • VXLAN tunneled routes.
    • Default "blackhole" routes for IPAM blocks.

    Routes with different metrics can co-exist in the kernel so the different views can't clobber each other, even transiently (and the behaviour during an overlay is sane).

  • Move VXLAN layer 2 handling to its own object, managed separately from the RouteTable. The L2 state is basically independent of Routes so there's no need to do it in the same object.

  • Move creation of cross-subnet routing table to main goroutine to avoid races. Fix handling of parent interface change.
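
The keying and metric ideas in the list above can be sketched in Go roughly as follows. This is a minimal illustration, not Felix's code: `RouteKey`, `Route`, `Table`, and the metric constants are all hypothetical names and values, and the tuple the kernel really uses has more fields than shown here. Only the ordering of the metrics (lowest is most preferred) comes from the description.

```go
package main

// Hypothetical metric values; only their relative order (lowest is most
// preferred) is taken from the PR description.
const (
	MetricLocalWorkload = 100
	MetricCrossSubnet   = 200
	MetricVXLAN         = 300
	MetricBlackhole     = 400
)

// RouteKey is an assumed, simplified stand-in for the tuple the kernel uses
// to identify a route.
type RouteKey struct {
	CIDR   string
	Metric int
}

// Route is a simplified route; Dev is illustrative only.
type Route struct {
	Key RouteKey
	Dev string
}

// Table holds at most one route per key, so two entries can only collide if
// they share the whole key.
type Table map[RouteKey]Route

// Set inserts or overwrites the route stored under its key (replace semantics).
func (t Table) Set(r Route) { t[r.Key] = r }

// Preferred returns the lowest-metric route for exactly this CIDR.
func (t Table) Preferred(cidr string) (Route, bool) {
	var best Route
	found := false
	for k, r := range t {
		if k.CIDR != cidr {
			continue
		}
		if !found || k.Metric < best.Key.Metric {
			best, found = r, true
		}
	}
	return best, found
}
```

With this shape, a blackhole route and a VXLAN route for the same CIDR occupy different keys and coexist, while a second `Set` with an identical key simply replaces the earlier entry.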

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

TBD

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

Previously, there were two problems:

- The routing table was not updated to use the
  new parent.
- Even after restarting felix, the routing table
  was unable to clean up the old routes.

Fix is in two parts:

- Add a default-enabled feature flag to force
  cleanup of conflicting routes.
- Move creation of the no-encap routing table
  to the main loop and recreate it when the
  parent changes.
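
A rough sketch of the second part of the fix, under stated assumptions: `routeTable` and `noEncapManager` are hypothetical stand-ins, not Felix types. The point is only that, when all calls come from the main loop, the table can be recreated on a parent change without locking, and the old table is handed back so its routes can be cleaned up.

```go
package main

// routeTable is a hypothetical stand-in for a per-interface routing table.
type routeTable struct {
	parent string
}

// noEncapManager is a hypothetical owner of the no-encap routing table.
// Its methods are assumed to be called only from the main loop goroutine,
// so no locking is needed.
type noEncapManager struct {
	table *routeTable
}

// OnParentChanged recreates the table if the parent interface differs from
// the current one. It returns the previous table (nil if there was none or
// the parent is unchanged) so the caller can clean up its routes.
func (m *noEncapManager) OnParentChanged(parent string) *routeTable {
	if m.table != nil && m.table.parent == parent {
		return nil
	}
	old := m.table
	m.table = &routeTable{parent: parent}
	return old
}
```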

Also noticed that KeepVXLANDeviceInSync could
take a long time to respond to changes; add a
kick channel.
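
For readers unfamiliar with the pattern, a kick channel is commonly a buffered channel of size one with a non-blocking send, so that a slow periodic loop can be woken early. The sketch below shows that general Go idiom; `Kicker` and `WaitForKickOrTimeout` are hypothetical names, and this is not necessarily how the PR implements it.

```go
package main

import "time"

// Kicker wraps a size-1 buffered channel used to wake a waiting loop early.
type Kicker struct {
	C chan struct{}
}

// NewKicker returns a Kicker with no kick pending.
func NewKicker() *Kicker {
	return &Kicker{C: make(chan struct{}, 1)}
}

// Kick queues a wake-up without blocking. If a kick is already pending,
// the extra one is dropped, so repeated kicks coalesce into one.
func (k *Kicker) Kick() {
	select {
	case k.C <- struct{}{}:
	default:
	}
}

// WaitForKickOrTimeout blocks until a kick arrives or d elapses.
// It reports true if it was woken by a kick.
func WaitForKickOrTimeout(kick <-chan struct{}, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-kick:
		return true
	case <-timer.C:
		return false
	}
}
```

A sync loop would call `WaitForKickOrTimeout` between iterations, and whatever detects a relevant change would call `Kick`, so the loop reacts promptly instead of waiting out its full interval.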

Move feature gates to the feature detector.
Avoid needing to pass yet another object to 30 places.
Much cleaner, kernel does the conflict resolution.
Covers case where RouteTable is created after start-of-day.
@marvin-tigera marvin-tigera added this to the Calico v3.28.0 milestone Jan 5, 2024
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Jan 5, 2024
@fasaxc fasaxc force-pushed the route-delta-tracker branch from 6fe2c60 to b4dc142 Compare January 5, 2024 16:00
@fasaxc fasaxc added docs-not-required Docs not required for this change and removed docs-pr-required Change is not yet documented labels Jan 5, 2024
@fasaxc fasaxc force-pushed the route-delta-tracker branch 2 times, most recently from 6b2426c to a018495 Compare January 11, 2024 15:10
@fasaxc fasaxc force-pushed the route-delta-tracker branch from a018495 to f267890 Compare January 11, 2024 15:18
@fasaxc fasaxc force-pushed the route-delta-tracker branch from abf2e64 to 592ea68 Compare January 18, 2024 16:44
@fasaxc
Member Author

fasaxc commented Jan 25, 2024

Closing in favour of #8418; I abandoned the routing metric idea, so I don't think this is a good stepping stone.

@fasaxc fasaxc closed this Jan 25, 2024
@marvin-tigera marvin-tigera removed release-note-required Change has user-facing impact (no matter how small) docs-not-required Docs not required for this change labels Jan 25, 2024