Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm: Restore the ability to start a cluster in conformance mode by disabling the cilium ipmasq agent when in conformance mode #3062

Merged
merged 1 commit into from
May 8, 2024

Conversation

3u13r
Copy link
Member

@3u13r 3u13r commented May 3, 2024

Context

Proposed change(s)

  • disable Cilium's ip masq agent when running in conformance mode since the ip masq agent requires bpf masquerading

I tested this on AWS.

Checklist

  • Add labels (e.g., for changelog category)
  • Is PR title adequate for changelog?
  • Link to Milestone

@3u13r 3u13r added the bug fix Fixing a bug label May 3, 2024
@3u13r 3u13r added this to the v2.17.0 milestone May 3, 2024
@3u13r 3u13r requested review from burgerdev and miampf May 3, 2024 12:43
@3u13r 3u13r requested a review from derpsteb as a code owner May 3, 2024 12:43
Copy link

netlify bot commented May 3, 2024

Deploy Preview for constellation-docs canceled.

Name Link
🔨 Latest commit 9f52d3c
🔍 Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/663b9b064ae2200008f993df

@miampf
Copy link
Contributor

miampf commented May 3, 2024

I currently get the following error when trying to set up a cluster on GCP: Error: applying Helm charts: applying constellation-operators: helm install: release constellation-operators failed, and has been uninstalled due to atomic being set: client rate limiter Wait returned an error: context deadline exceeded

@burgerdev
Copy link
Contributor

It looks like you're not only disabling the ip-masq-agent, but also the entire eBPF service resolution mechanism (kubeProxyReplacement) - could you please document why this is necessary?

@3u13r
Copy link
Member Author

3u13r commented May 5, 2024

could you please document why this is necessary?

What kind of documentation do you want? Do you want a user-facing documentation of the --conformance flag?
We need to disable kube-proxy replacement since we must use the portmap feature of Cilium go gain conformance regarding the differentiation of UDP and TCP traffic on the same port.
The idea is that nobody requires (or maybe even uses) that feature since upstream Cilium in their default configuration is also not compliant. They want to fix this eventually (since ~2 years or so). There have been recent efforts which were abandoned even more recently.

@3u13r
Copy link
Member Author

3u13r commented May 5, 2024

I currently get the following error when trying to set up a cluster on GCP:

Hm, I don't get any error when initializing a Constellation on gcp.

@burgerdev
Copy link
Contributor

We need to disable kube-proxy replacement since we must use theportmap feature of Cilium go gain conformance regarding the differentiation of UDP and TCP traffic on the same port.

This kind of documentation, as.a comment right here where we set it. Future readers may be interested in the reason for the different configuration in conformance mode, or whether/when it's ok to change it back.

I still don't understand why we need to go from partial replacement to no replacement, unless the previous configuration (minus ip-masq-agent) never worked to begin with.

@3u13r
Copy link
Member Author

3u13r commented May 6, 2024

I still don't understand why we need to go from partial replacement to no replacement,

Using partial is deprecated and using false is essentially the partial option now, see: https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#kube-proxy-hybrid-modes

comment right here where we set it.

Will do hopefully later today.

@burgerdev
Copy link
Contributor

I deployed this branch on GCP without problems.

@burgerdev
Copy link
Contributor

However, the conformance test did not pass:

   - name: '[It] [sig-network] Services should serve endpoints on same port and different
        protocols [Conformance]'
      status: failed

@3u13r
Copy link
Member Author

3u13r commented May 8, 2024

Got a successful run with K8s 1.28 on Azure.

@3u13r 3u13r force-pushed the fix/networking/conformance-masq branch from e6d3905 to 4597d68 Compare May 8, 2024 14:48
Copy link
Contributor

@burgerdev burgerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for the detailed warning!

@3u13r 3u13r force-pushed the fix/networking/conformance-masq branch from 4597d68 to 9f52d3c Compare May 8, 2024 15:32
Copy link
Contributor

github-actions bot commented May 8, 2024

Coverage report

Package Old New Trend
internal/constellation/helm 33.60% 33.80% ↗️

@3u13r 3u13r merged commit 0325483 into main May 8, 2024
8 checks passed
@3u13r 3u13r deleted the fix/networking/conformance-masq branch May 8, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fix Fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants