[publick8s] AzureAD / AKS error "Authorization Failures have been detected that may affect cluster availability" over outbound IPv6 addresses #4206
Comments
After a quick analysis, it appears that the custom resource group we specify for the public IPs of our public LBs (ref. https://github.com/jenkins-infra/kubernetes-management/blob/4f2bc39e9554bf45463e32f1c1dc69f7aa03539f/config/public-nginx-ingress_publick8s.yaml#L19) is also applied to the outbound IPs of the load balancer (see https://github.com/jenkins-infra/azure/blob/e8cbf0146eb1a033a7c93c65a5c7875570f3d6f3/publick8s.tf#L68-L73) when it is used as the egress/outbound method. To solve these errors, we either:
After a few more investigations, it appears that these 2 IPv6 addresses are leftovers of past SNAT issues: #3908 (comment). Since we changed the cluster setup to define explicit outbound IPv6 addresses in jenkins-infra/azure@f3ba3d8, these 2 old IPv6 addresses are no longer required.
WiP: deletion of these 2 IPv6 addresses.
The 2 IPv6 addresses have been disassociated from the publick8s LB and removed.
Note: the 2 faulty IPv6 addresses have been removed, but it looks like a reconciliation still needs to happen. Sounds like a weird bug in AKS :|
Related to jenkins-infra/helpdesk#4206. This PR allows the cluster to fully manage IPs in this RG. Signed-off-by: Damien Duportal <damien.duportal@gmail.com>
Update: it is NOT an Azure issue. It is a configuration issue on our side: we were using DualStack LB Kubernetes services, which caused this unexpected behavior. Ref. https://kubernetes.io/docs/concepts/services-networking/dual-stack/
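For anyone auditing other clusters for the same misconfiguration, here is a minimal sketch of how one might flag LoadBalancer Services that request dual-stack. It assumes the manifests are already loaded as Python dictionaries (for example from rendered chart output); the helper names and the sample Service names are hypothetical and not taken from our repositories. It relies only on the documented `spec.type` and `spec.ipFamilyPolicy` fields, and treats an unset `ipFamilyPolicy` as `SingleStack`, which is the Kubernetes default.

```python
from typing import Any, Dict, List

# ipFamilyPolicy values that ask Kubernetes for both an IPv4 and an IPv6 address.
DUAL_STACK_POLICIES = {"PreferDualStack", "RequireDualStack"}


def is_dual_stack_lb(manifest: Dict[str, Any]) -> bool:
    """Return True if the manifest is a LoadBalancer Service requesting dual-stack."""
    spec = manifest.get("spec", {})
    if manifest.get("kind") != "Service" or spec.get("type") != "LoadBalancer":
        return False
    # An unset ipFamilyPolicy defaults to SingleStack.
    return spec.get("ipFamilyPolicy", "SingleStack") in DUAL_STACK_POLICIES


def dual_stack_lb_names(manifests: List[Dict[str, Any]]) -> List[str]:
    """List the names of all dual-stack LoadBalancer Services, in input order."""
    return [
        m.get("metadata", {}).get("name", "<unnamed>")
        for m in manifests
        if is_dual_stack_lb(m)
    ]


# Hypothetical sample manifests, for illustration only.
sample_manifests = [
    {
        "kind": "Service",
        "metadata": {"name": "example-ingress-public"},
        "spec": {"type": "LoadBalancer", "ipFamilyPolicy": "PreferDualStack"},
    },
    {
        "kind": "Service",
        "metadata": {"name": "example-ingress-ipv4-only"},
        "spec": {"type": "LoadBalancer"},
    },
    {
        "kind": "Service",
        "metadata": {"name": "example-internal"},
        "spec": {"type": "ClusterIP", "ipFamilyPolicy": "RequireDualStack"},
    },
]

flagged = dual_stack_lb_names(sample_manifests)
```

With the sample data above, only the first Service is flagged: the second has no `ipFamilyPolicy` and the third is not of type `LoadBalancer`.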
Service(s)
Azure, Other
Summary
The Azure Health Monitor (in the Azure UI, browse to the AKS cluster, select 'Diagnose and solve problems' -> section 'Cluster and Control Plane Availability and Performance') alerts us about the following error in the cluster `publick8s`:
We need to check:
Reproduction steps
No response