Pod to service connectivity issues on August and September cumulative updates on Windows Server 2019 #61
The ETA for the patch with the fix is next week. Impacted users who need a (test-signed!) fix before next week should reach out to their Microsoft customer support contact to request one.
When you say "pause kube-proxy", what do you mean? "pause" just shows a prompt to press any key to continue. Does it mean rebooting the system?
What I mean is to stop kube-proxy so that no HNS policies are re-created after you ran …
KB4577668 was released on October 13, 2020 (OS Build 17763.1518).
No, this build still has the issue. The next KB should contain the fix; the ETA is October 20th.
Hi again. The page (https://support.microsoft.com/en-us/help/4580390/windows-10-update-kb4580390) marks it as Preview. Is it the final KB, or do we need to wait for another one?
Regarding the "preview" naming, please see here: it is the final production KB, but it is optional, meaning users need to seek it out. It will also be included in next month's "B" release update, hence the name "preview". This KB contains the fix for this issue. Can you try it out and confirm whether the issue still reproduces?
I can confirm that my nodes updated to 10.0.17763.1554 have no pod-to-service connectivity issue.
I am sorry about my previous comment, but the issue does exist. More details: kubernetes-sigs/sig-windows-tools#127
Build 17763.1577 also doesn't work.
@vitaliy-leschenko Can you please try some of the troubleshooting steps here and send us the output of the collectlogs.ps1 script? Specifically, can you reproduce the output shown in example #1 and example #3 in the above doc? Do all the Windows nodes have the same patch level, 17763.1577?
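To check whether all Windows nodes share the same patch level, a minimal Python sketch can group nodes by the OS version string each one reports. It assumes you have already collected one version string per node (for example from the KERNEL-VERSION column of kubectl get nodes -o wide, which for Windows nodes is assumed to look like 10.0.17763.1577); the node names and versions below are hypothetical sample data.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_build(nodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each reported OS version string to the sorted names of nodes reporting it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, version in nodes.items():
        # Strip stray whitespace so "10.0.17763.1577 " and "10.0.17763.1577" match.
        groups[version.strip()].append(name)
    return {build: sorted(names) for build, names in groups.items()}


def same_patch_level(nodes: Dict[str, str]) -> bool:
    """True when every node reports the identical version string (or there are no nodes)."""
    return len(group_by_build(nodes)) <= 1


if __name__ == "__main__":
    # Hypothetical sample data: node name -> reported OS version.
    nodes = {
        "win-node-1": "10.0.17763.1577",
        "win-node-2": "10.0.17763.1577",
        "win-node-3": "10.0.17763.1518",
    }
    print(same_patch_level(nodes))  # False
    for build, names in sorted(group_by_build(nodes).items()):
        print(build, names)
```

Exact string equality is deliberate here: the question is whether the nodes match each other, not which build is newer.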
OK, I will try.
I tested Windows Server 1809 with the latest updates installed (version 10.0.17763.1637). It still has the pod-to-service connectivity issue.
Can't test
Currently, the situation is:
I created a sample to reproduce the issue: https://vitaliyorgstorage.azureedge.net/9d4196ab-1c3c-4efb-b065-b643384f1832/github/sample.yaml
Logs for …
Sorry for the delay here. What do the logs look like on the problematic node? Is FlannelD running on the problematic node? Also, what do the FlannelD logs look like on that node? There should be entries such as
FlannelD is running. Pods can communicate via IP addresses. Here is the flannel log from another cluster with the same issue:
When we set up flannel as host-gw, we can see … I think we have an issue with kube-proxy, because it acts as the service load balancer.
This sounds to me like a pod-to-pod issue across nodes. The error in the FlannelD logs also suggests there could be a misconfiguration on a different node: do you still have any nodes attached to the cluster that are configured in …
I have no nodes in …
This issue has been open for 30 days with no updates.
Sorry for the long delay. In theory, DNS relies on service connectivity, so it is surprising to see these two statements:
A relevant fix came out in February. Can you update to the latest version and then provide the following from the problematic node:
Any updates on this thread? Otherwise, we'll go ahead and close it.
In Kubernetes on Windows Server, the DECAP_IN VFP layer gets dropped on the 8C and 9B cumulative updates for Windows Server 2019 when the HNS service is restarted. This may cause pod-to-service traffic to fail in some cases and configurations.
If a Windows Server 2019 machine needs to be restarted, a workaround to try out is to:
This regression will be resolved in the cumulative update released in the third week of October. The issue does not surface on Windows Server 1903 and later.
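The scope described above can be expressed as a small check on a node's reported OS version. This is a Python sketch based only on the statements in this issue, under two labeled assumptions: that the October fix corresponds to OS build 17763.1554 (KB4580390, the build later reported in this thread), and that, because the exact revision of the 8C update is not given here, every earlier 17763 revision is treated as possibly affected. Windows Server 2019 (version 1809) reports build 17763, and Windows Server 1903 reports build 18362. The function name may_have_decap_in_regression is hypothetical.

```python
from typing import Tuple

WS2019_BUILD = 17763  # Windows Server 2019 (version 1809)
# Assumption: KB4580390 (OS build 17763.1554) is the October fix referred to in this issue.
FIX_REVISION = 1554


def parse_build(version: str) -> Tuple[int, int]:
    """Split '10.0.17763.1577' into (build, revision) = (17763, 1577)."""
    parts = version.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a version like 10.0.17763.1577, got {version!r}")
    return int(parts[2]), int(parts[3])


def may_have_decap_in_regression(version: str) -> bool:
    """Conservative check for the DECAP_IN regression described in this issue.

    Only Windows Server 2019 (build 17763) below the assumed fix revision is flagged.
    Windows Server 1903 (build 18362) and later are out of scope per the issue text.
    The revision that introduced the regression (8C) is not known here, so all
    earlier 17763 revisions are flagged rather than guessing a lower bound.
    """
    build, revision = parse_build(version)
    return build == WS2019_BUILD and revision < FIX_REVISION


if __name__ == "__main__":
    for v in ("10.0.17763.1518", "10.0.17763.1554", "10.0.18362.1"):
        print(v, may_have_decap_in_regression(v))
```

A False result is not proof that a node is unaffected: later comments in this thread report the connectivity problem persisting on 17763.1577 and 17763.1637.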