Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for bz1400609 #12107

Merged
merged 1 commit into from
Dec 14, 2016
Merged

fix for bz1400609 #12107

merged 1 commit into from
Dec 14, 2016

Conversation

rajatchopra
Copy link
Contributor

Check for validity of a recorded ip address of the node among all the addresses provided in the status. It is possible that the order is not maintained between status reports from the node. Picking the first address would cause flip-flopping of SDN destinations which trouble traffic going to the pods because of OVS reload.
https://bugzilla.redhat.com/show_bug.cgi?id=1400609

@rajatchopra
Copy link
Contributor Author

cc @knobunc @danwinship @openshift/networking

@danwinship
Copy link
Contributor

Picking the first address would cause flip-flopping of SDN destinations which trouble traffic going to the pods because of OVS reload.

FWIW, a later comment on the support case claimed that connections reliably worked when the HostSubnet record pointed to one IP, and failed when they pointed to the other. So presumably it wasn't actually the OVS reloading causing the problem, it was that the (OpenShift) router only had an (IP) route to one of the two node addresses, or something like that.

@danwinship
Copy link
Contributor

In which case this patch would mean that if it picked the wrong IP the first time, it would stick with it forever? We need to make sure the HostSubnet record gets created with the "right" IP.. should we be looking at the NodeAddress.Type field to decide which IP to use rather than just always using Addresses[0]?

@rajatchopra
Copy link
Contributor Author

So should we use the type 'internal' when available? Probably. This PR however is irrespective of what is chosen the first time around.
Can include the change to getNodeIP in another pull request.

@dcbw
Copy link
Contributor

dcbw commented Dec 2, 2016

@rajatchopra we probably do want to use "Internal" since presumably that's how the nodes would reach each other. I looked at all the cloud providers, and they all currently seem to return only one internal address. But nothing seems to preclude returning more than one... If there's no internal address, we need to fall back to Legacy.

@pravisankar
Copy link

Yes, we can use NodeInternalIP if available otherwise fallback to first one in the node address list. Some providers may not support NodeLegacyHostIP. For example, openstack provider has only NodeInternalIP or NodeExternalIP.

@danwinship
Copy link
Contributor

So should we use the type 'internal' when available? Probably. This PR however is irrespective of what is chosen the first time around.
Can include the change to getNodeIP in another pull request.

Yes we definitely want this patch, but I don't think it really fixes the customer's bug without the other patch too.

(Also, you need to run gofmt)

…ses (when there are multiple NICs to report), do not let the SDN chase it
@knobunc
Copy link
Contributor

knobunc commented Dec 12, 2016

[test]

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@knobunc
Copy link
Contributor

knobunc commented Dec 13, 2016

[test] eveything flaked yesterday... retry

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to a5e26ff

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12330/) (Base Commit: 34b4f58)

@knobunc
Copy link
Contributor

knobunc commented Dec 14, 2016

[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to a5e26ff

@openshift-bot
Copy link
Contributor

openshift-bot commented Dec 14, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12377/) (Base Commit: 4c96ea4) (Image: devenv-rhel7_5545)

@openshift-bot openshift-bot merged commit 993395c into openshift:master Dec 14, 2016
@rajatchopra rajatchopra deleted the multi_ip branch December 16, 2016 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants