fix for bz1400609 #12107

rajatchopra · 2016-12-02T01:25:00Z

Check whether the recorded IP address of the node is still valid by looking for it among all the addresses provided in the status. The order of addresses is not necessarily maintained between status reports from the node. Picking the first address would cause flip-flopping of SDN destinations, which troubles traffic going to the pods because of the OVS reload.
https://bugzilla.redhat.com/show_bug.cgi?id=1400609
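
A minimal sketch of the check described above, not the actual patch. The `NodeAddress` type is a local stand-in for the Kubernetes node status entry, and `pickNodeIP` is a hypothetical helper name used only for illustration: it keeps the recorded IP as long as the node still reports it anywhere in the list, and only falls back to the first address when the recorded one has disappeared.

```go
package main

import "fmt"

// NodeAddress is a local stand-in for the address entries in a node's status,
// defined here only so the sketch is self-contained.
type NodeAddress struct {
	Type    string
	Address string
}

// pickNodeIP (hypothetical helper) returns recordedIP if the node still
// reports it at any position in addrs. Otherwise it falls back to the first
// reported address. The boolean is false when there is nothing to pick.
func pickNodeIP(recordedIP string, addrs []NodeAddress) (string, bool) {
	if recordedIP != "" {
		for _, a := range addrs {
			if a.Address == recordedIP {
				// Still valid: do not chase a reordering of the list.
				return recordedIP, true
			}
		}
	}
	if len(addrs) == 0 {
		return "", false
	}
	return addrs[0].Address, true
}

func main() {
	// Two status reports from the same node with the order flipped.
	report1 := []NodeAddress{
		{Type: "InternalIP", Address: "10.0.0.1"},
		{Type: "InternalIP", Address: "192.168.1.1"},
	}
	report2 := []NodeAddress{report1[1], report1[0]}

	recorded, _ := pickNodeIP("", report1)
	afterFlip, _ := pickNodeIP(recorded, report2)
	fmt.Println(recorded, afterFlip)
}
```

With this shape, a reordered status report leaves the recorded address alone, so no HostSubnet update (and no OVS reload) is triggered by the reordering itself.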

rajatchopra · 2016-12-02T01:25:22Z

cc @knobunc @danwinship @openshift/networking

danwinship · 2016-12-02T14:17:41Z

> Picking the first address would cause flip-flopping of SDN destinations, which troubles traffic going to the pods because of the OVS reload.

FWIW, a later comment on the support case claimed that connections reliably worked when the HostSubnet record pointed to one IP, and failed when it pointed to the other. So presumably it wasn't actually the OVS reloading causing the problem; it was that the (OpenShift) router only had an (IP) route to one of the two node addresses, or something like that.

danwinship · 2016-12-02T14:25:47Z

In which case this patch would mean that if it picked the wrong IP the first time, it would stick with it forever? We need to make sure the HostSubnet record gets created with the "right" IP. Should we be looking at the NodeAddress.Type field to decide which IP to use rather than just always using Addresses[0]?

rajatchopra · 2016-12-02T20:05:36Z

So should we use the type 'internal' when available? Probably. This PR, however, is independent of what is chosen the first time around.
I can include the change to getNodeIP in another pull request.

dcbw · 2016-12-02T20:26:17Z

@rajatchopra we probably do want to use "Internal" since presumably that's how the nodes would reach each other. I looked at all the cloud providers, and they all currently seem to return only one internal address. But nothing seems to preclude returning more than one... If there's no internal address, we need to fall back to Legacy.

pravisankar · 2016-12-03T00:36:58Z

Yes, we can use NodeInternalIP if available, otherwise fall back to the first one in the node address list. Some providers may not support NodeLegacyHostIP. For example, the openstack provider has only NodeInternalIP or NodeExternalIP.
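
A rough sketch of that selection order, under stated assumptions. The address type and its constants are declared locally to mirror the Kubernetes `NodeInternalIP`, `NodeLegacyHostIP`, and `NodeExternalIP` values, and `preferredNodeIP` is a hypothetical name, not the existing getNodeIP. It prefers an internal address, then a legacy one, then whatever comes first in the list.

```go
package main

import "fmt"

// NodeAddressType and the constants below are local copies that mirror the
// Kubernetes node address types; they are assumptions made for this sketch.
type NodeAddressType string

const (
	NodeInternalIP   NodeAddressType = "InternalIP"
	NodeLegacyHostIP NodeAddressType = "LegacyHostIP"
	NodeExternalIP   NodeAddressType = "ExternalIP"
)

// NodeAddress is a local stand-in for one entry in the node status.
type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// preferredNodeIP (hypothetical helper) picks the first address of the most
// preferred type that is present, falling back to the first address of any
// type. It returns false when the node reports no addresses at all.
func preferredNodeIP(addrs []NodeAddress) (string, bool) {
	for _, want := range []NodeAddressType{NodeInternalIP, NodeLegacyHostIP} {
		for _, a := range addrs {
			if a.Type == want {
				return a.Address, true
			}
		}
	}
	if len(addrs) == 0 {
		return "", false
	}
	return addrs[0].Address, true
}

func main() {
	addrs := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.7"},
		{Type: NodeInternalIP, Address: "10.0.0.1"},
	}
	ip, ok := preferredNodeIP(addrs)
	fmt.Println(ip, ok)
}
```

If a provider ever returns more than one internal address, this still takes the first internal one it sees, so it would need to be combined with the "keep the recorded IP while it is still reported" check to stay stable across reordered status reports.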

danwinship · 2016-12-05T15:02:48Z

> So should we use the type 'internal' when available? Probably. This PR, however, is independent of what is chosen the first time around.
> I can include the change to getNodeIP in another pull request.

Yes we definitely want this patch, but I don't think it really fixes the customer's bug without the other patch too.

(Also, you need to run gofmt)

knobunc · 2016-12-12T18:19:01Z

[test]

knobunc

LGTM

knobunc · 2016-12-13T14:51:06Z

[test] everything flaked yesterday... retry

openshift-bot · 2016-12-13T14:53:25Z

Evaluated for origin test up to a5e26ff

openshift-bot · 2016-12-13T16:06:18Z

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12330/) (Base Commit: 34b4f58)

knobunc · 2016-12-14T14:02:12Z

[merge]

openshift-bot · 2016-12-14T14:05:20Z

Evaluated for origin merge up to a5e26ff

openshift-bot · 2016-12-14T14:45:38Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/12377/) (Base Commit: 4c96ea4) (Image: devenv-rhel7_5545)

fix for bz1400609; if the node status flips on the order of ip addresses (when there are multiple NICs to report), do not let the SDN chase it

a5e26ff

rajatchopra force-pushed the multi_ip branch from 3b5c1f4 to a5e26ff on December 12, 2016 18:13

knobunc approved these changes Dec 12, 2016

openshift-bot merged commit 993395c into openshift:master Dec 14, 2016

rajatchopra deleted the multi_ip branch December 16, 2016 21:51
