Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for carrier before announcing IPs via GARP/NA #301

Merged

Conversation

thom311
Copy link
Contributor

@thom311 thom311 commented May 30, 2024

This is a follow up to c241dcb.

what happens is that after sriov-cni sets up the interface, initially the interface does not have carrier. Sent GARP/IPv6 NA at that point are lost.

Instead, wait a bit until the interface has carrier.

See-also: https://issues.redhat.com/browse/OCPBUGS-30549

@wizhaoredhat
Copy link
Contributor

LGTM
@SchSeba @zeeke PTAL

@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from 37ac7e3 to d4e80b6 Compare May 31, 2024 06:58
@thom311
Copy link
Contributor Author

thom311 commented May 31, 2024

@wizhaoredhat previous version had a bug. Should be fixed now:

diff --git a/pkg/utils/packet.go b/pkg/utils/packet.go
index 809b4ded399f..88a52bc78cc1 100644
--- a/pkg/utils/packet.go
+++ b/pkg/utils/packet.go
@@ -221,7 +221,12 @@ func WaitForCarrier(ifName string, waitTime time.Duration) bool {
               return false
          }
 
-         if linkObj.Attrs().RawFlags&unix.IFF_NO_CARRIER != 0 {
+         /* Wait for carrier, i.e. IFF_UP|IFF_RUNNING. Note that there is also
+          * IFF_LOWER_UP, but we follow iproute2 ([1]).
+          *
+          * [1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/ipaddress.c?id=f9601b10c21145f76c3d46c163bac39515ed2061#n86
+          */
+         if linkObj.Attrs().RawFlags&(unix.IFF_UP|unix.IFF_RUNNING) == (unix.IFF_UP | unix.IFF_RUNNING) {
               return true
          }
     }

@coveralls
Copy link

coveralls commented May 31, 2024

Pull Request Test Coverage Report for Build 9346881405

Details

  • 18 of 22 (81.82%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.6%) to 51.554%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/packet.go 18 22 81.82%
Totals Coverage Status
Change from base Build 9250302256: 0.6%
Covered Lines: 680
Relevant Lines: 1319

💛 - Coveralls

Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, consider adding a unit test for WaitForCarrier (see f168bc2)

pkg/utils/packet.go Show resolved Hide resolved
@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from d4e80b6 to 966a219 Compare May 31, 2024 09:09
@thom311
Copy link
Contributor Author

thom311 commented May 31, 2024

Please, consider adding a unit test for WaitForCarrier (see f168bc2)

I took your patch and integrated it in the branch. How about this?

(I don't think the test is yet correct. WIP).

@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from 966a219 to bd25374 Compare May 31, 2024 09:21
@thom311
Copy link
Contributor Author

thom311 commented May 31, 2024

(I don't think the test is yet correct. WIP).

Unit test should work now... IMO it's ready.

@SchSeba
Copy link
Collaborator

SchSeba commented May 31, 2024

Hi @thom311 the CI is still not happy can you please take a look?

@thom311
Copy link
Contributor Author

thom311 commented May 31, 2024

Hi @thom311 the CI is still not happy can you please take a look?

It says "race detected during execution of test".

Is this due to the assignment of fakeLink.RawFlags while another go routine reads it?

@wizhaoredhat
Copy link
Contributor

@adrianchiris Could you also help PTAL?

@SchSeba
Copy link
Collaborator

SchSeba commented Jun 2, 2024

Hi @thom311 yes that is the reason please take a look

WARNING: DATA RACE
Write at 0x00c000135bc8 by goroutine 49:
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.glob..func1.1.1()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet_test.go:37 +0x4e4
  github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3()
      /root/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.2/internal/node.go:463 +0x2e
  github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
      /root/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.2/internal/suite.go:863 +0x14b

Previous read at 0x00c000135bc8 by goroutine 50:
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.WaitForCarrier()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet.go:225 +0xf0
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.glob..func1.1.1.2()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet_test.go:32 +0x[44](https://github.com/k8snetworkplumbingwg/sriov-cni/actions/runs/9315930803/job/25643595721?pr=301#step:5:45)

@wizhaoredhat please first fix the unit-test so we can continue with the reviews on the PR.

@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from bd25374 to 4779c1b Compare June 3, 2024 06:46
@thom311
Copy link
Contributor Author

thom311 commented Jun 3, 2024

Hi @thom311 yes that is the reason please take a look

Race in unit test should be fixed now. How is that.

@adrianchiris
Copy link
Contributor

adrianchiris commented Jun 3, 2024

@thom311 could you sort the PR commits? i see it now got 167 commits (openshift fork?)

zeeke and others added 3 commits June 3, 2024 09:36
Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Thomas Haller <thaller@redhat.com>
Previously, we called AnnounceIPs() right after configuring the
interface. At that point, the interface was just recently set IFF_UP,
and it might not yet have carrier. In that case, the messages will be
lost. We will need to wait a bit for carrier.

Since AnnounceIPs() is an optional performance improvement, let's first
do all the important things. Move the non-critical call to the end.

This will be interesting next, when we will do some additional waiting
for the device to have carrier. Let's not do the waiting in the middle
of the critical operations, but only at the end.

Signed-off-by: Thomas Haller <thaller@redhat.com>
@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from 4779c1b to e9611a2 Compare June 3, 2024 07:39
@thom311
Copy link
Contributor Author

thom311 commented Jun 3, 2024

@thom311 could you sort the PR commits? i see it now got 167 commits (openshift fork?)

sorry, that was a wrong rebase.

Fixed, and addressed other comments too.

thom311 and others added 2 commits June 3, 2024 10:29
After setting up the interface, it might take a bit for kernel to detect
carrier. If we then already send the GARP/NA packets, they are lost.

Instead, wait for up to 200 msec for the interface to get carrier. This
time is chosen somewhat arbitrarily. We don't want to block the process
too long, but we also need to wait long enough, that kernel and driver
has time to detect carrier. Also, while busy waiting, sleep with an
exponential back-off time (growth factor 1.5).

Fixes: c241dcb ('Send IPv4 GARP and IPv6 Unsolicited NA in "cmdAdd"')

See-also: https://issues.redhat.com/browse/OCPBUGS-30549

Signed-off-by: Thomas Haller <thaller@redhat.com>
Co-authored-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Thomas Haller <thaller@redhat.com>
@thom311 thom311 force-pushed the th/announce-ips-wait-carrier branch from e9611a2 to fc25698 Compare June 3, 2024 08:31
Copy link
Member

@zeeke zeeke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeeke zeeke merged commit ab67e42 into k8snetworkplumbingwg:master Jun 5, 2024
10 of 11 checks passed
@thom311 thom311 deleted the th/announce-ips-wait-carrier branch June 5, 2024 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants