Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a rebind to default driver as a w/a #233

Merged
merged 2 commits into from
Feb 10, 2022

Conversation

SchSeba
Copy link
Collaborator

@SchSeba SchSeba commented Jan 20, 2022

This commit add a w/a to an issue observed on intel nics where not all the vfs are created.

Test:

cat /tmp/2.sh
set -ex

while :
do
	echo 5 > /sys/bus/pci/devices/0000\:86\:00.0/sriov_numvfs
	sleep 4
	VFS=`ip l 2>/dev/null | grep ens7f0v | wc -l`
	if [[ $VFS != 5 ]]; then
		echo "bug!"
		sleep INF
	fi
	echo 0 > /sys/bus/pci/devices/0000\:86\:00.0/sriov_numvfs
	sleep 2
done

/tmp/2.sh
++ :
++ echo 5
++ sleep 4
+++ ip l
+++ grep ens7f0v
+++ wc -l
++ VFS=5
++ [[ 5 != 5 ]]
++ echo 0
++ sleep 2
++ :
++ echo 5
++ sleep 4
+++ ip l
+++ grep ens7f0v
+++ wc -l
++ VFS=5
++ [[ 5 != 5 ]]
++ echo 0
++ sleep 2
++ :
++ echo 5
++ sleep 4
+++ ip l
+++ grep ens7f0v
+++ wc -l
++ VFS=4
++ [[ 4 != 5 ]]
++ echo 'bug!'
bug!
++ sleep INF

d8:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
d8:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
d8:02.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
d8:02.1 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
d8:02.2 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
d8:02.3 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
d8:02.4 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
[root@cnfdt14 core]# lspci -v -mm -nn -k -s d8:00.0
Slot:	d8:00.0
Class:	Ethernet controller [0200]
Vendor:	Intel Corporation [8086]
Device:	Ethernet Controller XXV710 for 25GbE SFP28 [158b]
SVendor:	Intel Corporation [8086]
SDevice:	Ethernet 25G 2P XXV710 Adapter [0009]
Rev:	02
Driver:	i40e
Module:	i40e
NUMANode:	1

ls -la /sys/bus/pci/devices/0000\:3b\:02.1/net/
ls: cannot access '/sys/bus/pci/devices/0000:3b:02.1/net/': No such file or directory

echo "0000:3b:02.1" > /sys/bus/pci/drivers/iavf/unbind
echo "0000:3b:02.1" > /sys/bus/pci/drivers/iavf/bind

[  336.586302] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[  336.592352] pci 0000:3b:02.1: enabling Extended Tags
[  336.597653] pci 0000:3b:02.1: Adding to iommu group 156
[  336.622048] iavf 0000:3b:02.1: enabling device (0000 -> 0002)
[  336.716570] iavf 0000:3b:02.1: Device is still in reset (-16), retrying
[  337.839372] iavf 0000:3b:02.1: Invalid MAC address 00:00:00:00:00:00, using random
[  337.848651] iavf 0000:3b:02.1: Multiqueue Enabled: Queue pair count = 4
[  337.965036] iavf 0000:3b:02.1: MAC address: 92:67:37:fd:42:25
[  337.972234] iavf 0000:3b:02.1: GRO is enabled
[  338.004527] iavf 0000:3b:02.1 ens1f0v1: renamed from eth0
[  338.195468] iavf 0000:3b:02.1: Reset warning received from the PF
[  338.211038] iavf 0000:3b:02.1: Scheduling reset task
[  353.590034] pci 0000:3b:02.1: Removing from iommu group 156
[  366.167547] pci 0000:3b:02.1: [8086:154c] type 00 class 0x020000
[  366.174623] pci 0000:3b:02.1: enabling Extended Tags
[  366.180037] pci 0000:3b:02.1: Adding to iommu group 156
[  366.185485] iavf 0000:3b:02.1: enabling device (0000 -> 0002)
[  366.265071] iavf 0000:3b:02.1: Device is still in reset (-16), retrying
[ 1255.037432] iavf 0000:3b:02.1: enabling device (0000 -> 0002)
[ 1255.110439] iavf 0000:3b:02.1: Invalid MAC address 00:00:00:00:00:00, using random
[ 1255.118153] iavf 0000:3b:02.1: Multiqueue Enabled: Queue pair count = 4
[ 1255.125217] iavf 0000:3b:02.1: MAC address: d2:73:2c:3c:ed:f8
[ 1255.130984] iavf 0000:3b:02.1: GRO is enabled
[ 1255.135615] iavf 0000:3b:02.1 ens1f0v1: renamed from eth0
[ 1255.792238] iavf 0000:3b:02.1: Reset warning received from the PF
[ 1255.798351] iavf 0000:3b:02.1: Scheduling reset task
[ 1268.465116] iavf 0000:3b:02.1: Reset warning received from the PF
[ 1268.479921] iavf 0000:3b:02.1: Scheduling reset task

We are working with kernel developers to find and fix the issue.

This w/a will give us more time without impacting the sriov operator

Signed-off-by: Sebastian Sch sebassch@gmail.com

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Jan 20, 2022

/hold

Running some validations

@github-actions github-actions bot added the hold label Jan 20, 2022
return err
}

glog.Errorf("setVfAdminMac(): workaround implement for VF %s", vfAddr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a glog.Info ?
What about moving this log to the beginning of the Unbind call?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will like to have it at least as a warning so it will popup in the logs

glog.Errorf("setVfAdminMac(): VF link is not ready for device %+v %q", vfAddr, err)
return err
// TODO: remove workaround after intel fix driver issue
if err := Unbind(vfAddr); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the VF doesn't exist, can we still use its pciAddr to unbind and rebind the driver? I think we did that before, just want to confirm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right :)

@adrianchiris
Copy link
Collaborator

adrianchiris commented Jan 25, 2022

is this related to : https://bugzilla.redhat.com/show_bug.cgi?id=1875338 ?

i see there was some WA which was introduced and reverted around rebinding VF driver

#204

also commit message should explain the issue but avoid IMO console bash output

@@ -565,8 +565,23 @@ func setVfAdminMac(vfAddr string, pfLink netlink.Link) error {
}
vfLink, err := vfIsReady(vfAddr)
if err != nil {
glog.Errorf("setVfAdminMac(): VF link is not ready for device %+v %q", vfAddr, err)
return err
// TODO: remove workaround after intel fix driver issue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this bit be moved to a separate function ?
with comment explaining the issue and link to bz maybe ?

"intel fix driver issue" is a bit generic to me :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done I also update the commit message and remove all the console outputs

@SchSeba SchSeba force-pushed the add_intel_driver_wa branch from ec2cb3c to ded2720 Compare January 25, 2022 15:43
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Jan 26, 2022

/hold cancel

@github-actions github-actions bot removed the hold label Jan 26, 2022
pkg/utils/utils.go Outdated Show resolved Hide resolved
@SchSeba SchSeba force-pushed the add_intel_driver_wa branch from ded2720 to e2d31ad Compare January 27, 2022 13:56
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Jan 27, 2022

@adrianchiris @zshi-redhat can you please give it another look I test it locally and it pass

@@ -291,12 +291,16 @@ func configSriovDevice(iface *sriovnetworkv1.Interface, ifaceStatus *sriovnetwor
if err = setVfGuid(addr, pfLink); err != nil {
return err
}
} else if driver != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name choice for this is bad IMO, wish it was : dpdkDriver but not related to this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed ? to avoid triggering the WA ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use this instead.

#245

This is needed if there was a problem with one of the vfs and some of them are in vfio we will try to see check if VfIsReady and that will always fail because the VF is already attached to a user-space driver

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the name is a small change :)

@@ -565,8 +569,19 @@ func setVfAdminMac(vfAddr string, pfLink netlink.Link) error {
}
vfLink, err := vfIsReady(vfAddr)
if err != nil {
glog.Errorf("setVfAdminMac(): VF link is not ready for device %+v %q", vfAddr, err)
return err
err = RebindVfToDefaultDriver(vfAddr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking on this more, this is a surprising place to find a WA.

here we basically just do ip link set <pf> vf <idx> mac <macaddr>

the reason i assume its here is because vfIsReady was put here in commit : eb47703

I wonder if a better approach is to roll the WA logic into VfIsReady and call it from configSriovDevices

essentially after we create the VFs, we wait for them to be ready then move on with our business.

as the issue only happens on Intel, you dont need to worry about a rebind later on that might hit the issue.
which happens only with rdma.(also as the WA is applied today it will not solve it)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@github-actions
Copy link

github-actions bot commented Feb 7, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba SchSeba force-pushed the add_intel_driver_wa branch from 2d55873 to 1c8c9c6 Compare February 9, 2022 15:18
@github-actions
Copy link

github-actions bot commented Feb 9, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

// only set GUID and MAC for VF with default driver
// for userspace drivers like vfio we configure the vf mac using the kernel nic mac address
// before we switch to the userspace driver
if yes, d := hasDriver(addr); yes && !sriovnetworkv1.StringInArray(d, DpdkDrivers) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @adrianchiris @mskrocki @pliurh @zshi-redhat

I add this validation instead of if iface.NumVfs != ifaceStatus.NumVfs that was proposed under #245.

I think is a better solution because we are inside a for loop going over all the vfs. if there is an issue with one of the vfs(for example the intel driver get stuck) we exist the configSriovDevice function with an error. Then in the next reconcile the iface.NumVfs will be equal to the ifaceStatus.NumVfs because the vfs got already created here (https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/233/files#diff-81ddbadfb415ccbb9c7af84f11668d1aa5e53c34025bf86d4702f16b4e42f045R246) but we didn't really finish allocating the GUID or the mac address to all the vfs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.

@mskrocki
Copy link
Contributor

mskrocki commented Feb 9, 2022

fix #244

@pliurh
Copy link
Collaborator

pliurh commented Feb 10, 2022

/lgtm

@github-actions github-actions bot added the lgtm label Feb 10, 2022
@bn222
Copy link
Collaborator

bn222 commented Feb 10, 2022

Since this is a workaround, I propose that we remove it when all related BZs have been fixed. With that in mind, the sole commit in this PR does more than just introduce RebindVfToDefaultDriver. I suggest to split the commit into at least two commits where one will just introduce the workaround and the other applies the changes we want to keep even after the issues have been resolved in the kernel.

…ver only

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba SchSeba force-pushed the add_intel_driver_wa branch from 1c8c9c6 to ce121d6 Compare February 10, 2022 12:25
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@SchSeba
Copy link
Collaborator Author

SchSeba commented Feb 10, 2022

Since this is a workaround, I propose that we remove it when all related BZs have been fixed. With that in mind, the sole commit in this PR does more than just introduce RebindVfToDefaultDriver. I suggest to split the commit into at least two commits where one will just introduce the workaround and the other applies the changes we want to keep even after the issues have been resolved in the kernel.

sounds good can you take another look?

@bn222
Copy link
Collaborator

bn222 commented Feb 10, 2022

/lgtm

@pliurh pliurh merged commit 7102b79 into k8snetworkplumbingwg:master Feb 10, 2022
@@ -541,7 +569,7 @@ func vfIsReady(pciAddr string) (netlink.Link, error) {
glog.Infof("vfIsReady(): VF device %s", pciAddr)
var err error
var vfLink netlink.Link
err = wait.PollImmediate(time.Second, 5*time.Second, func() (bool, error) {
err = wait.PollImmediate(time.Second, 10*time.Second, func() (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to increase the timeout?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the timeout was too short and was generating spurious errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants