Netlink invalid argument error on VF VLAN configuration #303
Comments
Hi @jqueuniet, this sounds like an issue with the driver. Can you please try to just run
If that fails, check dmesg for any logs from the kernel.
Hey, thanks for your answer. I already tried that, as I found this kind of feedback in similar issues like #285, and mentioned it in the initial report. Here is the CLI output:
# ip link show enp129s0f0np0
4: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 94:6d:ae:8c:87:80 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 06:24:7b:de:48:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 3a:4e:66:07:97:fe brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether ae:51:fe:0d:d7:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 32:61:97:93:b3:92 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 4 link/ether fa:33:12:40:60:c6 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 5 link/ether fe:7d:74:65:93:a6 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 6 link/ether a6:12:6d:9d:bd:0a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 7 link/ether 7e:14:18:08:b0:45 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
# ip link set enp129s0f0np0 vf 4 vlan 100 qos 0 proto 802.1q
# ip link show enp129s0f0np0
4: enp129s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 94:6d:ae:8c:87:80 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 06:24:7b:de:48:61 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 3a:4e:66:07:97:fe brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether ae:51:fe:0d:d7:d2 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 32:61:97:93:b3:92 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 4 link/ether fa:33:12:40:60:c6 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking off, link-state auto, trust off, query_rss off
vf 5 link/ether fe:7d:74:65:93:a6 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 6 link/ether a6:12:6d:9d:bd:0a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 7 link/ether 7e:14:18:08:b0:45 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
So, the VLAN is successfully set using the CLI, and no error is returned.
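The before/after comparison above was done by eye. As a rough sketch of how the same check could be scripted, the Python below parses `ip link show` output and reports which VFs carry a VLAN. The helper name `vf_vlans` and the regular expressions are illustrative assumptions based only on the output format shown in this thread, not part of sriov-cni or iproute2.

```python
import re
from typing import Dict, Optional

# Patterns are assumptions derived from the output pasted in this thread.
VF_LINE = re.compile(r"^\s*vf (\d+)\s")
VLAN_FIELD = re.compile(r"\bvlan (\d+)\b")


def vf_vlans(ip_link_output: str) -> Dict[int, Optional[int]]:
    """Map each VF index to its VLAN ID, or None when no VLAN is shown."""
    result: Dict[int, Optional[int]] = {}
    for line in ip_link_output.splitlines():
        vf = VF_LINE.match(line)
        if not vf:
            continue  # skip the PF summary and link/ether lines
        vlan = VLAN_FIELD.search(line)
        result[int(vf.group(1))] = int(vlan.group(1)) if vlan else None
    return result


# Two VF lines copied from the output above.
SAMPLE = """\
vf 3 link/ether 32:61:97:93:b3:92 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 4 link/ether fa:33:12:40:60:c6 brd ff:ff:ff:ff:ff:ff, vlan 100, spoof checking off, link-state auto, trust off, query_rss off
"""

if __name__ == "__main__":
    print(vf_vlans(SAMPLE))
```

On a host, the input would come from something like `subprocess.run(["ip", "link", "show", "enp129s0f0np0"], capture_output=True, text=True).stdout`.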
Just in case this is useful, here is the iproute package version:
# ip -V
ip utility, iproute2-6.7.0, libbpf 1.2.0
# rpm -qv iproute
iproute-6.7.0-2.fc40.x86_64
Tried with a less bleeding-edge distribution, Flatcar stable with a 6.1 kernel and iproute2 6.5.0. Still getting the same error and the same symptoms: sriov-cni can't set the VLAN, but I can using iproute2. Nothing seems out of place in dmesg either.
I have two worker nodes with the same NIC hardware (Intel XL710), and the error occurs on one but not the other!? The main difference I can see is that one is running Debian 12 (kernel
Other components (same for both nodes):
EDIT: I can also set VF config manually from the Debian host using
We've been able to replicate the error consistently with the
Hi there, we did some more digging and it appears that the issue is related to extra validation added to address this CVE. Adjusting the size of the attribute to a multiple of 4 bytes seems to correct the issue. Doing more testing with the patched version today to see if the problem is fully addressed.
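To illustrate the kind of adjustment described above, here is a minimal Python sketch of rounding a netlink attribute payload up to a 4-byte boundary. It is not the actual patch to the Go netlink library. The 4-byte alignment (`NLA_ALIGNTO`) and the 4-byte attribute header are standard netlink conventions, while the helper names, the attribute type value, and the 14-byte example payload are assumptions chosen for illustration.

```python
import struct

NLA_ALIGNTO = 4  # netlink attributes are aligned to 4 bytes
NLA_HDRLEN = 4   # header: 16-bit length followed by 16-bit type


def nla_align(length: int) -> int:
    """Round length up to the next multiple of NLA_ALIGNTO."""
    return (length + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def pad_payload(payload: bytes) -> bytes:
    """Zero-pad a payload so its size is a multiple of 4 bytes."""
    return payload + b"\x00" * (nla_align(len(payload)) - len(payload))


def build_attr(attr_type: int, payload: bytes) -> bytes:
    """Build one attribute whose declared length covers the padded payload.

    attr_type is a placeholder value here, not a real IFLA_* constant.
    """
    padded = pad_payload(payload)
    header = struct.pack("=HH", NLA_HDRLEN + len(padded), attr_type)
    return header + padded


if __name__ == "__main__":
    raw = b"\x00" * 14            # hypothetical payload, not a multiple of 4
    attr = build_attr(1, raw)
    print(len(raw), "->", len(attr) - NLA_HDRLEN, "payload bytes")
```

The point of the sketch is only that a payload size that is not a multiple of 4 gets padded before the attribute is sent, so that a kernel with stricter length validation would accept it.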
Hi @mega-alex, great work on the debugging!
Hi @mega-alex, thanks for taking care of the netlink side.
I can confirm the fix has solved my issue 👍
@SchSeba can you please tag a new release to make this fix available to upstream projects, e.g. sriov-network-operator?
@adrianchiris can I ask for your help creating a new release for this and the sriov operator? :)
Please release at least a v2.8.1 with this fix ASAP. Anything in production cannot receive kernel updates otherwise. Thank you!
After the kernel was upgraded to 6.6.0, the sriov-cni plugin started failing when creating VLANs over VF interfaces. This is described in: k8snetworkplumbingwg/sriov-cni#303
The fix was released in v2.8.1, pull request #309: https://github.com/k8snetworkplumbingwg/sriov-cni/releases/tag/v2.8.1
Test Plan: PASS: start the sriov pod with VLAN configured
Story: 2011124
Task: 50894
Change-Id: I09191a71574cf4f0073c1a40226d5cd679d3e857
Signed-off-by: Caio Bruchert <caio.bruchert@windriver.com>
We need to update SR-IOV CNI to have it working with the latest Linux kernels and have [1] fix included. [1] k8snetworkplumbingwg/sriov-cni#303 Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
We need to update SR-IOV CNI to have it working with the latest Linux kernels and have [1] fix included. [1] k8snetworkplumbingwg/sriov-cni#303 Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com> (cherry picked from commit 8a5f320)
We need to update SR-IOV CNI to have it working with the latest Linux kernels and have [1] fix included. [1] k8snetworkplumbingwg/sriov-cni#303
What happened?
Pod fails to start, gives an error related to VF VLAN configuration
What did you expect to happen?
Pod starts with VF attached
What are the minimal steps needed to reproduce the bug?
Anything else we need to know?
Setup done using the SR-IOV operator
Reading the code for the netlink Go library, I gathered the failed command was equivalent to an ip link CLI call and tried to reproduce with it, but it worked and the VLAN was properly set afterward.
Component Versions
Please fill in the below table with the version numbers of applicable components used.
Hardware
Only enp129s0f0np0/81:00.0 is currently configured for VF, to facilitate debugging.
Config Files
Config file locations may be config dependent.
Pod manifest
CNI config (Try '/etc/cni/net.d/')
Device pool config file location (Try '/etc/pcidp/config.json')
Multus config (Try '/etc/cni/multus/net.d')
Kubernetes deployment type ( Bare Metal, Kubeadm etc.)
bare-metal custom deployment with v1.29.2 rpm kubelet
SR-IOV Network Custom Resource Definition
Logs
SR-IOV Network Device Plugin Logs (use kubectl logs $PODNAME)
None, pod does not start
Multus logs (If enabled. Try '/var/log/multus.log' )
Kubelet logs (journalctl -u kubelet)