
Revert "Add support for allmulticast flag" #272

Merged
merged 2 commits into from
Jul 3, 2023

Conversation

@adrianchiris (Contributor) commented Jun 21, 2023

Reverts #268

The tuning CNI already provides a way to set/restore allmulticast for a netdev. This configuration is general to netdevs, not specific to VFs:

https://github.com/containernetworking/cni.dev/blob/main/content/plugins/current/meta/tuning.md#interface-attribute-configuration-reference

My preference is not to bloat the sriov-cni config when an alternative exists.
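For reference, a minimal sketch of what the tuning-CNI alternative could look like as a conflist, using the `allmulti` attribute from the tuning doc linked above (the network name and version here are illustrative, not from this thread):

```json
{
  "cniVersion": "0.4.0",
  "name": "sriov-allmulti-net",
  "plugins": [
    {
      "type": "sriov",
      "trust": "on"
    },
    {
      "type": "tuning",
      "allmulti": true
    }
  ]
}
```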

@SchSeba @Eoghan1232 @mlguerrero12

@github-actions

Pull Request Test Coverage Report for Build 5330893244

- 2 of 2 (100.0%) changed or added relevant lines in 2 files are covered.
- 1 unchanged line in 1 file lost coverage.
- Overall coverage decreased (-1.4%) to 38.475%.

| File with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/utils/mocks/netlink_manager_mock.go | 1 | 79.72% |

Totals: change from base Build 5326045906: -1.4%. Covered lines: 434. Relevant lines: 1128.

💛 - Coveralls

@mlguerrero12 (Contributor) commented Jun 21, 2023

That's correct, @adrianchiris. Thanks for pointing it out.

The reason to have allmulticast here (besides facilitating configuration) was to add checks related to SR-IOV/DPDK, mainly around the trust parameter. It is possible to enable allmulticast on a VF without trust being on: no error is returned, but the device simply won't enter allmulti RX mode. We wanted to restrict this and avoid users coming to us asking why allmulti doesn't work. Also, for non-netdev VFs, we wanted to return a specific error saying that it is not possible to set it for DPDK drivers (although my initial idea was to ignore it) instead of having the tuning plugin return "failed to get ifname".

I understand your point. It's valid. We should have discussed this initially.
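As a rough illustration of the kind of trust-gated check being described, here is a minimal sketch assuming the vishvananda/netlink library; the helper name and signature are hypothetical, not the actual sriov-cni code:

```go
package sriovcni

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// setVFAllmulti sketches a trust-gated allmulti enable: refuse to turn on
// allmulticast unless trust is already enabled on the VF, since otherwise
// the flag may be set without the VF actually entering allmulti RX mode.
func setVFAllmulti(pfName, vfNetdev string, vfID int) error {
	pf, err := netlink.LinkByName(pfName)
	if err != nil {
		return fmt.Errorf("failed to look up PF %s: %v", pfName, err)
	}
	// The PF's link attributes carry per-VF info, including the trust flag.
	for _, vf := range pf.Attrs().Vfs {
		if vf.ID == vfID && vf.Trust == 0 {
			return fmt.Errorf("allmulti on VF %d requires trust to be on", vfID)
		}
	}
	vfLink, err := netlink.LinkByName(vfNetdev)
	if err != nil {
		return fmt.Errorf("failed to look up VF netdev %s: %v", vfNetdev, err)
	}
	return netlink.LinkSetAllmulticastOn(vfLink)
}
```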

@adrianchiris (Contributor, Author) commented Jun 21, 2023

> Mainly, the setting of the trust parameter. It is possible to enable allmulticast on a VF without trust being on: no error is returned, but the device simply won't enter allmulti RX mode.

This is not the case for Mellanox NICs:

PF:

```
# ip l show enp3s0f0np0
4: enp3s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether b8:ce:f6:09:fa:38 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether 00:aa:bb:cc:dd:ee brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 2     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 3     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
```

VF 0:

```
# ip l show enp3s0f0v0
37: enp3s0f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:26:8f:3d:79:82 brd ff:ff:ff:ff:ff:ff permaddr f2:55:9a:20:11:88
```

Set allmulticast:

```
# ip l set enp3s0f0v0 allmulticast on
# ip l show enp3s0f0v0
37: enp3s0f0v0: <BROADCAST,MULTICAST,ALLMULTI> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:26:8f:3d:79:82 brd ff:ff:ff:ff:ff:ff permaddr f2:55:9a:20:11:88
```

So it seems to be driver-implementation specific and should not be enforced in sriov-cni.

> For non-netdev VFs, we wanted to return a specific error saying that it is not possible to set it for DPDK drivers (although my initial idea was to ignore it) instead of having the tuning plugin return "failed to get ifname".

I don't think this one is critical enough to mandate the logic living in sriov-cni. Users who run DPDK applications are expected to know their desired CNI configuration, and there are other CNI parameters that are not applicable to DPDK either.

> I understand your point. It's valid. We should have discussed this initially.

Indeed, please do so next time.

@cgoncalves (Contributor)

Both sides make good points. I am personally more inclined to have the option in the SR-IOV CNI (and SR-IOV Network Operator) for the same reasons @mlguerrero12 mentioned (better validation and user experience). The mtu and mac parameters exist in both the SR-IOV and Tuning CNIs. Requiring users to now also use the Tuning CNI for SR-IOV VFs, with all the disadvantages Marcelo highlighted, would worsen the user experience.

Also, requiring cluster administrators to support and allow the Tuning CNI can pose a security risk, as it allows any sysctl to be changed (the newly introduced sysctl allow-list feature could reduce the attack surface, but still).

@adrianchiris (Contributor, Author)

@cgoncalves, mac and mtu are related to SR-IOV because there are specific parameters on the PF that are applied on behalf of the VF; these are then applied on the VF netdev to keep it in sync.

allmulticast, apparently, is NOT related to SR-IOV; it is general to netdevs. Will we now add every netdev-configurable parameter to sriov-cni?

> Also, requiring cluster administrators to support and allow the Tuning CNI can pose a security risk, as it allows any sysctl to be changed (the newly introduced sysctl allow-list feature could reduce the attack surface, but still).

Administrators should define the networks and not let users define them :)

@mlguerrero12 (Contributor)

You can configure it, but does it work? This is from the NVIDIA docs:

> "VF All-Multi Mode
> VFs can enter an all-multi mode that enables receiving all the multicast traffic sent from/to the other functions on the same physical port in addition to the traffic originally targeted to the VF.
> Note: Only privileged/trusted VFs can enter the all-multi RX mode."

https://docs.nvidia.com/networking/pages/viewpage.action?pageId=61869558

@mlguerrero12 (Contributor)

It didn't work for me, but if it works for you, then yes, let's revert it. That was my main motivation.

@SchSeba (Collaborator) commented Jun 21, 2023

From my side, again: if it has a correlation with something that is part of sriov-cni, then it must be in sriov-cni. BUT if it's possible to configure it on any netdevice, a VF or a macvlan for example, then we should use the implementation from the tuning-cni.

@mlguerrero12 (Contributor)

Just to be clear, I'm claiming that it is possible to configure it (the allmulti flag is set), but it doesn't really work.

@SchSeba (Collaborator) commented Jun 21, 2023

> Just to be clear, I'm claiming that it is possible to configure it (the allmulti flag is set), but it doesn't really work.

Then it's even worse than getting an error, if you ask me :P

@adrianchiris (Contributor, Author) commented Jun 21, 2023

> You can configure it, but does it work?

How do I test this one? Do you have some instructions?

If it doesn't work and silently fails, I'd still expect the driver to fix it (and actually fail when turning on allmulti). I'll create an internal ticket for them to address this.

@mlguerrero12 (Contributor)

Using a Mellanox NIC, I have:

Pod A:

```
28: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 22:f9:f4:98:bc:ce brd ff:ff:ff:ff:ff:ff
    vf 3 link/ether 22:f9:f4:98:bc:ce brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state enable, trust on, query_rss off
```

Pod B:

```
25: net1: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b2:8c:90:d2:ae:71 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether b2:8c:90:d2:ae:71 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state enable, trust off, query_rss off
```

Pod C:

```
26: net1: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ba:8c:f7:c3:1c:1b brd ff:ff:ff:ff:ff:ff
    vf 1 link/ether ba:8c:f7:c3:1c:1b brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state enable, trust on, query_rss off
```

Pod A pings a multicast address:

```
sh-4.4# ping 224.10.10.10 -I net1
ping: Warning: source address might be selected on device other than net1.
PING 224.10.10.10 (224.10.10.10) from 10.132.3.231 net1: 56(84) bytes of data.
```

Pod B (trust off) doesn't see this, nor any other multicast traffic:

```
sh-4.4# tcpdump -i net1
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
08:02:56.598926 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 08:c0:eb:96:32:2c (oui Unknown), length 276
08:02:56.711677 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:73:18 (oui Unknown), length 276
08:02:56.796698 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:71:a1 (oui Unknown), length 276
08:02:57.306474 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:71:a0 (oui Unknown), length 276
08:02:57.577047 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:78:a9 (oui Unknown), length 285
08:02:57.595346 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:73:19 (oui Unknown), length 276
08:02:57.720536 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:78:a8 (oui Unknown), length 285
08:02:58.262794 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:75:f9 (oui Unknown), length 285
```

Pod C (trust on) sees this, as well as IPv6 router solicitations, which is what our customer is asking for:

```
sh-4.4# tcpdump -i net1
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
08:02:55.875767 IP 10.132.3.231 > 224.10.10.10: ICMP echo request, id 6, seq 462, length 64
08:02:55.997663 IP6 fe80::b696:91ff:feb4:71a1 > ff02::2: ICMP6, router solicitation, length 8
08:02:56.034849 IP6 fe80::ac0:ebff:fe96:322c > ff02::2: ICMP6, router solicitation, length 8
08:02:56.328419 IP6 fe80::b696:91ff:feb4:71a0 > ff02::2: ICMP6, router solicitation, length 8
08:02:56.353606 IP6 fe80::8cf4:f52d:7477:b838 > ff02::2: ICMP6, router solicitation, length 8
08:02:56.598889 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 08:c0:eb:96:32:2c (oui Unknown), length 276
08:02:56.711682 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:73:18 (oui Unknown), length 276
08:02:56.796705 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from b4:96:91:b4:71:a1 (oui Unknown), length 276
08:02:56.880993 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8000.e0:f6:2d:60:0f:d0.820c, length 43
08:02:56.899712 IP 10.132.3.231 > 224.10.10.10: ICMP echo request, id 6, seq 463, length 64
08:02:57.012966 IP6 fe80::f446:e743:f09b:e8e6 > ff02::2: ICMP6, router solicitation, length 8
08:02:57.223575 IP6 fe80::79a5:a9dc:6516:618 > ff02::2: ICMP6, router solicitation, length 8
```

@adrianchiris (Contributor, Author)

I see, thanks for checking this. So it behaves the same for Mellanox NICs (it silently fails).

@mlguerrero12 (Contributor)

Does this mean that you are OK with this, or that you will push internally for a fix? By the way, I saw the same behavior with an Intel card.

@adrianchiris (Contributor, Author) commented Jun 26, 2023

  1. I still think we should not have this config as part of sriov-cni; I'll discuss it with the maintainers in tomorrow's meeting.
  2. Yes, I will open a bug internally and see what I get from the driver team.
  3. allmulti + VF trust can be checked in sriov-network-operator: the check would need to ensure VF trust is enabled if a tuning CNI config is provided as a meta plugin with allmulti enabled (a rough sketch follows below).
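A minimal sketch of what such an operator-side check could look like, assuming a trimmed-down conflist model; the type and function names here are hypothetical, not actual sriov-network-operator code:

```go
package validation

import (
	"encoding/json"
	"fmt"
)

// conflist models only the fields this hypothetical check needs from a
// CNI network list: the sriov plugin's trust setting and the tuning
// plugin's allmulti flag.
type conflist struct {
	Plugins []struct {
		Type     string `json:"type"`
		Trust    string `json:"trust,omitempty"`    // sriov: "on" / "off"
		AllMulti bool   `json:"allmulti,omitempty"` // tuning meta plugin
	} `json:"plugins"`
}

// validateAllMulti rejects configs where tuning enables allmulti but the
// sriov plugin does not enable VF trust.
func validateAllMulti(raw []byte) error {
	var cl conflist
	if err := json.Unmarshal(raw, &cl); err != nil {
		return fmt.Errorf("failed to parse conflist: %v", err)
	}
	trusted, allMulti := false, false
	for _, p := range cl.Plugins {
		switch p.Type {
		case "sriov":
			trusted = p.Trust == "on"
		case "tuning":
			allMulti = allMulti || p.AllMulti
		}
	}
	if allMulti && !trusted {
		return fmt.Errorf("tuning enables allmulti but VF trust is not on")
	}
	return nil
}
```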

@Eoghan1232 (Collaborator) commented Jun 27, 2023

So I wanted to check this on my side and review it properly as well:

```
3: ens785f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether b4:96:91:b4:44:00 brd ff:ff:ff:ff:ff:ff
    vf 0     link/ether fe:fc:cf:f2:9d:93 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 1     link/ether 72:2f:d7:c4:e5:69 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 2     link/ether 6a:f3:a7:2d:77:44 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    vf 3     link/ether d2:bf:70:02:a5:85 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
    altname enp75s0f0
~# ip l show ens785f0v0
176: ens785f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether fe:fc:cf:f2:9d:93 brd ff:ff:ff:ff:ff:ff
    altname enp75s0f0v0
~# ip l set enp75s0f0v0 allmulticast off
~# ip l set enp75s0f0v0 allmulticast on
~# ip l show ens785f0v0
176: ens785f0v0: <BROADCAST,MULTICAST,ALLMULTI> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether fe:fc:cf:f2:9d:93 brd ff:ff:ff:ff:ff:ff
    altname enp75s0f0v0
~#
```

So I am able to set allmulticast on, even with trust off.

@mlguerrero12, you mentioned you saw a similar issue with Intel cards too, right?
If so, I would like to know the card/driver you were testing with; I want to log an internal ticket to address this. I'm assuming the E800 series with the iavf driver.
I'd also like to run the quick test of sending the multicast traffic.

As for adding the allmulticast option to sriov-cni: since it doesn't relate specifically to SR-IOV, only to netdevs in general, I am not sure it should be added here...

This sounds like a driver issue; the driver should report this failure instead of staying silent.

@mlguerrero12 (Contributor)

@Eoghan1232

```
17:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for SFP (rev 02)
driver: iavf
```

Indeed, it is possible to configure it, but the interface won't see the multicast traffic. Please test it from your side.

I totally agree. If this can be solved at the driver level, then we don't need this check. The main motivation of the PR was to address this issue.

@adrianchiris (Contributor, Author)

@SchSeba @zeeke @mlguerrero12

I've rebased the PR to resolve conflicts. Let me know if we can move forward with this revert.

@mlguerrero12 (Contributor)

LGTM

@adrianchiris (Contributor, Author)

@SchSeba, are you on board with reverting this?

adrianchiris merged commit c44726a into master on Jul 3, 2023.