Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix BlueField2 SR-IOV configuration #240

Merged
merged 1 commit into from
Feb 20, 2022

Conversation

e0ne
Copy link
Collaborator

@e0ne e0ne commented Feb 2, 2022

This patch fixes current behaviour for BF2 NICs to configure
SR-IOV VFs in vanilla Kubernetes cluster.

PR #201 broke BF2 NICs configration in Kubernetes. It assumes
that BF2 will be configured via systemd service in OpenShift
using MachineConfigPool object.

Signed-off-by: Ivan Kolodyazhny ikolodiazhny@nvidia.com

@e0ne e0ne marked this pull request as draft February 2, 2022 08:12
@github-actions
Copy link

github-actions bot commented Feb 2, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne force-pushed the bf2-configuration branch from 226ec73 to dd85271 Compare February 9, 2022 09:51
@e0ne e0ne marked this pull request as ready for review February 9, 2022 09:51
@github-actions
Copy link

github-actions bot commented Feb 9, 2022

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne marked this pull request as draft February 9, 2022 10:03
This patch fixes current behaviour for BF2 NICs to configure
SR-IOV VFs in vanilla Kubernetes cluster.

PR k8snetworkplumbingwg#201 broke BF2 NICs configration in Kubernetes. It assumes
that BF2 will be configured via systemd service in OpenShift
using MachineConfigPool object.

Signed-off-by: Ivan Kolodyazhny <ikolodiazhny@nvidia.com>
@e0ne e0ne force-pushed the bf2-configuration branch from dd85271 to be1f285 Compare February 10, 2022 09:01
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne marked this pull request as ready for review February 10, 2022 12:28
@e0ne
Copy link
Collaborator Author

e0ne commented Feb 14, 2022

@zshi-redhat @pliurh could you please review this PR?

@zshi-redhat
Copy link
Collaborator

@e0ne Is this change to enable BF in ConnectX mode?

@e0ne
Copy link
Collaborator Author

e0ne commented Feb 14, 2022

@e0ne Is this change to enable BF in ConnectX mode?

It fixes logic which worked before #201 was merged and works without any knowledge about the mode.

@adrianchiris
Copy link
Collaborator

adrianchiris commented Feb 15, 2022

@zshi-redhat for now we dont have a use-case to deploy BF2 via systemd service with vanilla k8s.
our use-case is to support DPU in connectX mode (which requires "full" legacy sriov configuration).

later on we will make sriov-network-operator DPU mode aware and support both use-cases.
(we may also need to support transitioning between the modes in this case).

once we move sriov configuration to systemd both flows (openshift and non-openshift) can be unified IMO. performing full sriov configuration via systemd.

@zshi-redhat
Copy link
Collaborator

@zshi-redhat for now we dont have a use-case to deploy BF2 via systemd service with vanilla k8s. our use-case is to support DPU in connectX mode (which requires "full" legacy sriov configuration).

So this patch is not supposed to enable switchdev mode with BF2 in connectX mode?

later on we will make sriov-network-operator DPU mode aware and support both use-cases. (we may also need to support transitioning between the modes in this case).

Ack+, both use-cases means supporting BF2 in different modes (connectX, embedded cpu)

once we move sriov configuration to systemd both flows (openshift and non-openshift) can be unified IMO. performing full sriov configuration via systemd.

Agree, I think k8s_plugin applies the systemd service to enable switchdev mode in vanilla k8s, I wonder if this approach will be applied to BF in connectX mode.

@e0ne
Copy link
Collaborator Author

e0ne commented Feb 16, 2022

@zshi-redhat with this PR BF2 could be configured:

  • by systemd service in switchdev mode both for vanilla k8s and OpenShift
  • by config daemon to configure VFs like we do for the rest of NICs in vanilla k8s only

DPU mode aware logic will be implemented in a follow-up PR

@adrianchiris
Copy link
Collaborator

adrianchiris commented Feb 16, 2022

So this patch is not supposed to enable switchdev mode with BF2 in connectX mode?

it is, as connectX, realized i wasnt clear on this :)

Ack+, both use-cases means supporting BF2 in different modes (connectX, embedded cpu)

yes

Agree, I think k8s_plugin applies the systemd service to enable switchdev mode in vanilla k8s, I wonder if this approach will be applied to BF in connectX mode.

yes it will. the goal is for it to take the same code-path as connectx in both legacy/switchdev modes in vanilla k8s

// Nvidia_mlx5_MT42822_BlueField-2_integrated_ConnectX-6_Dx
if ifStatus.Vendor == VendorMellanox && ifStatus.DeviceID == DeviceBF2 {
// Nvidia_mlx5_MT42822_BlueField-2_integrated_ConnectX-6_Dx in OpenShift
if ClusterType == ClusterTypeOpenshift && ifStatus.Vendor == VendorMellanox && ifStatus.DeviceID == DeviceBF2 {
Copy link
Collaborator

@zshi-redhat zshi-redhat Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd also like to support BF in connectx mode in openshift, this check makes the bf2 device in connectx mode to go through the path of systemd service configuration on openshift, which is not ideal since currently systemd service doesn't initialize the VF device as is done by config-daemon runtime configSriovDevice.

Shall we take this opptunitiy to implement the VF initialization in systemd service?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is proposed as a temporary solution to enable the use-case for vanilla k8s. @e0ne is working of VF initialization in systemd.

for openshift i suspect the need to support BF as connectx will come at a later stage so it gives us some time to move to systemd., Driver changes might be needed as well as a new mstflint version that needs to be pulled to RHEL repo (for when we need to query the DPU mode and change it).

will this be acceptable ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, the sysetmd can come later.

@adrianchiris
Copy link
Collaborator

/test-all

@adrianchiris adrianchiris merged commit 15db360 into k8snetworkplumbingwg:master Feb 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants