Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mellanox vendor plugin cannot register under certain conditions #573

Open
qiaoning opened this issue Jan 3, 2024 · 7 comments
Open

mellanox vendor plugin cannot register under certain conditions #573

qiaoning opened this issue Jan 3, 2024 · 7 comments

Comments

@qiaoning
Copy link

qiaoning commented Jan 3, 2024

https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/daemon/daemon.go#L541C2-L548

	// load plugins if it has not loaded
	if len(dn.enabledPlugins) == 0 {
		dn.enabledPlugins, err = enablePlugins(dn.platform, dn.useSystemdService, latestState, dn.hostManager, dn.storeManager)
		if err != nil {
			log.Log.Error(err, "nodeStateSyncHandler(): failed to enable vendor plugins")
			return err
		}
	}

if latestState.status is nil, only k8s_plugin and generic_plugin registed, vendor plugin(such as mellanox_plugin) will never be registered in this case, even if generation of SriovNetworkNodeState is changed, because only check dn.enabledPlugins's count, plugin name is NOT checked.

A solution should be like this code template:

	// load plugins if has not loaded
	if len(dn.enabledPlugins) == 0 {
		dn.enabledPlugins, err = enablePlugins(dn.platform, dn.useSystemdService, latestState, dn.hostManager, dn.storeManager)
		if err != nil {
			log.Log.Error(err, "nodeStateSyncHandler(): failed to enable vendor plugins")
			return err
		}
	} else {
		// load mellanox vendor plugin if has not loaded before
		if _, exist := dn.enabledPlugins[mellanox.PluginName]; !exist {
			vendorPlugins, err := registerVendorPlugins(latestState)
			if err != nil {
                                log.Log.Error(err, "nodeStateSyncHandler(): failed to enable mellanox vendor plugins error")
				return err
			}
			if _, exist := vendorPlugins[mellanox.PluginName]; !exist {
				glog.V(0).Info("nodeStateSyncHandler(): mellanox plugin NOT registered.")
			} else {
				dn.enabledPlugins[mellanox.PluginName] = vendorPlugins[mellanox.PluginName]
				glog.V(0).Info("nodeStateSyncHandler(): mellanox plugin registered.")
			}
		}
	}
@SchSeba
Copy link
Collaborator

SchSeba commented Feb 12, 2024

I think that is not the case because we always run the pollNic before we go here no?

@qiaoning
Copy link
Author

I think that is not the case because we always run the pollNic before we go here no?

Corrected: the code is from release 1.2.0: https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/v1.2.0/pkg/daemon/daemon.go#L426

It's triggered by SriovNetworkNodeState changed, and handled with nodeStateSyncHandler function.
And We actually encountered this situation in production environment.

@SchSeba
Copy link
Collaborator

SchSeba commented Feb 20, 2024

will you be able to use a new version of the operator? :)

@qiaoning
Copy link
Author

will you be able to use a new version of the operator? :)

https://github.com/k8snetworkplumbingwg/sriov-network-operator/releases

The latest release version is v1.2.0, so shall we use master code to deploy to production environment ?

@SchSeba
Copy link
Collaborator

SchSeba commented Mar 10, 2024

you are right we are going to do a new release of the operator soon sorry about that

@qiaoning13256
Copy link

We plan to use sriov-network-operator 1.3.0 release version in our production environment, and we'll continue watching this bug.
Thanks a lot.

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 27, 2024

Hi @qiaoning please check the latest release we did as we fix also a critical issue in the sriov-cni

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants