Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset the VF if failure occurs before the netconf is cached #222

Merged
merged 1 commit into from
Nov 22, 2022

Conversation

adilGhaffarDev
Copy link
Contributor

@adilGhaffarDev adilGhaffarDev commented Aug 30, 2022

This PR fixes following issue:
#212

@SchSeba
Copy link
Collaborator

SchSeba commented Aug 30, 2022

/cc @adrianchiris

@@ -94,6 +94,8 @@ func cmdAdd(args *skel.CmdArgs) error {
})
if err == nil {
_ = sm.ReleaseVF(netConf, args.IfName, args.ContainerID, netns)
// Reset the VF if failure occurs before the netconf is cached
_ = sm.ResetVFConfig(netConf)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the right place.

we need to run the reset not only for non dpdk mode.
I will like to see a defer function going before

if err := sm.ApplyVFConfig(netConf); err != nil {

but for this we need to split that function to the point where we field the origin info.
can you take this part to another function?

pfLink, err := s.nLink.LinkByName(conf.Master)
	if err != nil {
		return fmt.Errorf("failed to lookup master %q: %v", conf.Master, err)
	}

	// Save current the VF state before modifying it
	vfState := getVfInfo(pfLink, conf.VFID)
	if vfState == nil {
		return fmt.Errorf("failed to find vf %d", conf.VFID)
	}
	conf.OrigVfState.FillFromVfInfo(vfState)

and then you can have something like

sm := sriov.NewSriovManager()
(new filed origin function)

(defer here with the reset function inside)

	if err := sm.ApplyVFConfig(netConf); err != nil {
		return fmt.Errorf("SRIOV-CNI failed to configure VF %q", err)
	}

also another change is we need a global err that we can use so if there is an error down in the stack the LIFO method of the defer function will call the release in the right order

  1. ipam release
  2. release vf
  3. resetVFconfig

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the requested changes kindly check.

@coveralls
Copy link

coveralls commented Sep 7, 2022

Pull Request Test Coverage Report for Build 3412540026

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 7 of 11 (63.64%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+2.1%) to 36.428%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/sriov/sriov.go 7 11 63.64%
Files with Coverage Reduction New Missed Lines %
pkg/sriov/sriov.go 1 35.74%
Totals Coverage Status
Change from base Build 3405549672: 2.1%
Covered Lines: 361
Relevant Lines: 991

💛 - Coveralls

Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there , thx for contributing !

a few comments.

once reworked it would be great if you squash to a single commit and add a descriptive commit message.

this brings #115 one step closer to completion :)

if err == nil {
_ = sm.ReleaseVF(netConf, args.IfName, args.ContainerID, netns)
// Reset the VF if failure occurs before the netconf is cached
_ = sm.ResetVFConfig(netConf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should always do this IMO, regardless of the fact that netdev is present in container namespace.
currently if ApplyVFConfig fails it will not call ResetVFConfig here.

so lets call it after L91

@@ -126,7 +126,8 @@ type Manager interface {
SetupVF(conf *sriovtypes.NetConf, podifName string, cid string, netns ns.NetNS) (string, error)
ReleaseVF(conf *sriovtypes.NetConf, podifName string, cid string, netns ns.NetNS) error
ResetVFConfig(conf *sriovtypes.NetConf) error
ApplyVFConfig(conf *sriovtypes.NetConf) error
ApplyVFConfig(conf *sriovtypes.NetConf, pfLink netlink.Link) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this interface method ? just to avoid getting netlink handle twice ?
I think it should be kept as is for the sake of simplicity as before

@@ -126,7 +126,8 @@ type Manager interface {
SetupVF(conf *sriovtypes.NetConf, podifName string, cid string, netns ns.NetNS) (string, error)
ReleaseVF(conf *sriovtypes.NetConf, podifName string, cid string, netns ns.NetNS) error
ResetVFConfig(conf *sriovtypes.NetConf) error
ApplyVFConfig(conf *sriovtypes.NetConf) error
ApplyVFConfig(conf *sriovtypes.NetConf, pfLink netlink.Link) error
FillOriginalVfInfo(conf *sriovtypes.NetConf) (netlink.Link, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the method and what it returns is a bit confusing.
Id prefer that it will just do one thing and fill the original VF info.

vfState := getVfInfo(pfLink, conf.VFID)
if vfState == nil {
return pfLink, fmt.Errorf("failed to find vf %d", conf.VFID)
//return fmt.Errorf("failed to find vf %d", conf.VFID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove line

@@ -374,6 +364,22 @@ func (s *sriovManager) ApplyVFConfig(conf *sriovtypes.NetConf) error {
return nil
}

// Fills original VF state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit add golang docs string

if err := sm.ApplyVFConfig(netConf); err != nil {
pfLink, err := sm.FillOriginalVfInfo(netConf)
if err != nil {
return fmt.Errorf("failed to lookup master %q: %v", netConf.Master, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error string should be: "failed to get original vf information: %v"

the needed info is already part of err

@@ -374,6 +364,22 @@ func (s *sriovManager) ApplyVFConfig(conf *sriovtypes.NetConf) error {
return nil
}

// Fills original VF state
func (s *sriovManager) FillOriginalVfInfo(conf *sriovtypes.NetConf) (netlink.Link, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add unit tests for this new method ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes that you have asked, kindly check it.

@adilGhaffarDev adilGhaffarDev requested review from adrianchiris and SchSeba and removed request for adrianchiris and SchSeba September 12, 2022 13:17
return fmt.Errorf("failed to get original vf information: %v", err)
}
defer func() {
if err != nil {
Copy link
Contributor

@wizhaoredhat wizhaoredhat Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a call here in this function:

		if len(newResult.IPs) == 0 {
			return errors.New("IPAM plugin returned missing IP config")
		}

Where err wouldn't be != nil. Should we also handle that as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I might be misunderstanding,if len(newResult.IPs) == 0 we are returning the error and it will not be equal to nil, defer function will release ipam and vf, and resetVFConfig. Is it wrong flow? Can you elaborate bit more what is the issue here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defer function checks that "if err != nil" and in that specific case err will equal nil since it is a length check. So should we actually do something like this?

		if len(newResult.IPs) == 0 {
			err = ...
                        return err
		}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed that. Now its fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good now, thanks!

@adilGhaffarDev
Copy link
Contributor Author

I would like to know how can I run these PR tests myself if I want to? I have seen /retest in one of the PR, wasn't able find any other. Thanks.

@adilGhaffarDev adilGhaffarDev requested review from SchSeba and wizhaoredhat and removed request for adrianchiris and SchSeba September 23, 2022 08:10
@adilGhaffarDev adilGhaffarDev requested review from SchSeba and removed request for wizhaoredhat November 1, 2022 10:41
@adilGhaffarDev
Copy link
Contributor Author

@SchSeba kindly review and trigger tests here.

@adilGhaffarDev
Copy link
Contributor Author

/test-all

@cyclinder
Copy link
Contributor

/cc @adrianchiris @SchSeba

PTAL :)

@cgoncalves
Copy link
Contributor

LGTM

Copy link
Contributor

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@SchSeba
Copy link
Collaborator

SchSeba commented Nov 20, 2022

This will not work as expected from here

https://github.com/k8snetworkplumbingwg/sriov-cni/pull/222/files#diff-30bddd0b0a3414968a5fa92d19e50ad3168e273728fd52ba867afbc442defe62R115

r, err := ipam.ExecAdd(netConf.IPAM.Type, args.StdinData)

you can't allocate the new err variable or it will not get caught by the defer function.

@adilGhaffarDev
Copy link
Contributor Author

adilGhaffarDev commented Nov 21, 2022

@SchSeba Fixed, there was another place with similar allocation that is also fixed. kindly check.

Copy link
Collaborator

@SchSeba SchSeba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adilGhaffarDev it looks good now

/lgtm

@SchSeba SchSeba merged commit 7eb235b into k8snetworkplumbingwg:master Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants