Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore original VF state when needed #114

Merged
merged 1 commit into from
Apr 7, 2020

Conversation

adrianchiris
Copy link
Contributor

@adrianchiris adrianchiris commented Feb 23, 2020

Restore original VF value when needed

This commit attempts to sort out (to some extent)
how a VF configuration is restored.

Until today, almost all VF configurations (except MAC)
where restored to some hardcoded value.
These values may not be desired by the system administrator
or may be invalid for some NIC driver implementation.

As an example, The behaviour of A VF where spoofchk is enabled
and admin mac address is zero is not well defined and is left
for the vendor driver implementors to define the behaviour.
This may cause the NIC's embedded switch to block traffic
from the VF, hence a POD will not be able to send traffic on that VF.

Today, when a user does not specify MAC and Spoofchk values in network
configuration will run into the issue described above when the VF is
reused for a POD.

To fix the issue, described above we take a similar approach to
how Administrative MAC is saved and restored today.

This commit:
  1. Defines a new VF state struct which will save VF configurations
     to be used during configuration restoration which shall be saved
     together with the network confguration in cache.
  2. Moves AdminMAC, EffectiveMAC attributes to the new struct
  3. Saves the VF's state during CmdAdd flow
  4. Conditionally restores all values to their saved state during
     ResetVFConfig()
  5. Modifies order when restoring VF confguration to first restore
     spoofchk and then Administrative MAC
  6. Enhances documentation under Mellanox to clarify the spoofchk
     and mac dependency.

if err = s.nLink.LinkSetVfSpoofchk(pfLink, conf.VFID, spoofchk_val); err != nil {
return fmt.Errorf("failed to restore spoof checking for vf %d to %s: %v", conf.VFID, conf.OrigSpoofChk, err)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could add an "else" clause here and set spoofchk to false. However, this will not necessarily preserve the original state of the system.

conf.OrigSpoofChk = "off"
}

if spoofChk && conf.MAC == "" && origVfInfo.Mac.String() == zeroMac {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not want to enforce spoofchk=on + adminMac=all-zero rule here as this behavior may be different from driver to driver.
I'd suggest to just restore the spoofChk to its original value which shall work across drivers as driver would set the spoofchk and adminMac properly when VF first gets created.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may also be worth documenting this behavior for user reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing that Intel VFs are created with spoofchk on and all-zero admin mac it seems that behaviour of this state (and transitions to this state) is left to the interpretation of each vendor driver.
will remove the check

MinTxRate *int `json:"min_tx_rate"` // Mbps, 0 = disable rate limiting
MaxTxRate *int `json:"max_tx_rate"` // Mbps, 0 = disable rate limiting
SpoofChk string `json:"spoofchk,omitempty"` // on|off
OrigSpoofChk string `json:"origSpoofchk,omitempty"` // on|off, represents the original spoofchk value in the system
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting concerned about littering all "non-user-configurable" parameters with corresponding json encoding fields so that we could save them on disk.
Ideally, we should only have fields with json tag if the value is expected from user. I was wondering if there's any better way of doing this.

Copy link
Contributor Author

@adrianchiris adrianchiris Feb 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regards a more general problem, for this PR we deal with the issue of implicitly changing spoofchk value which was introduced in
020e04d
this causes pod traffic to fail on mellanox VFs. (as described in commit message)

ideally the VF's "restore" state should be in a separate cache entry (maybe different file/ different section in the file).
these values would still have json tags as the state will eventually end up as JSON to be consumed later by cmdDel

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @ahalim-intel that we should not expose a parameter that's not configurable by user. This may better go into the conf file saved in data dir. Shall we address the VF state restoration in one issue, which would solve all the relevent open issues such as #113, #116 etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand why this differs from NetConf.AdminMAC , NetConf.EffectiveMAC (These are not user configurable parameters) this patch takes the same approach to deal with what i consider a critical issue as Pod 2 Pod traffic will always fail on VFs from Mellanox NIC due to the fact that the original state of spoofchk is not preserved.

Now, don't get me wrong, i'm all for solving the general issue but it can be regarded as an enhancement (crucial one IMO) to the project that will take some time to agree on design, implementation etc.
This patch is proposed as a triage until #115 is implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i'm not sure about backpoting policies for this project in case there was a release with this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of not having origSpoofChk in conf also applies to AdminMac and EffectiveMac, it's just that it was not explicitly called out. Maybe we shall put all the VF state related options which are not expected to be configured by user into a separate structure or as a sub-structure of conf so that CNI can cache all the VF states and restore them when cmdDel is called.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding backporting, I think we don't have much request as such before so there is no established rules of how this should be done. but we're open to backport any fix to release branch where needed.

@@ -277,6 +279,11 @@ func (s *sriovManager) ApplyVFConfig(conf *sriovtypes.NetConf) error {
return fmt.Errorf("failed to lookup master %q: %v", conf.Master, err)
}

origVfInfo := getVfInfo(pfLink, conf.VFID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try save and restore the original VF states from this. And on the ApplyVFConfig() should only try setting VF configurations that are given in the config. If any config not given let's not touch them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can define a data structure for origVfInfo, then it would be easier to store the retrieved origVfInfo to the defined structure and cache it. This will save us from contional check from here

@adrianchiris
Copy link
Contributor Author

recheck

@adrianchiris adrianchiris changed the title Restore original spoofchk value only when needed Restore original VF state when needed Mar 11, 2020
@adrianchiris
Copy link
Contributor Author

This PR has been modified to address the comments, it now stores the original VF state in a separate struct nested in netconf and conditionally restores all VF configuration values.

}

// Restore spoofchk
if conf.SpoofChk != "" {
Copy link
Collaborator

@zshi-redhat zshi-redhat Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it restores only when spoofChk is configured, will this resolve the issue that spoofChk is flipped to on while releasing VF on mellanox VF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is assumed that the VF state prior to cmdAdd() call is a valid one for the NIC driver(so no spoofchk=on, MAC=00000...).
then if user chose to change the value, we will restore to the original state.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if user doesn't specify the value of spoofChk, will it still be turned on automatically when released? (then next time it's used in other pod, traffic blocked)

Copy link
Contributor Author

@adrianchiris adrianchiris Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe im missing something, but who would turn on spoofchk automatically ?
the only entity who modifies these VF attributes should be SR-IOV CNI which, now, does that conditionally.

can you describe the scenario ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, here is what I found on openshift deployment: when VF is released from pod, spoofChk is turned on automatically, the next time new pod get created with the same VF will have traffic blocked as it doesn't explicitly turn off the spoofChk.

Copy link
Contributor Author

@adrianchiris adrianchiris Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, here is what I found on openshift deployment: when VF is released from pod, spoofChk is turned on automatically, the next time new pod get created with the same VF will have traffic blocked as it doesn't explicitly turn off the spoofChk.

Current SR-IOV CNI automatically sets it to on on cmdDel(), after this change it will not.
this PR does not address fixing a VF which has ended up in an invalid state (initial configuration is assumed to be valid).

@lennybe
Copy link

lennybe commented Mar 12, 2020

/retest

}

// validate that vlan id is set if vlan qos is configured
if n.Vlan == 0 && n.VlanQoS != 0 {
if (n.Vlan == nil || *n.Vlan == 0) && n.VlanQoS != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's rarely used, but shall we allow combined settings vlan == 0 and vlanQoS == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should not (at least not in this PR) ill modify the condition to:

if (n.Vlan == nil || *n.Vlan == 0) && (n.VlanQos != nil && *n.VlanQos != 0) {...}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, i don't think we need to prevent it.

Changing Vlan and VlanQos to a pointer allows us to know if the user specified a value.
so in my opinion we should enforce:

  1. Vlan must be specified by the user if VlanQos is specified
  2. if VlanQos is specified and is a non-zero value then Vlan must be a non zero value

i don't think we should prevent vlan == 0 && vlanQos == 0

This commit attempts to sort out (to some extent)
how a VF configuration is restored.

Until today, almost all VF configurations (except MAC)
where restored to some hardcoded value.
These values may not be desired by the system administrator
or may be invalid for some NIC driver implementation.

As an example, The behaviour of A VF where spoofchk is enabled
and admin mac address is zero is not well defined and is left
for the vendor driver implementors to define the behaviour.
This may cause the NIC's embedded switch to block traffic
from the VF, hence a POD will not be able to send traffic on that VF.

Today, when a user does not specify MAC and Spoofchk values in network
configuration will run into the issue described above when the VF is
reused for a POD.

To fix the issue, described above we take a similar approach to
how Administrative MAC is saved and restored today.

This commit:
  1. Defines a new VF state struct which will save VF configurations
     to be used during configuration restoration which shall be saved
     together with the network confguration in cache.
  2. Moves AdminMAC, EffectiveMAC attributes to the new struct
  3. Saves the VF's state during CmdAdd flow
  4. Conditionally restores all values to their saved state during
     ResetVFConfig()
  5. Modifies order when restoring VF confguration to first restore
     spoofchk and then Administrative MAC
  6. Enhances documentation under Mellanox to clarify the spoofchk
     and mac dependency.
@lennybe
Copy link

lennybe commented Mar 16, 2020

/retest

@adrianchiris
Copy link
Contributor Author

PR update : allow user to specify the following configuration: vlan == 0 && vlanQos == 0

@adrianchiris
Copy link
Contributor Author

@zshi-redhat @ahalim-intel inputs appreciated :)

@zshi-redhat
Copy link
Collaborator

@zshi-redhat @ahalim-intel inputs appreciated :)

I just got chance to test this PR, it worked fine. Thanks for working on it!

@carmark
Copy link
Contributor

carmark commented Mar 31, 2020

works well also on my local environment. Without this patch, I may encounter the panic issue, such as :

Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: panic: runtime error: invalid memory address or nil pointer dereference
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x543768]
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: goroutine 1 [running, locked to thread]:
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: github.com/intel/sriov-cni/pkg/sriov.(*sriovManager).SetupVF(0xc00000c720, 0xc000145040, 0xc00001401b, 0x3, 0xc00001c010, 0x40, 0x5e0e20, 0xc000010a70, 0xc00018000
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/pkg/sriov/sriov.go:155 +0x98
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: main.cmdAdd(0xc0001644d0, 0x0, 0x0)
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/cmd/sriov/main.go:96 +0x7a0
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc00011bec8, 0xc0001644d0, 0x5df660, 0xc000138570,
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:185 +0x258
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc00011bec8, 0x5b7aa0, 0x5b7aa8, 0x5b7ab8, 0x5df660, 0xc000
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:221 +0x546
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:286
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel.PluginMain(0x5b7aa0, 0x5b7aa8, 0x5b7ab8, 0x5df660, 0xc000138570, 0x0, 0x0)
Mar 31 17:04:47 k8s-worker-4 kubelet[13739]: /root/go/src/github.com/intel/sriov-cni/.gopath/src/github.com/intel/sriov-cni/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:301 +0x128

@ahalimx86
Copy link
Collaborator

/lgtm
I will wait for others feedback as well.

@moshe010
Copy link

moshe010 commented Apr 7, 2020

can we merged this please?

@killianmuldoon
Copy link
Collaborator

I tested this across a number of nics and drivers and it's working as described. /LGTM

@ahalimx86 ahalimx86 merged commit e051689 into k8snetworkplumbingwg:master Apr 7, 2020
@moshe010
Copy link

moshe010 commented Apr 7, 2020

thanks :)

@moshe010
Copy link

moshe010 commented Apr 8, 2020

@zshi-redhat, Do we need to backport this for openshift?

@zshi-redhat
Copy link
Collaborator

@zshi-redhat, Do we need to backport this for openshift?

@moshe010 this is cherry-picked as part of regular update: openshift/sriov-cni#14

krsacme pushed a commit to krsacme/sriov-cni that referenced this pull request Aug 25, 2020
This commit attempts to sort out (to some extent)
how a VF configuration is restored.

Until today, almost all VF configurations (except MAC)
where restored to some hardcoded value.
These values may not be desired by the system administrator
or may be invalid for some NIC driver implementation.

As an example, The behaviour of A VF where spoofchk is enabled
and admin mac address is zero is not well defined and is left
for the vendor driver implementors to define the behaviour.
This may cause the NIC's embedded switch to block traffic
from the VF, hence a POD will not be able to send traffic on that VF.

Today, when a user does not specify MAC and Spoofchk values in network
configuration will run into the issue described above when the VF is
reused for a POD.

To fix the issue, described above we take a similar approach to
how Administrative MAC is saved and restored today.

This commit:
  1. Defines a new VF state struct which will save VF configurations
     to be used during configuration restoration which shall be saved
     together with the network confguration in cache.
  2. Moves AdminMAC, EffectiveMAC attributes to the new struct
  3. Saves the VF's state during CmdAdd flow
  4. Conditionally restores all values to their saved state during
     ResetVFConfig()
  5. Modifies order when restoring VF confguration to first restore
     spoofchk and then Administrative MAC
  6. Enhances documentation under Mellanox to clarify the spoofchk
     and mac dependency.
zeeke pushed a commit to zeeke/sriov-cni that referenced this pull request Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants