
Add pci address lock for allocated devices #220

Merged
2 commits merged into k8snetworkplumbingwg:master from add_pciaddress_lock on Nov 17, 2022

Conversation

@SchSeba (Collaborator) commented Aug 29, 2022

This commit adds a lock for allocated PCI addresses.

Fixes: #219

Signed-off-by: Sebastian Sch sebassch@gmail.com
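
The approach, roughly: cmdAdd writes a small lock file keyed by the VF's PCI address (storing the pod network namespace), and cmdDel removes it; if the file is still present when a new cmdAdd comes in, the add fails and Multus retries. A minimal sketch of the helpers discussed in the review below (signatures taken from the review snippets; the merged code may differ):

```go
package utils

import (
	"fmt"
	"os"
	"path/filepath"
)

// SaveAllocatedPCI records that pciAddress is in use by the pod network namespace ns (sketch).
func SaveAllocatedPCI(pciAddress, ns, dataDir string) error {
	dir := filepath.Join(dataDir, "pci")
	if err := os.MkdirAll(dir, 0700); err != nil {
		return fmt.Errorf("failed to create PCI lock directory %s: %v", dir, err)
	}
	path := filepath.Join(dir, pciAddress)
	if err := os.WriteFile(path, []byte(ns), 0600); err != nil {
		return fmt.Errorf("failed to write PCI address lock file %s: %v", path, err)
	}
	return nil
}

// IsAllocated reports whether a lock file exists for pciAddress and returns the recorded namespace (sketch).
func IsAllocated(pciAddress, dataDir string) (bool, string, error) {
	path := filepath.Join(dataDir, "pci", pciAddress)
	dat, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return false, "", nil
	}
	if err != nil {
		return false, "", fmt.Errorf("failed to read PCI address lock file %s: %v", path, err)
	}
	return true, string(dat), nil
}

// CleanAllocatedPCI removes the lock file for pciAddress, typically from cmdDel (sketch).
func CleanAllocatedPCI(pciAddress, dataDir string) error {
	path := filepath.Join(dataDir, "pci", pciAddress)
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("error removing PCI address lock file %s: %v", path, err)
	}
	return nil
}
```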

@coveralls commented Aug 29, 2022

Pull Request Test Coverage Report for Build 3037867348

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 19 of 102 (18.63%) changed or added relevant lines in 4 files are covered.
  • 20 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-1.5%) to 32.811%

Changes Missing Coverage:
  File                                   Covered Lines  Changed/Added Lines  %
  pkg/utils/utils.go                     0              1                    0.0%
  pkg/config/config.go                   9              12                   75.0%
  pkg/utils/pci_allocator.go             10             41                   24.39%
  pkg/utils/mocks/pci_allocator_mock.go  0              48                   0.0%

Files with Coverage Reduction:
  File                  New Missed Lines  %
  pkg/config/config.go  20                48.84%

Totals:
  Change from base Build 3015757156: -1.5%
  Covered Lines: 356
  Relevant Lines: 1085

💛 - Coveralls

@SchSeba force-pushed the add_pciaddress_lock branch 2 times, most recently from d9b7567 to de043e3 on August 29, 2022 13:53
@orelmisan commented Aug 30, 2022

Hi @SchSeba, what will happen in case the ADD command fails because of the new lock?
Will there be one less interface in the Pod?

@SchSeba (Collaborator, Author) commented Aug 30, 2022

Hi @orelmisan, no. The pod will stay in ContainerCreating and Multus will retry.

example:

testpod-2jnht   Successfully assigned sriov-cni-test-2/testpod-2jnht to sriov-worker2
3m54s       Warning   DNSConfigForming         pod/testpod-2jnht   Search Line limits were exceeded, some search paths have been omitted, the applied search line is: sriov-cni-test-2.svc.cluster.local svc.cluster.local cluster.local eng.lab.tlv.redhat.com lab.tlv.redhat.com tlv.redhat.com
4m11s       Normal    AddedInterface           pod/testpod-2jnht   Add eth0 [10.244.1.45/24] from kindnet
4m11s       Warning   FailedCreatePodSandBox   pod/testpod-2jnht   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f10739216389f1cb0b1dced00febb5612c27e2117e2312ffbce2b9ceb832d60d": [sriov-cni-test-2/testpod-2jnht:sriov-network]: error adding container to network "sriov-network": error the deivce is already allocated for pci address 0000:04:0a.4
3m57s       Normal    AddedInterface           pod/testpod-2jnht   Add eth0 [10.244.1.46/24] from kindnet
3m57s       Normal    AddedInterface           pod/testpod-2jnht   Add net1 [] from sriov-cni-test-2/sriov-network
3m57s       Normal    Pulling                  pod/testpod-2jnht   Pulling image "fedora"
3m55s       Normal    Pulled                   pod/testpod-2jnht   Successfully pulled image "fedora" in 1.767141498s
3m55s       Normal    Created                  pod/testpod-2jnht   Created container test
3m55s       Normal    Started                  pod/testpod-2jnht   Started container test
3m53s       Normal    Killing                  pod/testpod-2jnht   Stopping container test

@orelmisan

Thank you for the answer @SchSeba.

@SchSeba force-pushed the add_pciaddress_lock branch from de043e3 to 60b047a on August 30, 2022 11:42

// Check if the device is already allocated.
// This is to prevent issues where kubelet request to delete a pod and in the same time a new pod using the same
// vf is started. we can have an issue where the cmdDel of the old pod us called AFTER the cmdAdd of the new one


Suggested change
// vf is started. we can have an issue where the cmdDel of the old pod us called AFTER the cmdAdd of the new one
// vf is started. we can have an issue where the cmdDel of the old pod is called AFTER the cmdAdd of the new one


path := filepath.Join(allocatedPCIPath, pciAddress)

err := os.WriteFile(path, []byte(ns), 0700)


I suggest granting this file only read and write permissions.

Suggested change
err := os.WriteFile(path, []byte(ns), 0700)
err := os.WriteFile(path, []byte(ns), 0600)

@ormergi left a comment


@SchSeba does it make sense to check if a VF is allocated at cmdCheck?

@SchSeba (Collaborator, Author) commented Aug 30, 2022

Hi @ormergi, I am not sure that function is even implemented today via Multus.

@SchSeba force-pushed the add_pciaddress_lock branch from eb95f56 to e27e0ad on August 30, 2022 13:04
return n, err
}

if isAllocated {
@SchSeba (Collaborator, Author):

I moved the verification from main to here so we produce a better error when a PCI address is still assigned. Instead of this error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_client_seba_65fe3641-ebd8-4f8b-9b17-785f07b3ad3b_0(0140bb9890307e2f3f8804a06dd0f70a0970f7b8ee1254227eb5c0043ddf2f77): error adding pod seba_client to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [seba/client/65fe3641-ebd8-4f8b-9b17-785f07b3ad3b:sriov-network-2]: error adding container to network "sriov-network-2": SRIOV-CNI failed to load netconf: LoadConf(): the VF 0000:d8:0b.0 does not have a interface name or a dpdk driver

we will return an error saying that the PCI address is still allocated.
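
Roughly the intended shape of the check after moving it (a sketch; the exact placement and error text in the PR may differ):

```go
// Sketch: fail early during config loading with a clear message if the VF is still locked.
isAllocated, allocatedNs, err := utils.IsAllocated(vfPciAddress, DefaultCNIDir)
if err != nil {
	return nil, err
}
if isAllocated {
	return nil, fmt.Errorf("pci address %s is already allocated in namespace %s", vfPciAddress, allocatedNs)
}
```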

@SchSeba force-pushed the add_pciaddress_lock branch from e27e0ad to c0728f0 on August 30, 2022 13:08
@ormergi commented Aug 30, 2022

Hi @ormergi I am not sure if that function is even implemented today via multus

@SchSeba The code looks like it does, check this out https://github.com/k8snetworkplumbingwg/multus-cni/blob/v3.7/cmd/main.go#L56
It may save a few iterations of the plugin when a VF is already allocated, wouldn't it?

EDIT: if it is a lot of work or out of this PR's scope, we can do it in a follow-up PR if it makes sense.

@SchSeba (Collaborator, Author) commented Aug 30, 2022

@ormergi I can open an issue so we track the request to implement the check function from the CNI API definition.

@orelmisan

Hi @SchSeba, does the new locking mechanism support the following scenarios:

  1. A graceful reboot of the node.
  2. A non-graceful reboot of the node (for example, from a power outage).

@adrianchiris (Contributor)

Hi @SchSeba Does the new locking mechanism supports the following scenarios:

There is also a reboot of the node followed by a force delete of the pod from the k8s API; in this case, when the node comes up, I don't think the CNI will be called.

@SchSeba (Collaborator, Author) commented Aug 31, 2022

Hi @orelmisan @adrianchiris, I tested the 3 cases:

1. A graceful reboot of the node: with a drain, the pod moves to another node or to Pending; after the node is back, the new pod starts again.

2. A non-graceful reboot of the node (for example, from a power outage): I tested 2 cases here:
   2.1 The node is back before the eviction policy kicks in: the pod moves to ContainerCreating again, and after the SR-IOV config daemon and device plugin are ready, the pod starts (as it is a new pod, a different netns is used).
   2.2 The node is back after the eviction policy: a new pod is created and waits in Pending (again, a different netns is used for the new pod).

3. Reboot of the node followed by a force delete of the pod from the k8s API (where the CNI is not called when the node comes up): I tested this one as well; when the node is back there is no pod to start, and applying the pod again can reuse the PCI address as it is a different netns.

@orelmisan

Thank you for your answer @SchSeba.
Will there be leftover files that could grow in number, and thus in size over time?

@SchSeba (Collaborator, Author) commented Sep 1, 2022

There may be files left over time, but I don't think that can impact a system, if you ask me.

@orelmisan

they may be files overtime but I don't think that can impact a system if you ask me.

Will it work if these files are written to /tmp, so they are automatically deleted on reboot?

@SchSeba (Collaborator, Author) commented Sep 4, 2022

We can do it, but I am afraid that some auto-clean on the node (if there is storage pressure) will remove the files, and then we can have the same issue again.

@orelmisan left a comment


Thank you for the changes @SchSeba


dat, err := os.ReadFile(path)
if err != nil {
return false, "", fmt.Errorf("failed to read for pci address file for %s: %v", pciAddress, err)
@zeeke (Member) commented Sep 5, 2022


Please add the whole path to the error. It makes it easier to debug in case of errors.

Suggested change
return false, "", fmt.Errorf("failed to read for pci address file for %s: %v", pciAddress, err)
return false, "", fmt.Errorf("failed to read PCI address lock file %s: %w", path, err)

func CleanAllocatedPCI(pciAddress, dataDir string) error {
path := filepath.Join(dataDir, "pci", pciAddress)
if err := os.Remove(path); err != nil {
return fmt.Errorf("error removing PCI address lock file %s: %q", path, err)

nit: why use %q for printing errors?

Suggested change
return fmt.Errorf("error removing PCI address lock file %s: %q", path, err)
return fmt.Errorf("error removing PCI address lock file %s: %w", path, err)

@SchSeba (Collaborator, Author):

It was a copy-paste from CleanCachedNetConf. I agree we should switch to %w, but all the code uses %v, so I will use %v, and in the future we can open a PR to switch all of them together.

@Eoghan1232 (Collaborator) left a comment


overall, lgtm.

@SchSeba force-pushed the add_pciaddress_lock branch from c0728f0 to a0e4c88 on September 7, 2022 10:09
@ormergi left a comment


@SchSeba Thanks for fixing it!

@@ -36,6 +37,38 @@ func LoadConf(bytes []byte) (*sriovtypes.NetConf, error) {
return nil, fmt.Errorf("LoadConf(): VF pci addr is required")
}

// Get the vf pci address
vfPciAddress, err := utils.GetPciAddress(n.Master, n.VFID)

n.DeviceID already holds the VF PCI address, no? Why do we need this here?

@@ -40,6 +40,7 @@ type NetConf struct {
VlanQoS *int `json:"vlanQoS"`
DeviceID string `json:"deviceID"` // PCI address of a VF in valid sysfs format
VFID int
VFPciAddress string

DeviceID is the VF PCI address AFAIK. Did I miss anything?

}
return nil
}

func SaveAllocatedPCI(pciAddress, ns, dataDir string) error {

Shall we wrap these three methods in an interface?

I know the cache is not implemented this way, but I do not think we need to take the same approach here.

It would be great if this were its own package with an interface definition, implementation, and unit tests, or, if you prefer, part of the util package but in a separate file, e.g. pci_allocator.go. WDYT?

The dataDir (currently config.DefaultCNIDir) should be provided when creating the interface implementation (i.e. provided in the "New" function).

Later on we can move the caching methods to their own interface as well.
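
Something along these lines, perhaps (a rough sketch of the suggested shape, reusing the helper names quoted earlier in this review; not necessarily the merged code):

```go
// pkg/utils/pci_allocator.go (sketch)
package utils

import "path/filepath"

// PCIAllocation is the interface the CNI code would program against.
type PCIAllocation interface {
	SaveAllocatedPCI(pciAddress, ns string) error
	CleanAllocatedPCI(pciAddress string) error
	IsAllocated(pciAddress string) (bool, string, error)
}

// PCIAllocator implements PCIAllocation using lock files under dataDir/pci
// (method bodies omitted in this sketch).
type PCIAllocator struct {
	dataDir string
}

// NewPCIAllocator creates an allocator rooted at dataDir (e.g. config.DefaultCNIDir).
func NewPCIAllocator(dataDir string) *PCIAllocator {
	return &PCIAllocator{dataDir: filepath.Join(dataDir, "pci")}
}
```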

}
return nil
}


Can you add docstrings to the methods below?

return nil
}

func IsAllocated(pciAddress, dataDir string) (bool, string, error) {

When you add the docstring here, please add some info about the returned string.

@@ -207,10 +212,20 @@ func cmdDel(args *skel.CmdArgs) error {
}
}

vfPciAddress, err := utils.GetPciAddress(netConf.Master, netConf.VFID)

netConf.DeviceID is not saved?

// This is to prevent issues where kubelet request to delete a pod and in the same time a new pod using the same
// vf is started. we can have an issue where the cmdDel of the old pod is called AFTER the cmdAdd of the new one
// This will block the new pod creation until the cmdDel is done.
isAllocated, allocatedNs, err := utils.IsAllocated(vfPciAddress, DefaultCNIDir)

We could roll the logic below into IsAllocated and simplify this method.

A VF is allocated if the lock file exists and the namespace stored within it still exists; otherwise it is not allocated. If it is not allocated, IsAllocated can delete the stale file (as a best-effort operation). WDYT?
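
For example, something like this (a sketch only; it assumes ns.GetNS from github.com/containernetworking/plugins/pkg/ns is used to probe whether the stored namespace still exists):

```go
package utils

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/containernetworking/plugins/pkg/ns"
)

// IsAllocated (sketch): the VF counts as allocated only if the lock file exists
// and the network namespace recorded in it still exists; stale files are removed best-effort.
func IsAllocated(pciAddress, dataDir string) (bool, error) {
	path := filepath.Join(dataDir, "pci", pciAddress)
	dat, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("failed to read PCI address lock file %s: %v", path, err)
	}

	netns, err := ns.GetNS(string(dat))
	if err != nil {
		// The namespace is gone, so the allocation is stale: remove the lock file best-effort.
		_ = os.Remove(path)
		return false, nil
	}
	_ = netns.Close()
	return true, nil
}
```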

@adrianchiris (Contributor)

Also, it would be great if you could add more info in the commit message about the issue and the changes introduced.

And even better :-D if you could split the PR into 2 commits (one adding the locking mechanism, the other integrating it into the CNI code), with info on each.

The latter is not a must, of course; it depends on you.

@SchSeba force-pushed the add_pciaddress_lock branch from a0e4c88 to a3b8961 on September 12, 2022 13:56
@SchSeba (Collaborator, Author) commented Sep 12, 2022

@adrianchiris Thanks for the comments!

I made the changes; let me know how it looks now :)

@adrianchiris (Contributor) left a comment


Some minor nits.

It would be great to add some unit tests for the added functionality.

dataDir string
}

func NewPCIAllocator(dataDir string) PCIAllocation {

m(iiinor)nit: docstring


This method should IMO return *PCIAllocator.

The caller should store it in a PCIAllocation interface.
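
i.e., roughly (sketch):

```go
// Sketch: NewPCIAllocator returns the concrete *PCIAllocator, and the caller stores it as the interface.
var allocator utils.PCIAllocation = utils.NewPCIAllocator(config.DefaultCNIDir)
```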

Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until cmdDel is called or the kernel removes the network namespace.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba force-pushed the add_pciaddress_lock branch from 802b9ad to 1ec4511 on November 14, 2022 16:48
@SchSeba (Collaborator, Author) commented Nov 14, 2022

Hi @adrianchiris, thanks for the great comments!

When you have time, please have a look and let me know if you have any other comments.

@SchSeba force-pushed the add_pciaddress_lock branch from 1ec4511 to 19e50d9 on November 14, 2022 16:51
Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
@SchSeba force-pushed the add_pciaddress_lock branch from 19e50d9 to 1ae1fea on November 14, 2022 16:55
@adrianchiris (Contributor)

Thanks for addressing my comments! LGTM

@adrianchiris adrianchiris merged commit 2aea1bb into k8snetworkplumbingwg:master Nov 17, 2022
ormergi added a commit to ormergi/kubevirtci that referenced this pull request Nov 21, 2022
Signed-off-by: Or Mergi <ormergi@redhat.com>
EdDev added a commit to EdDev/kubevirt that referenced this pull request Nov 22, 2022
Place a delay between tests to assure resources (VF/s) are
fully released before reused again on a new VMI.
(results show that waiting for VMI/s to disappear is not enough)
Ref: k8snetworkplumbingwg/sriov-cni#219

This workaround should be temporary until the fix [1] can be consumed.
[1] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Edward Haas <edwardh@redhat.com>
ormergi added a commit to ormergi/kubevirtci that referenced this pull request Nov 29, 2022
Following the flakes we have all over SR-IOV lanes due to [1] and [2],
bump sriov-cni to v2.7.0 in order to consume the fix [3].

[1] kubevirt/kubevirt#6776
[2] k8snetworkplumbingwg/sriov-cni#219
[3] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Or Mergi <ormergi@redhat.com>
kubevirt-bot pushed a commit to kubevirt/kubevirtci that referenced this pull request Dec 1, 2022
Following the flakes we have all over SR-IOV lanes due to [1] and [2],
bump sriov-cni to v2.7.0 in order to consume the fix [3].

[1] kubevirt/kubevirt#6776
[2] k8snetworkplumbingwg/sriov-cni#219
[3] k8snetworkplumbingwg/sriov-cni#220

Signed-off-by: Or Mergi <ormergi@redhat.com>
thom311 pushed a commit to thom311/k8snetworkplumbingwg-sriov-cni that referenced this pull request Jun 3, 2024
* Create allocation interface and implementation.
This is needed to lock the allocation of the same PCI address until the cmdDel is called or the kernel remove the network namespace.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

* Use the allocator interface to prevent allocation of still in used pci addresses

Fixes: k8snetworkplumbingwg#219

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Successfully merging this pull request may close these issues.

VFs may get reseted after being allocated by other pod
7 participants