Update use of CDI API #12825
Conversation
libpod/container_internal_linux.go
Outdated
registry := cdi.GetRegistry()
unresolved, err := registry.InjectDevices(g.Config, c.config.CDIDevices...)
if len(unresolved) > 0 {
	logrus.Debugf("Could not resolve CDI devices: %v", unresolved)
Seems like this ought to be more serious than a `Debugf`? `logrus.Warnf` maybe?
Yes, this should probably be a warning. This would indicate that the state of the registry has changed between determining the list of CDI devices (where `GetDevice` is called) and here, where the actual injection takes place.
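A minimal sketch of what the suggested change might look like, reusing the variables from the diff above (the surrounding function's return shape and the error wrapping are assumptions, not Podman's actual code):

```go
unresolved, err := registry.InjectDevices(g.Config, c.config.CDIDevices...)
if len(unresolved) > 0 {
	// Unresolved devices suggest the on-disk CDI specs changed between
	// device detection and injection, so surface this as a warning.
	logrus.Warnf("Could not resolve CDI devices: %v", unresolved)
}
if err != nil {
	return nil, fmt.Errorf("error setting up CDI devices: %w", err)
}
```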
One question I did have was whether there would be a recommended way to propagate a reference to the registry from `MakeContainer` in `create_container.go` to its use here in `generateSpec`?
The container's image name and image ID are stored in its configuration at this point, but there's no guarantee those are actually populated, since Podman supports creating containers from user-specified root filesystems.
By "registry" in this case I mean a cache of the CDI devices defined by spec files available on the system (see the interface here). The idea is that spec files in, for example, `/etc/cdi/*.json` are loaded once and then these specifications are used.
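A rough sketch of how such a cache is consumed (method names follow my reading of the `pkg/cdi` Registry interface in the container-device-interface library around this revision; treat the exact API as an assumption):

```go
package main

import (
	cdi "github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	"github.com/sirupsen/logrus"
)

func logAvailableCDIDevices() {
	// GetRegistry returns a process-wide cache that loads CDI spec files
	// from directories such as /etc/cdi once and reuses them afterwards.
	registry := cdi.GetRegistry()
	for _, device := range registry.DeviceDB().ListDevices() {
		logrus.Debugf("CDI device available: %s", device)
	}
}
```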
Ahh. Sorry, registry usually means a very different thing for containers.
I checked, and there's no location for the spec files currently listed that I can find. If one were to be added, `containers.conf` would be the most logical location: it's a centralized config file that all containers can access, where we could put the path.
I think I'm still miscommunicating what I need to know.
When processing the original list of devices from the config in `create_container.go`, we instantiate a cache that stores the state of the CDI specs for the system (defining where these are read from is currently out of scope). The question is whether there is a way to pass this instance of the cache along so that it is also used in `generateSpec` in `container_internal_linux.go`. The motivation for this is that the files defining the CDI devices may have changed on disk, so that the list of CDI devices being processed here does not have the desired effect -- hence the warning.
To be clear, I don't think this is a blocker for this particular PR, but it may be good to consider how to plumb this through in future if there is something that makes sense and does not add too much additional overhead. On the other hand, maybe @klihub has insights into the lifetime of the singleton instantiated in the `cdi` package when used at these two locations -- if we have some guarantee that these are in fact the same object, then there would be no need to pass it through explicitly (except to make the link clearer).
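For reference, my understanding (an assumption about this version of the library, not confirmed here) is that `GetRegistry` is a lazily-initialized singleton, so both call sites would see the same object:

```go
var (
	initOnce sync.Once
	reg      Registry
)

// GetRegistry returns the same cached Registry instance on every call;
// spec files are loaded on first use and reused afterwards.
func GetRegistry() Registry {
	initOnce.Do(func() {
		reg = newRegistry() // hypothetical constructor that scans /etc/cdi
	})
	return reg
}
```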
There are temporary-files directories that you could use, but they're all internal to Libpod, so accessing them from `pkg/specgen` could prove to be a problem. Is this a global directory, or a per-container one? We could definitely do a global CDI state directory inside Libpod's temporary files directory and expose that to the world.
This is a global directory. For what it's worth, I have changed the implementation to only look at whether the device specified is a valid CDI device name. This means that we are no longer instantiating this cache twice. I think this is sufficient for the time being. If this becomes a requirement we can open a discussion as to how this should be done correctly.
/approve
@elezar is this still a draft?
Ok, let's get their approval.
@elezar LGTM, with a nitpick related to CDI device reference detection.
logrus.Debugf("CDI HasDevice Error: %v", err)
}
if err == nil && isCDIDevice {
	if d := cdiDeviceDB.GetDevice(device.Path); d != nil {
nitpick: you could also do syntactic filtering/checking for CDI device references here (using `IsQualifiedName`).
One benefit of that would be that unresolvable CDI devices would go through the CDI resolution/injection attempt and come back as unresolved. This would potentially allow you to produce better/more contextually accurate error messages. AFAICT, with the current implementation such devices/errors are only caught later in the container creation pipeline, probably only in the runtime once it tries to inject the device into the container.
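Something along these lines is presumably what is meant (a sketch; `device.Path`, `s.CDIDevices`, and the surrounding loop are stand-ins for the actual specgen code, not quotes from this PR):

```go
// Purely syntactic filtering: a fully-qualified reference such as
// "vendor.com/class=name" is routed through CDI resolution, where an
// unresolvable device fails with a precise "unresolved" error instead
// of a generic stat failure later in the pipeline.
if cdi.IsQualifiedName(device.Path) {
	s.CDIDevices = append(s.CDIDevices, device.Path)
	continue
}
```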
I wanted to avoid relying on `IsQualifiedName` here. One motivation is that I want to have some flexibility (in our library) for determining whether or not a device is a CDI device, without requiring that clients such as Podman be updated to support these changes. I understand that they would require an update, but this would be limited to updating a dependency.
Another motivation is that a fully-qualified device name does not mean that the device has an associated CDI spec.
| example | device | fully qualified | has CDI spec | action at filtering | result at injection |
|---|---|---|---|---|---|
| 1 | `nvidia.com/gpu=invalid` | yes | no | assumed to be standard device | errors with `Error: stat nvidia.com/gpu=invalid: no such file or directory` |
| 2 | `nvidia.com/gpu=gpu0` | yes | yes | identified as CDI device | resolves and injects modifications for `nvidia.com/gpu=gpu0` |
| 3 | `gpu0` | no | yes | assumed to be standard device (CDI currently requires a fully-qualified name) | errors with `Error: stat gpu0: no such file or directory` |
| 4 | `/dev/nvidia0` | no | no | assumed to be standard device | injects standard device node `/dev/nvidia0` |
Using `IsQualifiedName` would change the filtering behaviour for example 1 and would result in a warning and the container being launched without injecting the device. Is this better behaviour than exiting with the error shown? Is it more desirable to have the container start with no modifications to the spec than to error out for an invalid device?
Thinking about this a bit more, I think raising an error for unresolved CDI devices instead of just issuing a warning would be more in keeping with the current behaviour.
Update: The injection will fail, as an unresolved device is considered an error by `InjectDevices`, which is used to make the spec modifications.
That's not how it would behave IMO. If device resolution/injection fails for any of the devices, inject will also return a non-nil error.
Ah. Good point. I just confirmed that `InjectDevices` also returns an error if `unresolved` is non-empty, meaning that injection will fail for any unresolved devices.
@klihub I have reworked the implementation to use `IsQualifiedName` as discussed. I abstracted this behind a function `isCDIDevice`, which could later be moved into our API if this makes sense.
This also removes the requirement that the registry be instantiated here, meaning that the error presented to the user should be clearer. When selecting a fully-qualified but unresolvable CDI device we now see:
sudo ./bin/podman run -ti --rm --net=host --device=nvidia.com/gpu=gpu3 ubuntu:18.04 nvidia-smi -L
WARN[0000] Ignoring global metacopy option, not supported with booted kernel
Error: error setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=gpu3
UPDATE: Note that I have removed the warning that was printed, as it repeats what is in the error message. If there were errors creating the registry, these are also made available in the debug logs (in the event of resolution errors).
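The helper described above could be as thin as the following (a hypothetical shape, not a quote from the PR):

```go
// isCDIDevice reports whether a --device argument should be treated as a
// CDI device reference rather than a host device path. This is a purely
// syntactic check; it does not guarantee that a matching CDI spec exists.
func isCDIDevice(device string) bool {
	return cdi.IsQualifiedName(device)
}
```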
opts = ExtractCDIDevices(s)
registry := cdi.GetRegistry()
if errs := registry.GetErrors(); len(errs) > 0 {
	logrus.Debugf("Error creating CDI Registry: %v", errs)
Should we fail container creation here?
No. These errors may not be related to the devices that we're interested in. For example, the spec from `vendorA` may be malformed, but we are using devices from `vendorB`.
ok, makes sense
logrus.Warnf("The following errors were triggered when creating the CDI Registry: %v", errs)
}
}
if err != nil {
I think the err check should happen before the data check?
If we return on `err != nil` we don't provide the warning / error output. With that said, I could include the output in the `if err != nil` block, as there would (or at least should) not be a case where `unresolved` is non-empty and `err == nil`.
OK. I have simplified things a bit. The error message will already show the unresolved devices, so there is no need to print them again. I am now also relying on the `Debugf` output to ensure that registry creation errors are available to the user, instead of repeating them.
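A sketch of the simplified flow (assuming the same `registry` and generator variables as in the earlier diffs; the exact error wrapping is an assumption):

```go
if _, err := registry.InjectDevices(g.Config, c.config.CDIDevices...); err != nil {
	// InjectDevices already names the unresolved devices in its error, so
	// they are not repeated here; registry-creation errors remain visible
	// through the Debugf output emitted when the registry was built.
	return nil, fmt.Errorf("error setting up CDI devices: %w", err)
}
```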
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: elezar, mheon, rhatdan.
This change updates the CDI API to commit 46367ec063fda9da931d050b308ccd768e824364, which addresses some inconsistencies in the previous implementation.
Signed-off-by: Evan Lezar <elezar@nvidia.com>
/lgtm
@bart0sh: changing LGTM is restricted to collaborators.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm
Catching up with this now, how does this address being able to dynamically change which GPUs are passed to which podman instances?
@vans163 your question seems out of scope for this PR, which was "just" updating the CDI dependency. Also, what do you mean by "instances"? If you mean containers, you should be able to start three of them, each selecting a different CDI device via its own `--device` flag.
If this is not what you mean, please open an issue that describes your use case and how you would expect this to work.
Yeah, this is exactly what I mean, so I would need to register devices 0-3 over here?
Because then won't the envvar `NVIDIA_VISIBLE_DEVICES={gpuX}` conflict? Also, does this mean we can keep cgroups enabled in nvidia-container-runtime?
Having a bit of difficulty with this. Running: … I get: … Though I can confirm I have the exact file in …
@TheFloatingBrain We have a CDI test utility in the repo that you could use for diagnosis. You can use it to get the list of vendors or devices your CDI Specs declare, check whether your Specs have any errors, test how an OCI Spec gets modified when you inject a particular CDI device into it, and do a few other things. You can give it a try by cloning the CDI repo (
This change updates the implementation from #10081 to use the CDI API changes by @klihub merged in cncf-tags/container-device-interface@46367ec. This also serves to indicate how CDI could be used to address #11088.
From a user perspective, the major change is that the device name specified through the `podman` command line (when executing `podman run`) must be a fully-qualified CDI device name.
This can be tested on an NVIDIA GPU-based system assuming the following:
- `nvidia-container-toolkit` is installed
- a CDI spec file exists at `/etc/cdi/nvidia.json` (a minimal illustrative spec is sketched below; note this could also be a YAML file, `/etc/cdi/nvidia.yaml`)
In which case selecting the `nvidia.com/gpu=gpu0` device yields: …
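For illustration, a minimal CDI spec declaring a single `gpu0` device might look like this (a sketch only: the `cdiVersion`, the device-node paths, and the other field values are assumptions, not the actual file from the PR description, which was elided from this page):

```json
{
  "cdiVersion": "0.3.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "gpu0",
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/nvidia0" },
          { "path": "/dev/nvidiactl" }
        ]
      }
    }
  ]
}
```

With such a spec in place, `--device nvidia.com/gpu=gpu0` resolves against the `nvidia.com/gpu` kind and the named `gpu0` device.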