Add LXD GPU passthrough tests (New) #1577

pedro-avalos · 2024-11-04T22:18:00Z

Description

Created gpu_passthrough.py within GPGPU provider
Added LXD container tests for GPU passthrough setups.
Created LXD and LXDVM classes that can be used to wrap an LXD container or an LXD virtual machine
- These could probably go into checkbox-support?
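
The wrapper idea can be illustrated with a minimal sketch. It is not the PR's LXD/LXDVM implementation: the class name `LXDInstanceSketch`, the injectable `runner`, and the method names are hypothetical. It shells out to the `lxc` CLI (`lxc launch`, `lxc exec`, `lxc delete --force`) and uses the context-manager protocol so the instance is force-deleted even when a test step raises, which keeps reruns idempotent.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Hypothetical runner type: takes an argv list and returns stdout as text.
Runner = Callable[[Sequence[str]], str]


def default_runner(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(
        list(argv), check=True, stdout=subprocess.PIPE, universal_newlines=True
    ).stdout


class LXDInstanceSketch:
    """Launch an LXD container or VM and always delete it on exit.

    Hypothetical illustration only; not the PR's LXD/LXDVM classes.
    """

    def __init__(self, name: str, image: str, vm: bool = False,
                 options: Optional[List[str]] = None,
                 runner: Runner = default_runner):
        self.name = name
        self.image = image
        self.vm = vm
        self.options = list(options or [])
        self.run_cmd = runner

    def launch(self) -> None:
        argv = ["lxc", "launch", self.image, self.name]
        if self.vm:
            argv.append("--vm")  # create a virtual machine instead of a container
        self.run_cmd(argv + self.options)

    def run(self, command: str) -> str:
        # Execute a shell command inside the instance.
        return self.run_cmd(["lxc", "exec", self.name, "--", "bash", "-c", command])

    def cleanup(self) -> None:
        # --force also deletes a running instance, so reruns start clean.
        self.run_cmd(["lxc", "delete", self.name, "--force"])

    def __enter__(self) -> "LXDInstanceSketch":
        self.launch()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the with-body propagate.
        self.cleanup()
```

For example, `with LXDInstanceSketch("gpu-test", "ubuntu:22.04") as inst: inst.run("nvidia-smi")` launches a container, runs one command, and deletes it. Injecting `runner` lets unit tests record the commands instead of calling `lxc`.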

Resolved issues

Addresses CHECKBOX-1451 and CHECKBOX-970.

Documentation

n/a

Tests

Tested locally on a laptop with an NVIDIA GPU. Also tested on torchtusk.

Submission from torchtusk: https://certification.canonical.com/submissions/status/293943

codecov · 2024-11-04T22:21:15Z

Codecov Report

Attention: Patch coverage is 94.90741% with 11 lines in your changes missing coverage. Please review.

Project coverage is 91.13%. Comparing base (bdf6739) to head (d08cf53).
Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
providers/gpgpu/bin/gpu_passthrough.py	94.90%	10 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1577       +/-   ##
===========================================
+ Coverage   48.03%   91.13%   +43.10%     
===========================================
  Files         371        3      -368     
  Lines       39850      327    -39523     
  Branches     6734       38     -6696     
===========================================
- Hits        19140      298    -18842     
+ Misses      19993       28    -19965     
+ Partials      717        1      -716

Flag	Coverage Δ
checkbox-ng	`?`
checkbox-support	`?`
contrib-provider-ce-oem	`?`
provider-base	`?`
provider-certification-client	`?`
provider-certification-server	`?`
provider-genio	`?`
provider-gpgpu	`91.13% <94.90%> (+7.34%)`	⬆️
provider-iiotg	`?`
provider-resource	`?`
provider-sru	`?`
release-tools	`?`

Flags with carried forward coverage won't be shown.

Hook25

Given that this is, in my opinion, a big step in the right direction toward cleaning up the virtualization-oriented tests a bit, I have a few suggestions here and there on how I would change it further. See if they make sense.

providers/gpgpu/bin/gpu_passthrough.py

providers/gpgpu/units/test-plan.pxu

pedro-avalos · 2024-11-07T21:29:39Z

Hm, the nvidia-persistenced.service is not starting up for the VM test.

The LXD and LXDVM classes may be useful in checkbox-support, as other tests may be able to benefit from these classes.

Otherwise the test is not necessarily idempotent

I guess these are newer than I thought...

This should at the very least help speed up the setup process since X11 is not needed in the vm

This is not needed anymore since CUDA Toolkit is not being installed on the VM

This is not working as intended, will add in a separate PR

Hook25

Consider adding these to the bootstrap_include section; that way they won't be run only if the template doesn't expand.

providers/gpgpu/units/test-plan.pxu

* Add initial gpu_passthrough program
  The LXD and LXDVM classes may be useful in checkbox-support, as other tests may be able to benefit from these classes.
* Add initial coverage tests
* Fix parse_args function
* Add Checkbox units
* Fix typo in unit
* Fix typo in LXD.launch
* Force delete in cleanup
  Otherwise the test is not necessarily idempotent
* Ensure insert_images is called
* Fix nvidia repo url
* Fix nvidia pinfile url
* Fix symlink name
* Pass NVIDIA runtime at launch
* Format gpu_passthrough.py
* Don't use dataclasses
  I guess these are newer than I thought...
* Remove unsupported typehints
* Document that parameters can be overwritten
* Use cached property
* Rewrite run function
* Add retry decorator to download_image
* Add type hint to launch
* Rename jobs to gpgpu-passthrough
* Update tests
* Install mixbench snap
* Make LXD and LXDVM context managers
* Move setup to GPU_VENDORS fields
* Update tests
* Fix tests
* Fix shlex join bug
* No sudo needed
  Instance should be running as root
* Make script a little more verbose
* init_lxd is part of __enter__
* Don't just sleep for system to be up
* Ensure nvidia capabilities are passed through
  This ensures mixbench is able to find the right CUDA libraries from the host.
* Make type hints more accurate
* image_alias not image
* Update tests
* Update tox file requirements
* Add libsystemd-dev to tox workflow
* shlex.join available in 3.8+
* Fix launch tests
* Add more log messages
* Increase wait for VM retries
* Wait for VM to be up after adding GPU
* Add wait until running function
* Add test
* Remove unused properties
* Install gpgpu drivers on LXD vm
  This should at the very least help speed up the setup process since X11 is not needed in the vm
* Add debug messages to run
* Install linux-generic
* remove todo message
* Add options= to make it line clearer
* Use default storage size for VM
  This is not needed anymore since CUDA Toolkit is not being installed on the VM
* Auto-connect request granted
* Compatibility with jammy and prior
* Remove LXDVM passthrough test
  This is not working as intended, will add in a separate PR
* Ensure nvidia driver is present
* Add units to bootstrap_include

pedro-avalos added the enhancement New feature or request label Nov 4, 2024

pedro-avalos marked this pull request as ready for review November 5, 2024 01:35

pedro-avalos requested a review from fernando79513 November 5, 2024 01:36

Hook25 requested changes Nov 7, 2024

fernando79513 assigned Hook25 Nov 8, 2024

pedro-avalos force-pushed the add-lxd-gpu-tests branch from 454a410 to 22eb1b7 November 8, 2024 15:50

pedro-avalos added 22 commits November 8, 2024 11:47

Add initial gpu_passthrough program

d8a07bd

The LXD and LXDVM classes may be useful in checkbox-support, as other tests may be able to benefit from these classes.

Add initial coverage tests

effc5b9

Fix parse_args function

b04e56f

Add Checkbox units

53cdb65

Fix typo in unit

836859a

Fix typo in LXD.launch

e2b6707

Force delete in cleanup

69adfc5

Otherwise the test is not necessarily idempotent

Ensure insert_images is called

6b39e8b

Fix nvidia repo url

9e60171

Fix nvidia pinfile url

bc0d45b

Fix symlink name

9fd170c

Pass NVIDIA runtime at launch

0e7230e
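
A hedged sketch of what passing the NVIDIA runtime at launch can look like with the `lxc` CLI. `nvidia.runtime` and `nvidia.driver.capabilities` are LXD instance options that apply to containers only, and `lxc config device add <instance> <device> gpu` attaches a host GPU. The helper names, the `gpu0` device name, and the `all` capability value are assumptions, not taken from the PR.

```python
from typing import List


def nvidia_launch_argv(image: str, name: str) -> List[str]:
    """Build a hypothetical `lxc launch` command exposing the host NVIDIA runtime."""
    return [
        "lxc", "launch", image, name,
        # Mount the host's NVIDIA userspace libraries into the container.
        "-c", "nvidia.runtime=true",
        # Assumed value: request every driver capability (compute, utility, ...)
        # so CUDA workloads such as mixbench can find the host libraries.
        "-c", "nvidia.driver.capabilities=all",
    ]


def gpu_device_argv(name: str, device: str = "gpu0") -> List[str]:
    """Build the command that attaches host GPU(s) to the instance."""
    return ["lxc", "config", "device", "add", name, device, "gpu"]
```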

Format gpu_passthrough.py

cdbad84

Don't use dataclasses

aa324d4

I guess these are newer than I thought...

Remove unsupported typehints

5bf8472

Document that parameters can be overwritten

61e4cce

Use cached property

bf4c653
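
`functools.cached_property` only exists on Python 3.8+. Whether the PR uses it directly is not shown here; a minimal fallback for older interpreters, assuming per-instance caching without thread safety is enough, could look like this.

```python
import functools

try:
    from functools import cached_property  # Python 3.8+
except ImportError:  # older interpreters: minimal, non-thread-safe fallback
    class cached_property:
        def __init__(self, func):
            self.func = func
            self.attrname = func.__name__
            functools.update_wrapper(self, func)

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # Store on the instance; as a non-data descriptor, later lookups
            # find the instance attribute and skip this method entirely.
            instance.__dict__[self.attrname] = value
            return value
```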

Rewrite run function

cacf352

Add retry decorator to download_image

b046cca
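
A retry decorator around an image download could look like this minimal sketch. The attempt count, fixed delay, and caught exception types are assumptions, and the injectable `sleep` exists only to make it testable.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(attempts: int = 3, delay: float = 1.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep):
    """Call the wrapped function up to `attempts` times; re-raise the last error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the original error
                    sleep(delay)
        return wrapper
    return decorator
```

Applied as `@retry(attempts=3, delay=5)` above a hypothetical `download_image(alias)`, a transient network failure would be retried twice before the error propagates.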

Add type hint to launch

c843376

Rename jobs to gpgpu-passthrough

efb243f

Update tests

17d79d5

pedro-avalos added 11 commits November 8, 2024 11:47

shlex.join available in 3.8+

2cecd39
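
`shlex.join` was added in Python 3.8, so code that must also run on older interpreters needs a fallback. A minimal sketch, assuming the quoting from `shlex.quote` (available since Python 3.3) is sufficient; `shlex_join` and `join` are hypothetical names.

```python
import shlex
from typing import Iterable


def shlex_join(args: Iterable[str]) -> str:
    """Shell-escape and join arguments, like shlex.join on Python 3.8+."""
    return " ".join(shlex.quote(arg) for arg in args)


# Prefer the standard library version when it exists.
join = getattr(shlex, "join", shlex_join)
```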

Fix launch tests

7356323

Add more log messages

f284537

Increase wait for VM retries

b33a43f

Wait for VM to be up after adding GPU

906447c

Add wait until running function

e3899c0
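
Instead of a fixed sleep, a readiness wait can poll a condition until it holds or a deadline passes. This is a generic sketch; `wait_until`, its defaults, and the injectable `clock` and `sleep` are assumptions, not the PR's function.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 300.0,
               interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        sleep(interval)
```

A predicate for an LXD VM might run `lxc exec <name> -- true` and return False on `subprocess.CalledProcessError`, since `lxc exec` fails until the VM's agent is running; that predicate is likewise an assumption.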

Add test

230bb7e

Remove unused properties

064361c

Install gpgpu drivers on LXD vm

0d4f150

This should at the very least help speed up the setup process since X11 is not needed in the vm

Add debug messages to run

dba32b5

Install linux-generic

b2d5f0d

pedro-avalos force-pushed the add-lxd-gpu-tests branch from 19dcf43 to b2d5f0d November 8, 2024 17:47

pedro-avalos added 6 commits November 8, 2024 12:06

remove todo message

a7d2d5e

Add options= to make it line clearer

09a88f1

Use default storage size for VM

1d012ca

This is not needed anymore since CUDA Toolkit is not being installed on the VM

Auto-connect request granted

573d8e9

Compatibility with jammy and prior

999902a

Remove LXDVM passthrough test

877d7d4

This is not working as intended, will add in a separate PR

pedro-avalos requested a review from Hook25 November 13, 2024 14:23

Ensure nvidia driver is present

8c4a112
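
One possible pre-check that the host NVIDIA driver is loaded before launching the passthrough container. This is an assumption about what such a check does: it tests for `/proc/driver/nvidia/version`, which the NVIDIA kernel module creates when loaded; the function name is hypothetical.

```python
import os


def nvidia_driver_present(proc_file: str = "/proc/driver/nvidia/version") -> bool:
    """Return True if the NVIDIA kernel driver appears to be loaded on the host."""
    return os.path.exists(proc_file)
```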

Hook25 previously approved these changes Nov 14, 2024

providers/gpgpu/units/test-plan.pxu (resolved)

Add units to bootstrap_include

d08cf53

pedro-avalos dismissed Hook25’s stale review via d08cf53 November 14, 2024 14:39

pedro-avalos requested a review from Hook25 November 14, 2024 14:59

Hook25 approved these changes Nov 14, 2024

Hook25 merged commit 7c6b994 into main Nov 14, 2024
43 checks passed

Hook25 deleted the add-lxd-gpu-tests branch November 14, 2024 15:02

fernando79513 mentioned this pull request Nov 25, 2024

Create a test for SRIOV Intel NIC's (New) #1293

Open

pedro-avalos mentioned this pull request Dec 9, 2024

Move LXD and LXDVM to checkbox-support (New) #1645

Merged
