-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for the NVIDIA GPU tests #433
Conversation
} | ||
|
||
func verifyNvidiaInstallation(c cluster.TestCluster) { | ||
m := c.Machines()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With
m := c.Machines()[0] | |
if kola.AzureOptions.Size != "Standard_NC6s_v3" { | |
c.Skip("skipping due to wrong instance size") | |
} | |
m := c.Machines()[0] |
I think we could do this in scripts:
diff --git a/ci-automation/vendor-testing/azure.sh b/ci-automation/vendor-testing/azure.sh
index 17081b3598..c57210d651 100755
--- a/ci-automation/vendor-testing/azure.sh
+++ b/ci-automation/vendor-testing/azure.sh
@@ -74,7 +74,7 @@ query_kola_tests() {
kola list --platform=azure --filter "${@}"
}
-other_instance_types=()
+other_instance_types=('Standard_NC6s_v3')
if [[ "${CIA_ARCH}" = 'amd64' ]]; then
other_instance_types+=('V1')
fi
@@ -85,6 +85,6 @@ run_kola_tests_on_instances \
"${CIA_FIRST_RUN}" \
"${other_instance_types[@]}" \
'--' \
- 'cl.internet' \
+ 'cl.internet' 'cl.misc.nvidia' \
'--' \
"${@}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep this check, we could even move it to the SkipFunc
(as kola.AzureOptions.Size
should be accessible)
811dc81
to
3def6fb
Compare
// This test is to test the NVIDIA installation, limited to AZURE for now | ||
Platforms: []string{"azure"}, | ||
Architectures: []string{"amd64"}, | ||
Flags: []register.Flag{register.NoEnableSelinux}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance to track the avc
message? (for issue tracking and/or documentation purpose?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I will need to search the logs, if it's not there then I can trigger a local/Jenkins build to track the avc
messages for tracking on a issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should see it in the journal logs (before disabling the SELinux flags)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I meant searching the journal logs if they are not deleted on bincache.
out, err := c.SSH(m, "systemctl is-active nvidia.service") | ||
if !bytes.Contains(out, []byte("inactive")) { | ||
return fmt.Errorf("nvidia.service: %q: %v", out, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, you could do this way:
out, err := c.SSH(m, "systemctl is-active nvidia.service") | |
if !bytes.Contains(out, []byte("inactive")) { | |
return fmt.Errorf("nvidia.service: %q: %v", out, err) | |
} | |
_, err := c.SSH(m, "systemctl --quiet is-active nvidia.service") | |
return err |
we just want to assert if the unit is active or not, if the unit is active with --quiet
it will simply return 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also _ = c.MustSSH(…)
which fails the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct but in this case we don't want to fail the test as we are in a retry loop no?
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
d04c5d0
to
4cd37d7
Compare
Add tests for the NVIDIA GPU tests
Testing done
changelog/
directory (user-facing change, bug fix, security fix, update)/boot
and/usr
size, packages, list files for any missing binaries, kernel modules, config files, kernel modules, etc.