Add GPU specific Tags (red-hat-data-services#1938)
* add NVIDIA-GPUs tag to all tests carrying the Resources-GPU tag

* add AMD GPU tag to vLLM tests

* remove an incorrect comment
bdattoma committed Oct 17, 2024
1 parent 4950bcc commit 16c0b68
Showing 17 changed files with 61 additions and 61 deletions.
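With the new tags in place, GPU-specific subsets can be selected at run time with Robot Framework's `--include` option (e.g. `robot --include NVIDIA-GPUs <suite dir>`; tag names combine with `AND`/`OR`/`NOT`). The tag-matching rule these selections rely on can be sketched in Python — the helper name is hypothetical, and this is a simplification (Robot's real matcher also ignores spaces and underscores in tag names):

```python
def matches_include(test_tags, required_tags):
    """True when a test carries every tag of an ANDed --include expression.

    Robot Framework matches tags case-insensitively, so both sides are
    normalized before comparison (sketch only).
    """
    normalized = {t.lower() for t in test_tags}
    return all(r.lower() in normalized for r in required_tags)

# A test updated by this commit now matches an NVIDIA-only selection...
print(matches_include(["Sanity", "Resources-GPU", "NVIDIA-GPUs"],
                      ["Resources-GPU", "NVIDIA-GPUs"]))              # True
# ...while a test without the new tag is filtered out.
print(matches_include(["Sanity", "Resources-GPU"], ["NVIDIA-GPUs"]))  # False
```

This is why the commit adds `NVIDIA-GPUs` alongside, rather than instead of, `Resources-GPU`: existing selections keep working, and runs can now additionally narrow by accelerator vendor.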
@@ -53,7 +53,7 @@ Verify Notebook Controller Deployment
 Verify GPU Operator Deployment    # robocop: disable
     [Documentation]    Verifies Nvidia GPU Operator is correctly installed
     [Tags]    Sanity    Tier1
-    ...    Resources-GPU    # Not actually needed, but we first need to enable operator install by default
+    ...    Resources-GPU    NVIDIA-GPUs    # Not actually needed, but we first need to enable operator install by default
     ...    ODS-1157
     # Before GPU Node is added to the cluster
@@ -84,7 +84,7 @@ Verify Notebook Tolerations Are Applied To Workbenches
 Verify User Can Add GPUs To Workbench
     [Documentation]    Verifies user can add GPUs to an already started workbench
     [Tags]    Tier1    Sanity
-    ...    ODS-2013    Resources-GPU
+    ...    ODS-2013    Resources-GPU    NVIDIA-GPUs
     Launch Data Science Project Main Page
     Create Workbench    workbench_title=${WORKBENCH_TITLE_GPU}    workbench_description=${EMPTY}
     ...    prj_title=${PRJ_TITLE}    image_name=${NB_IMAGE_GPU}    deployment_size=Small
@@ -108,7 +108,7 @@ Verify User Can Add GPUs To Workbench
 Verify User Can Remove GPUs From Workbench
     [Documentation]    Verifies user can remove GPUs from an already started workbench
     [Tags]    Tier1    Sanity
-    ...    ODS-2014    Resources-GPU
+    ...    ODS-2014    Resources-GPU    NVIDIA-GPUs
     Launch Data Science Project Main Page
     Create Workbench    workbench_title=${WORKBENCH_TITLE_GPU}    workbench_description=${EMPTY}
     ...    prj_title=${PRJ_TITLE}    image_name=${NB_IMAGE_GPU}    deployment_size=Small
@@ -11,7 +11,7 @@ Resource    ../../../Resources/Page/OCPDashboard/Pods/Pods.robot
 Library    JupyterLibrary
 Suite Setup    Spawner Suite Setup
 Suite Teardown    End Web Test
-Test Tags    Resources-GPU
+Test Tags    Resources-GPU    NVIDIA-GPUs
 
 
 *** Variables ***
@@ -22,43 +22,43 @@ Verify CUDA Image Can Be Spawned With GPU
     [Documentation]    Spawns CUDA image with 1 GPU and verifies that the GPU is
     ...    not available for other users.
     [Tags]    Sanity
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1141    ODS-346    ODS-1359
     Pass Execution    Passing tests, as suite setup ensures that image can be spawned
 
 Verify CUDA Image Includes Expected CUDA Version
     [Documentation]    Checks CUDA version
     [Tags]    Sanity
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1142
     Verify Installed CUDA Version    ${EXPECTED_CUDA_VERSION}
 
 Verify PyTorch Library Can See GPUs In Minimal CUDA
     [Documentation]    Installs PyTorch and verifies it can see the GPU
     [Tags]    Sanity
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1144
     Verify Pytorch Can See GPU    install=True
 
 Verify Tensorflow Library Can See GPUs In Minimal CUDA
     [Documentation]    Installs Tensorflow and verifies it can see the GPU
     [Tags]    Sanity
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1143
     Verify Tensorflow Can See GPU    install=True
 
 Verify Cuda Image Has NVCC Installed
     [Documentation]    Verifies NVCC Version in Minimal CUDA Image
     [Tags]    Sanity
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-483
     ${nvcc_version} =    Run Cell And Get Output    input=!nvcc --version
     Should Not Contain    ${nvcc_version}    /usr/bin/sh: nvcc: command not found
 
 Verify Previous CUDA Notebook Image With GPU
     [Documentation]    Runs a workload after spawning the N-1 CUDA Notebook
     [Tags]    Tier2    LiveTesting
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2128
     [Setup]    N-1 CUDA Setup
     Spawn Notebook With Arguments    image=${NOTEBOOK_IMAGE}    size=Small    gpus=1    version=previous
@@ -90,7 +90,7 @@ Verify CUDA Image Suite Setup
     # This will fail in case there are two nodes with the same number of GPUs
     # Since the overall available number won't change even after 1 GPU is assigned
     # However I can't think of a better way to execute this check, under the assumption that
-    # the Resources-GPU tag will always ensure there is 1 node with 1 GPU on the cluster.
+    # the Resources-GPU will always ensure there is 1 node with 1 GPU on the cluster.
     ${maxNo} =    Find Max Number Of GPUs In One Node
     ${maxSpawner} =    Fetch Max Number Of GPUs In Spawner Page
     # Need to continue execution even on failure or the whole suite will be failed
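The check above compares `Find Max Number Of GPUs In One Node` against the spawner dropdown. The node-side half of that comparison reduces to a max over per-node allocatable GPU counts; a minimal sketch, assuming the counts come from each node's allocatable `nvidia.com/gpu` quantity (the helper name and node data are illustrative):

```python
def max_gpus_in_one_node(allocatable_gpus_by_node):
    """Largest per-node GPU count; 0 when no node reports GPUs.

    Keys are node names, values the node's allocatable "nvidia.com/gpu"
    quantity (Kubernetes reports these as strings).
    """
    return max((int(v) for v in allocatable_gpus_by_node.values()), default=0)

# Hypothetical cluster: the dropdown should show 2, the single-node maximum,
# not 3, the cluster-wide total.
nodes = {"gpu-worker-0": "2", "gpu-worker-1": "1", "cpu-worker-0": "0"}
print(max_gpus_in_one_node(nodes))  # 2
```

This also illustrates the caveat in the comments: once one GPU is assigned, the cluster total drops but the per-node maximum reported here may not, so the assertion only holds when no two nodes share the same count.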
@@ -49,7 +49,7 @@ Verify Tensorboard Is Accessible
 Verify PyTorch Image Can Be Spawned With GPU
     [Documentation]    Spawns PyTorch image with 1 GPU
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1145
     Clean Up Server
     Stop JupyterLab Notebook Server
@@ -60,28 +60,28 @@ Verify PyTorch Image Can Be Spawned With GPU
 Verify PyTorch Image Includes Expected CUDA Version
     [Documentation]    Checks CUDA version
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1146
     Verify Installed CUDA Version    ${EXPECTED_CUDA_VERSION}
 
 Verify PyTorch Library Can See GPUs In PyTorch Image
     [Documentation]    Verifies PyTorch can see the GPU
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1147
     Verify Pytorch Can See GPU
 
 Verify PyTorch Image GPU Workload
     [Documentation]    Runs a workload on GPUs in PyTorch image
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1148
     Run Repo And Clean    https://github.com/lugi0/notebook-benchmarks    notebook-benchmarks/pytorch/fgsm_tutorial.ipynb
 
 Verify Previous PyTorch Notebook Image With GPU
     [Documentation]    Runs a workload after spawning the N-1 PyTorch Notebook
     [Tags]    Tier2    LiveTesting
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2129
     [Setup]    N-1 PyTorch Setup
     Spawn Notebook With Arguments    image=${NOTEBOOK_IMAGE}    size=Small    gpus=1    version=previous
@@ -50,36 +50,36 @@ Verify Tensorboard Is Accessible
 Verify Tensorflow Image Can Be Spawned With GPU
     [Documentation]    Spawns Tensorflow image with 1 GPU
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1151
     Close Previous Server
     Spawn Notebook With Arguments    image=${NOTEBOOK_IMAGE}    size=Small    gpus=1
 
 Verify Tensorflow Image Includes Expected CUDA Version
     [Documentation]    Checks CUDA version
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1152
     Verify Installed CUDA Version    ${EXPECTED_CUDA_VERSION}
 
 Verify Tensorflow Library Can See GPUs In Tensorflow Image
     [Documentation]    Verifies Tensorflow can see the GPU
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1153
     Verify Tensorflow Can See GPU
 
 Verify Tensorflow Image GPU Workload
     [Documentation]    Runs a workload on GPUs in Tensorflow image
     [Tags]    Tier1
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-1154
     Run Repo And Clean    https://github.com/lugi0/notebook-benchmarks    notebook-benchmarks/tensorflow/GPU-no-warnings.ipynb
 
 Verify Previous Tensorflow Notebook Image With GPU
     [Documentation]    Runs a workload after spawning the N-1 Tensorflow Notebook
     [Tags]    Tier2    LiveTesting
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2130
     [Setup]    N-1 Tensorflow Setup
     Spawn Notebook With Arguments    image=${NOTEBOOK_IMAGE}    size=Small    gpus=1    version=previous
@@ -22,7 +22,7 @@ Verify Number Of Available GPUs Is Correct
     [Documentation]    Verifies that the number of available GPUs in the
     ...    Spawner dropdown is correct; i.e., it should show the maximum
     ...    number of GPUs available in a single node.
-    [Tags]    Sanity    Resources-2GPUS
+    [Tags]    Sanity    Resources-2GPUS    NVIDIA-GPUs
     ...    ODS-1256
     ${maxNo} =    Find Max Number Of GPUs In One Node
     ${maxSpawner} =    Fetch Max Number Of GPUs In Spawner Page
@@ -31,7 +31,7 @@ Verify Number Of Available GPUs Is Correct
 Verify Two Servers Can Be Spawned
     [Documentation]    Spawns two servers requesting 1 GPU each, and checks
     ...    that both can schedule and are scheduled on different nodes.
-    [Tags]    Sanity    Resources-2GPUS
+    [Tags]    Sanity    Resources-2GPUS    NVIDIA-GPUs
     ...    ODS-1257
     Spawn Notebook With Arguments    image=${NOTEBOOK_IMAGE}    size=Small    gpus=1
     ${serial_first} =    Get GPU Serial Number
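`Get GPU Serial Number` presumably reads the serial from inside the notebook pod (e.g. via `nvidia-smi --query-gpu=serial --format=csv,noheader`); comparing the two servers' serials then shows whether they landed on distinct physical GPUs. A sketch of that comparison — the helper name and serial values are hypothetical:

```python
def on_distinct_gpus(serial_first, serial_second):
    """Two equal serials would mean both servers share one physical GPU."""
    return serial_first.strip() != serial_second.strip()

# Made-up serials for illustration; trailing newline mimics raw CLI output.
print(on_distinct_gpus("0324217055555\n", "0324217066666"))  # True
print(on_distinct_gpus("0324217055555", "0324217055555"))    # False
```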
@@ -31,7 +31,7 @@ Run Training operator ODH test base LoRA use case
 # Run Training operator ODH test base QLoRA use case
 #    [Documentation]    Run Go ODH tests for Training operator base QLoRA use case
 #    [Tags]    RHOAIENG-13142
-#    ...    Resources-GPU
+#    ...    Resources-GPU    NVIDIA-GPUs
 #    ...    Tier1
 #    ...    DistributedWorkloads
 #    ...    Training
@@ -25,7 +25,7 @@ Run TestKueueRayCpu ODH test
 
 Run TestKueueRayGpu ODH test
     [Documentation]    Run Go ODH test: TestKueueRayGpu
-    [Tags]    Resources-GPU
+    [Tags]    Resources-GPU    NVIDIA-GPUs
     ...    Tier1
     ...    DistributedWorkloads
     ...    Training
@@ -43,7 +43,7 @@ Run TestRayTuneHPOCpu ODH test
 
 Run TestRayTuneHPOGpu ODH test
     [Documentation]    Run Go ODH test: TestMnistRayTuneHpoGpu
-    [Tags]    Resources-GPU
+    [Tags]    Resources-GPU    NVIDIA-GPUs
     ...    Tier1
     ...    DistributedWorkloads
     ...    Training
@@ -62,7 +62,7 @@ Run TestKueueCustomRayCpu ODH test
 Run TestKueueCustomRayGpu ODH test
     [Documentation]    Run Go ODH test: TestKueueCustomRayGpu
     [Tags]    RHOAIENG-10013
-    ...    Resources-GPU
+    ...    Resources-GPU    NVIDIA-GPUs
     ...    Tier1
     ...    DistributedWorkloads
     ...    Training
@@ -25,7 +25,7 @@ ${RUNTIME_NAME}=    Model Serving GPU Test
 *** Test Cases ***
 Verify GPU Model Deployment Via UI    # robocop: off=too-long-test-case,too-many-calls-in-test-case
     [Documentation]    Test the deployment of an openvino_ir model on a model server with GPUs attached
-    [Tags]    Sanity    Resources-GPU
+    [Tags]    Sanity    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2214
     Clean All Models Of Current User
     Open Data Science Projects Home Page
@@ -57,7 +57,7 @@ Verify GPU Model Deployment Via UI    # robocop: off=too-long-test-case,too-many-calls-in-test-case
 
 Test Inference Load On GPU
     [Documentation]    Test the inference load on the GPU after sending random requests to the endpoint
-    [Tags]    Sanity    Resources-GPU
+    [Tags]    Sanity    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2213
     ${url}=    Get Model Route Via UI    ${MODEL_NAME}
     Send Random Inference Request    endpoint=${url}    no_requests=100
@@ -104,7 +104,7 @@ Verify Multiple Projects With Same Model (OVMS on Kserve)
 
 Verify GPU Model Deployment Via UI (OVMS on Kserve)    # robocop: off=too-long-test-case,too-many-calls-in-test-case
     [Documentation]    Test the deployment of an openvino_ir model on a model server with GPUs attached
-    [Tags]    Tier1    Resources-GPU
+    [Tags]    Tier1    Resources-GPU    NVIDIA-GPUs
     ...    ODS-2630    ODS-2631    ProductBug    RHOAIENG-3355
     ${requests}=    Create Dictionary    nvidia.com/gpu=1
     ${limits}=    Create Dictionary    nvidia.com/gpu=1
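The two `Create Dictionary` calls above build identical `nvidia.com/gpu` requests and limits. That mirrors the container spec being assembled: `nvidia.com/gpu` is a Kubernetes extended resource, which cannot be overcommitted, so the request and limit carry the same value. A small sketch of that stanza (the function name is illustrative):

```python
def gpu_resources(count=1, resource_name="nvidia.com/gpu"):
    """Container resources stanza for attaching GPUs to a model server.

    Extended resources such as nvidia.com/gpu cannot be overcommitted,
    so requests and limits carry the same value.
    """
    quantity = str(count)
    return {
        "requests": {resource_name: quantity},
        "limits": {resource_name: quantity},
    }

print(gpu_resources())
# {'requests': {'nvidia.com/gpu': '1'}, 'limits': {'nvidia.com/gpu': '1'}}
```

Swapping `resource_name` (e.g. to `amd.com/gpu`) is what makes vendor-specific tags like the ones added in this commit worth separating.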
@@ -339,7 +339,7 @@ Verify User Can Set Requests And Limits For A Model    # robocop: off=too-long-test-case
 Verify Model Can Be Served And Query On A GPU Node    # robocop: off=too-long-test-case,too-many-calls-in-test-case
     [Documentation]    Basic tests for preparing, deploying and querying a LLM model on GPU node
     ...    using Kserve and Caikit+TGIS runtime
-    [Tags]    Sanity    ODS-2381    Resources-GPU
+    [Tags]    Sanity    ODS-2381    Resources-GPU    NVIDIA-GPUs
     [Setup]    Set Project And Runtime    namespace=singlemodel-gpu
     ${test_namespace}=    Set Variable    singlemodel-gpu
     ${model_name}=    Set Variable    flan-t5-small-caikit
@@ -151,7 +151,7 @@ Verify User Can Set Requests And Limits For A Model Using The UI    # robocop: off=too-long-test-case
 Verify Model Can Be Served And Query On A GPU Node Using The UI    # robocop: off=too-long-test-case
     [Documentation]    Basic tests for preparing, deploying and querying a LLM model on GPU node
     ...    using Kserve and Caikit+TGIS runtime
-    [Tags]    Sanity    ODS-2523    Resources-GPU
+    [Tags]    Sanity    ODS-2523    Resources-GPU    NVIDIA-GPUs
     [Setup]    Set Up Project    namespace=singlemodel-gpu
     ${test_namespace}=    Set Variable    singlemodel-gpu
     ${model_name}=    Set Variable    flan-t5-small-caikit