Updating cuDNN from 8 to 9 on existing CUDA 12 Docker image #20925

Merged
jchen351 merged 16 commits into main from Cjian/cd9 on Jun 11, 2024

jchen351
Contributor

@jchen351 jchen351 commented Jun 4, 2024

Description

Add support for cuDNN 9.

Motivation and Context

Keep the existing CUDA 12.2 with NVIDIA driver 535.

jywu-msft
jywu-msft previously approved these changes Jun 6, 2024
Member

@snnn snnn left a comment

You installed a RHEL9 package on RHEL8, which usually does not work. Please modify the base image instead. The base image already contains a cuDNN package, and we should replace that one. We should also get packages from a package manager (dnf) rather than wget.
Also keep in mind that we need to replicate this change in ORT-extension and ORT-GenAI, which is why I suggest modifying the base image instead.

RUN if [ "$(echo "$CUDA_VERSION" | cut -d"." -f1)" -ge 12 ]; then \
    dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo && \
    dnf clean all && \
    dnf -y install --allowerasing cudnn9-cuda-12 ; \
fi
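
The version gate in the RUN step above can be tried outside a Docker build. The following is a minimal sketch of the same comparison. The helper names `cuda_major` and `needs_cudnn9` are hypothetical and do not appear in the PR.

```shell
#!/bin/sh
# Hypothetical helpers (not from the PR) mirroring the condition in the RUN step.

# Print the major component of a dotted version string, e.g. 12.2.2 -> 12.
cuda_major() {
    echo "$1" | cut -d"." -f1
}

# Succeed when the major version is 12 or newer, which is the case in which
# the RUN step installs cudnn9-cuda-12.
needs_cudnn9() {
    [ "$(cuda_major "$1")" -ge 12 ]
}

# Usage: report the decision for CUDA_VERSION if it is set in the environment.
if [ -n "${CUDA_VERSION:-}" ]; then
    if needs_cudnn9 "$CUDA_VERSION"; then
        echo "CUDA $CUDA_VERSION: would install cudnn9-cuda-12"
    else
        echo "CUDA $CUDA_VERSION: cuDNN 9 install step skipped"
    fi
fi
```

Only the major version is compared, so any CUDA 12.x value takes the install branch.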
@jchen351 jchen351 added the ep:CUDA (issues related to the CUDA execution provider) and release:1.18.1 labels Jun 6, 2024
@jchen351 jchen351 requested a review from snnn June 6, 2024 21:31
@snnn
Member

snnn commented Jun 6, 2024

Would you mind modifying the base image instead?

Member

@snnn snnn left a comment

Please manually generate the build image and tell us what packages were installed in the image. Please provide the information so that we can examine if the change you are making works as expected.
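
One way to gather the requested package information is sketched below. It assumes the image is RPM-based (the thread mentions RHEL8 and a UBI8 base image) and that `rpm -qa` is available inside it. The function name `list_cudnn_packages` and the `<image>` placeholder are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): filter a package listing read from
# stdin, one package per line, down to the cuDNN packages, sorted by name.
list_cudnn_packages() {
    grep -i 'cudnn' | sort
}

# Usage (assumes an RPM-based image; replace <image> with the build image tag):
#   docker run --rm <image> rpm -qa | list_cudnn_packages
```

If both a cuDNN 8 and a cuDNN 9 package show up in the output, the old package was not replaced.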

@jchen351 jchen351 requested a review from a team as a code owner June 7, 2024 02:01
@jywu-msft
Member

> Please manually generate the build image and tell us what packages were installed in the image. Please provide the information so that we can examine if the change you are making works as expected.

It would be best to check all the artifacts produced by the package pipelines after this change.
I checked some of the artifacts, and libonnxruntime_providers_cuda.so and libonnxruntime_providers_tensorrt.so do indeed depend on cuDNN 9.
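
A dependency check of this kind can be scripted. The sketch below assumes the input is `ldd` output for one of the provider libraries and that cuDNN appears under a name of the form `libcudnn*.so.<major>`. The function name `depends_on_cudnn_major` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeed if the dependency listing on
# stdin names a cuDNN library with the given major version, for example
# libcudnn.so.9 or libcudnn_ops.so.9 when called with 9.
depends_on_cudnn_major() {
    major="$1"
    grep -Eq "libcudnn[A-Za-z_]*\.so\.${major}([. ]|\$)"
}

# Usage (run where the library's dependencies can be resolved):
#   ldd libonnxruntime_providers_cuda.so | depends_on_cudnn_major 9 \
#       && echo "depends on cuDNN 9"
```

The trailing group in the pattern keeps a request for major version 9 from matching a name such as `libcudnn.so.90`.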

@jywu-msft
Member

> Please manually generate the build image and tell us what packages were installed in the image. Please provide the information so that we can examine if the change you are making works as expected.
>
> It would be best to check all the artifacts produced by the package pipelines after this change. I checked some of the artifacts, and libonnxruntime_providers_cuda.so and libonnxruntime_providers_tensorrt.so do indeed depend on cuDNN 9.

I checked https://artprodcus3.artifacts.visualstudio.com/Abc038106-a83b-4dab-9dd3-5a41bc58f34c/530acbc4-21bc-487d-8cd8-348ff451d2ff/_apis/artifact/cGlwZWxpbmVhcnRpZmFjdDovL2FpaW5mcmEvcHJvamVjdElkLzUzMGFjYmM0LTIxYmMtNDg3ZC04Y2Q4LTM0OGZmNDUxZDJmZi9idWlsZElkLzQ4MDg1OC9hcnRpZmFjdE5hbWUvb25ueHJ1bnRpbWUtbGludXgteDY0LWdwdQ2/content?format=file&subPath=%2Fonnxruntime-linux-x64-gpu-1.19.0.tgz from this run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=480858&view=logs&j=93517ded-26f4-5eeb-8a9b-73e6b9c15e50&t=e69b1501-fc09-575b-14b1-82104b17da58

@chilo-ms
Contributor

chilo-ms commented Jun 10, 2024

Does the new base image cuda12_x64_ubi8_gcc12:20240607 contain cuDNN 9? If so, can we kick off all the package pipelines now to check the artifacts they produce? Thanks.
I can help check the artifacts once they are generated.

@snnn
Member

snnn commented Jun 10, 2024

He is working on it.

@jchen351 jchen351 merged commit 05032e5 into main Jun 11, 2024
202 of 204 checks passed
@jchen351 jchen351 deleted the Cjian/cd9 branch June 11, 2024 16:37
@chilo-ms
Contributor

I checked the Nuget-CUDA-Packaging pipeline for onnxruntime-linux-x64-gpu-1.19.0.tgz, and both the CUDA and TensorRT EP libraries link against CUDA 12 and cuDNN 9:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=483657&view=artifacts&pathAsName=false&type=publishedArtifacts
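
For an extracted package, the linked libraries can also be read from the ELF dynamic section without loading the library. The sketch below assumes `readelf -d` output in the usual binutils layout. The function name `needed_libs` is hypothetical, and the location of the provider libraries inside the archive is not assumed.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): read `readelf -d` output on stdin and
# print the name of each NEEDED shared library, one per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)][[:space:]]*$/\1/p'
}

# Usage (after extracting the package into the current directory):
#   tar -xzf onnxruntime-linux-x64-gpu-1.19.0.tgz
#   find . -name 'libonnxruntime_providers_*.so' | while read -r lib; do
#       echo "$lib"
#       readelf -d "$lib" | needed_libs | grep -E 'cudnn|cudart|cublas'
#   done
```

Unlike `ldd`, this reports only the names recorded in the binary, so it works on a machine without CUDA or cuDNN installed.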

@sophies927 sophies927 added the triage:approved Approved for cherrypicks for release label Jun 11, 2024
yf711 pushed a commit that referenced this pull request Jun 18, 2024
Adding support of cudnn 9

Keep existing CUDA 12.2 with NVIDIA driver 535
yf711 pushed a commit that referenced this pull request Jun 18, 2024
### Description
Adding support of cudnn 9

### Motivation and Context
Keep existing CUDA 12.2 with NVIDIA driver 535
baijumeswani pushed a commit that referenced this pull request Jun 20, 2024
Adding support of cudnn 9

Keep existing CUDA 12.2 with NVIDIA driver 535
Labels
ep:CUDA (issues related to the CUDA execution provider), release:1.18.1, triage:approved (approved for cherrypicks for release)
6 participants