[ORT 1.18.1 Release] Cherry pick 1st round #21105

yf711 · 2024-06-19T18:47:50Z

Description

Motivation and Context

### Description
Previously, all feeds were set to nightly; the official release feed-id was not set.

### Description
This PR adds protoc.exe so that the Nuget CUDA pipeline builds, which also allows the pipeline to build Java for various CUDA versions.

…20812)

### Description
Adding a Java build/packaging stage to `cuda-packaging-pipeline.yml`.

### Motivation and Context
This way we can enable publishing the Java CUDA 12 package along with the Nuget CUDA 12 package.

…ine (#20888)

### Motivation and Context
To allow the nightly release to be triggered automatically.

### Description
This PR allows `./gradlew cmakeCheck` to fail in the Windows_Packaging_(CUDA|TensorRT) job. This way, the job still generates all the necessary jar and pom files that later stages consume, while `./gradlew cmakeCheck` is run again in the Windows_Packaging_(CUDA|TensorRT)_Testing stage.

### Motivation and Context
Reduce the time of all Java packaging stages by 30+ minutes.

…over the case where there is only single repo checked out

### Description
Adding `$(Build.SourcesDirectory)/cmake/external/onnx/third_party` to cover the case where only a single repo is checked out.

### Motivation and Context
Fix CG issue https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8862110?typeId=16576846

### Description
Remove failOnStderr from Gradle cmakeCheck.

### Motivation and Context
Gradle is still using the deprecated API.

### Description
Adding job names to jobs without a name.

### Motivation and Context
This way we will know which job fails the CG scan.

Replaces the deprecated API. This should be verified with the `Gradle cmakeCheck` step from the `Windows_Packaging_CPU_x64_default` stage of the Zip-Nuge-... pipeline.

### Description
Adding support for cuDNN 9.

### Motivation and Context
Keep the existing CUDA 12.2 with NVIDIA driver 535.

### Description
orttrainingtestdatascus only stores the MNIST data, which is only 64 MB, in Azure Files. To meet security requirements and reduce maintenance cost, move the test data to lotusscus and store it in Azure Blob storage.

…in2022-GPU-A10 (#21023)

### Description
Move jobs in the onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10.

### Motivation and Context
To reduce the variants of VM images we need to maintain. Now we have 3:

1. Windows 2022 CPU
2. Windows 2022 GPU A10
3. Windows 2022 GPU T4

This change allows us to remove the last one.

### Description
Fix a few issues in the Windows TRT job in "Zip-Nuget-Java-Nodejs Packaging Pipeline":

1. It is a Windows job. It should not use bash (which is usually not available on Windows).
2. When it sets ADO vars, it missed a semicolon.

Here is the doc on how to set ADO vars via scripts: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/set-variables-scripts?view=azure-devops&tabs=bash

As the doc shows, a semicolon is needed. Without the semicolon, the vars will have an extra quotation mark in their values.
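As a rough illustration of the second point, the sketch below builds the `task.setvariable` logging command with the semicolon in place and prints it to stdout, which is where the Azure Pipelines agent picks logging commands up. The helper name `set_variable_command` and the variable name and value are hypothetical; this is not the script used by the pipeline.

```python
def set_variable_command(name: str, value: str) -> str:
    """Build an Azure Pipelines logging command that sets a pipeline variable.

    Hypothetical helper for illustration only. The ';' after the variable
    name terminates the property list inside the brackets, which is the
    form this fix switches to.
    """
    return f"##vso[task.setvariable variable={name};]{value}"


# The agent scans stdout for lines starting with "##vso[", so a plain print
# is enough and no bash is required on a Windows job.
print(set_variable_command("example_var", "example value"))
```

Because the command is an ordinary line on stdout, the same string can be emitted from PowerShell, cmd, or Python, whichever is available on the agent.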

### Description
Use a common set of prebuilt manylinux base images to build the packages, to avoid building the manylinux part again and again. The base images can be used in GenAI and other projects too.

This PR also updates the GCC version for inference python CUDA11/CUDA12 builds from 8 to 11. Later on I will update all other CUDA pipelines to use GCC 11, to avoid the issue described in onnx/onnx#6047 and microsoft/onnxruntime-genai#257 .

### Motivation and Context
To extract the common part as a reusable build infra among different ONNX Runtime projects.

### Description
Add "-allow-unsupported-compiler" flags to Windows CUDA flags. This change only impacts our pipelines. By default it would not reach this code path.

### Motivation and Context
nvcc refuses to work with the latest VS toolset unless this flag is set. Without this change, our CI build fails when the compiler is the latest VS 2022 17.10. Here is the log: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1405549&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=c7e55e04-f02b-57dc-d19a-29b7d3528c44&l=715

The error message is:

`D:\a\_work\_temp\v11.8\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk. [D:\a\_work\1\b\RelWithDebInfo\CMakeFiles\CMakeScratch\TryCompile-g5rudf\cmTC_7b8ff.vcxproj]`

### Description
Similar to #20786 . The last PR was able to update all pipelines and all docker files. This is a follow-up to that PR.

### Motivation and Context
1. To extract the common part as a reusable build infra among different ONNX Runtime projects.
2. Avoid hitting docker hub's limit: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

…esystem implementation (#20893)

### Description
This PR upgrades the GCC version of the CUDA 11 build pipelines from 8 to 11.

### Motivation and Context
GCC 8 has an experimental std::filesystem implementation which is not ABI compatible with the formal one in later GCC releases. It didn't cause trouble for us; however, the ONNX community has encountered this issue often. For example, onnx/onnx#6047 . So this PR increases the minimum supported GCC version from 8 to 9, and removes the references to GCC's "stdc++fs" library.

Please note we compile our code on RHEL8, and RHEL8's libstdc++ doesn't have the fs library, which means the binaries in ONNX Runtime's official packages always statically link to the fs library. It is just a matter of which version of the library is used: an experimental one or a more mature one. And it is an implementation detail that is not visible from outside. Anyway, a newer GCC is better. It will give us the chance to use many C++20 features.

#### Why were we using GCC 8?
It is because all our Linux packages were built on RHEL8 or its equivalents. The default GCC version in RHEL8 is 8. RHEL also provides additional GCC versions from RH devtoolset. UBI8 is the abbreviation of Red Hat Universal Base Image 8, which is the containerized RHEL8. UBI8 is free, which means it doesn't require a subscription (while RHEL does). The only devtoolset that UBI8 provides is GCC 12, which is too new to be used with CUDA 11.8. And our CUDA 11.8 build env is a docker image from Nvidia that is based on UBI8.

#### How the problem is solved
Almalinux is an alternative to RHEL. Almalinux 8 provides GCC 11. And the CUDA 11.8 docker image from Nvidia is open source, which means we can rebuild the image based on Almalinux 8 to get GCC 11. I've done this, but I cannot republish the new image due to various complicated license restrictions. Therefore I put them at an internal location in onnxruntimebuildcache.azurecr.io.
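The link-time consequence of the version bump can be sketched as follows. GCC releases before 9 keep the filesystem library in the separate `libstdc++fs` archive, so builds have to pass `-lstdc++fs`; from GCC 9 on, `std::filesystem` is part of libstdc++ itself. The function name `filesystem_link_flags` is hypothetical and is not taken from the ONNX Runtime build scripts.

```python
from typing import List


def filesystem_link_flags(gcc_major: int) -> List[str]:
    """Return the extra linker flags needed to use the GCC filesystem library.

    Hypothetical helper for illustration: GCC < 9 ships the filesystem
    library separately as libstdc++fs, GCC >= 9 needs nothing extra.
    """
    return ["-lstdc++fs"] if gcc_major < 9 else []


# With the minimum supported version raised to 9, the flag list is always
# empty, which is why the references to "stdc++fs" can be dropped.
for version in (8, 9, 11):
    print(version, filesystem_link_flags(version))
```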

### Motivation and Context
It breaks the python package pipeline. A new run: https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=477415&view=logs&s=d66927fc-650e-5e6f-874c-ae9229c1e7e4

---------

Co-authored-by: Your Name <you@example.com>

### Description
Fix regression caused by #20995

Avoid using command line flags to pass in CMAKE_PREFIX_PATH. Use environment variables instead, because otherwise the value of CMAKE_PREFIX_PATH could get encoded twice. For example, if the prefix is `C:\a\root`, then in tools/ci_build/github/windows/helpers.ps1 we set it in Env:CMAKE_ARGS, which will be consumed by ONNX. When ONNX gets it and decodes it, ONNX will get `C:aroot` instead. Because that path doesn't exist, the CMAKE_PREFIX_PATH setting doesn't take effect when the script installs ONNX. This PR fixes the issue.

The issue was discovered when I tried to upgrade cmake to a newer version. Our Windows CPU CI build pipeline currently uses cmake 3.27. In the main branch, even though the CMAKE_PREFIX_PATH setting does not work, cmake can still find protoc.exe from the directories. However, cmake changed this starting from 3.28: with the newer cmake versions, the find_library(), find_path(), and find_file() cmake commands no longer search in installation prefixes derived from the PATH environment variable.
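The `C:\a\root` to `C:aroot` corruption can be reproduced with a short sketch. It assumes the consumer tokenizes the argument string with POSIX shell rules, which is what Python's `shlex.split` does by default; whether ONNX's build script takes exactly this path is an assumption here. The variable names are illustrative.

```python
import os
import shlex

prefix = r"C:\a\root"

# Passing the prefix inside an argument string: POSIX-style tokenizing treats
# each backslash as an escape character and drops it.
cmake_args = f"-DCMAKE_PREFIX_PATH={prefix}"
decoded = shlex.split(cmake_args)
print(decoded)  # ['-DCMAKE_PREFIX_PATH=C:aroot']

# Passing the prefix as its own environment variable: no tokenizing step is
# involved, so the value reaches the child process unchanged.
env = dict(os.environ, CMAKE_PREFIX_PATH=prefix)
print(env["CMAKE_PREFIX_PATH"])  # C:\a\root
```

CMake reads the `CMAKE_PREFIX_PATH` environment variable in addition to the cache variable of the same name, so the second form needs no command line flag at all.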

jchen351 and others added 30 commits June 18, 2024 01:18

Fix critical and High issues from Component Governance (#20611)

6db6b10

Component governance fix round 2 (#20679)

37d218e

Increase NPM ComponentDetection.Timeout: 1200 (#20681)

8c3a8de

Reenabling Nuget Cuda Packaging Pipeline (#20688)

f520cd1

Component Governance fix round 3 (#20689)

5cb1c1f

Using CPU pool to build Linux GPU C API Package (#20648)

aba4ed9

component-governance fix round 4 (#20754)

7a37323

Replace ubuntu-latest with onnxruntime-Ubuntu2204-AMD-CPU (#20736)

3d310b5

Fix Onnx >= to == (#20798)

0878f44

Adding java build and packaging stage to cuda-packaging-pipeline.yml (#…

d1d446d

…20812)

adding publishing stage to publish java CUDA 12 pkg to ado (#20834)

46f8d58

Update py-publishing pipeline to use the resoure from packaging pipel…

0edd46f

…ine (#20888)

Remove failOnStderr from Gradle cmakeCheck (#20919)

aa957ee

Adding Job names to jobs without a name (#20961)

0c1ea5e

Refactor deprecated gradle syntax (#20922)

bdcb4c5

Updating cudnn from 8 to 9 on exsiting cuda 12 docker image (#20925)

aeb278a

Component Governance Fix round 6 (#21021)

705cf7b

Migrate training storage from SAS to managed identity (#20618)

f1d5270

Upgrade ESRP signing task from v2 to v5 (#20995)

d11cc7f

Fix onebranch exception in code signing (#21088)

84efbfb

yf711 requested a review from a team as a code owner June 19, 2024 18:47

yf711 changed the title ~~[ORT 1.18.1 Release] Cherry pick~~ [ORT 1.18.1 Release] Cherry pick 1st round Jun 19, 2024

jywu-msft approved these changes Jun 20, 2024

snnn approved these changes Jun 20, 2024

jywu-msft merged commit 91fb865 into rel-1.18.1 Jun 20, 2024
98 of 109 checks passed

jywu-msft deleted the yifanl/cherry-pick-1.18.1 branch June 20, 2024 03:10

jywu-msft restored the yifanl/cherry-pick-1.18.1 branch June 20, 2024 03:11

snnn mentioned this pull request Jul 11, 2024

Fix ETW Sink Initialize unproperly locking (#21226) #21302

Open
