feat(pt): allow PT OP CXXABI different from TF #3891

Merged: 16 commits merged into deepmodeling:devel on Jun 21, 2024

Conversation

@njzjz njzjz (Member) commented Jun 20, 2024

  • Build PT OP libraries with a compatible CXXABI if PT has a different CXX ABI from TF;
  • Enable PT OP in test_cuda workflow;
  • Update documentation.

Summary by CodeRabbit

  • Documentation

    • Removed outdated instructions related to setting environment variables for enabling customized C++ OPs in PyTorch.
  • Chores

    • Updated build configuration to handle PyTorch CXX11 ABI compatibility with TensorFlow.
    • Refactored library creation processes for better handling of CUDA and ROCm toolkits.
    • Improved build scripts to dynamically adjust compile definitions and installation paths based on different build configurations.
  • CI/CD

    • Enhanced the continuous integration workflow to include PyTorch variable assignments and settings for testing with CUDA.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
coderabbitai bot (Contributor) commented Jun 20, 2024

Walkthrough

This update enhances the build process for managing PyTorch and TensorFlow compatibility by introducing checks and appropriate settings for the CXX11 ABI flag, refactoring GPU library creation, and adding conditional linking. These modifications support customized C++ operations without requiring explicit environment variables, ensuring seamless integration and compatibility within different build configurations.
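
As a concrete illustration, a minimal CMake sketch of the kind of ABI check described above follows. `OP_CXX_ABI_PT` appears in the PR discussion; `OP_CXX_ABI` (TensorFlow's ABI) is an assumption here, and the actual logic in source/CMakeLists.txt may differ.

```cmake
# Sketch only: read the CXX11 ABI from libtorch's exported compile flags
# and compare it with the ABI already chosen for TensorFlow.
# OP_CXX_ABI (TensorFlow's ABI, 0 or 1) is assumed to be set earlier.
find_package(Torch REQUIRED) # defines TORCH_CXX_FLAGS
string(REGEX MATCH "_GLIBCXX_USE_CXX11_ABI=([01])" _abi_match
             "${TORCH_CXX_FLAGS}")
set(OP_CXX_ABI_PT "${CMAKE_MATCH_1}")
if(NOT "${OP_CXX_ABI_PT}" STREQUAL "${OP_CXX_ABI}")
  message(
    STATUS
      "PyTorch CXX11 ABI (${OP_CXX_ABI_PT}) differs from TensorFlow's "
      "(${OP_CXX_ABI}); PT OP libraries will be built with their own ABI.")
endif()
```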

Changes

| File | Change Summary |
| --- | --- |
| doc/install/install-from-source.md | Removed instructions and explanations related to the DP_ENABLE_PYTORCH environment variable and CXX11 ABI flags. |
| source/CMakeLists.txt | Handled PyTorch CXX11 ABI compatibility checks with TensorFlow, setting compatibility flags accordingly. |
| source/api_cc/CMakeLists.txt | Linked libraries conditionally based on ABI flag comparisons to adjust build configurations for libraries. |
| source/lib/CMakeLists.txt | Introduced a create_library function for dynamic library creation with suffixes based on CUDA/ROCm toolkits. |
| source/lib/src/gpu/CMakeLists.txt | Refactored GPU library creation and linking process, removed unused compile definitions, and handled ABI compatibility. |
| source/op/pt/CMakeLists.txt | Implemented conditional linking and adjusted compile definitions based on PyTorch ABI settings. |
| .github/workflows/test_cuda.yml | Added environment variable exports and assignments to streamline testing configurations. |

Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR, between commits c85b559 and ee0f409.

Files selected for processing (2)
  • .github/workflows/test_cuda.yml (1 hunks)
  • doc/install/install-from-source.md (1 hunks)
Additional context used
LanguageTool
doc/install/install-from-source.md

[style] ~89-~89: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing. (REP_WANT_TO_VB)
Context: ...e $deepmd_venv/bin/activate ``` if one wants to skip out of the virtual environment, he...


[uncategorized] ~146-~146: Did you mean: “By default,”? (BY_DEFAULT_COMMA)
Context: ... The path to TensorFlow Python library. By default the installer only finds TensorFlow und...


[uncategorized] ~190-~190: “your” (belonging to you) seems less likely than “you”. (AI_HYDRA_LEO_CP_YOUR_YOU)
Context: ...nccl pip install horovod mpi4py ``` If your work in a CPU environment, please prepa...


[style] ~295-~295: Consider using “incompatible” to avoid wordiness. (NOT_ABLE_PREMIUM)
Context: ...rch/pytorch/issues/51039), which may be not compatible with other libraries. ```bash cmake -D...

Markdownlint
doc/install/install-from-source.md

335-335: MD014 (commands-show-output)
Dollar signs used before commands without showing output


336-336: MD014 (commands-show-output)
Dollar signs used before commands without showing output


107-107: MD040 (fenced-code-language)
Fenced code blocks should have a language specified


153-153: MD040 (fenced-code-language)
Fenced code blocks should have a language specified

Additional comments not posted (1)
.github/workflows/test_cuda.yml (1)

52-58: Ensure correct environment variable assignment and usage.

The environment variables PYTORCH_ROOT and TENSORFLOW_ROOT are being set using Python commands, which is a robust way to dynamically determine their paths. However, the DP_ENABLE_PYTORCH variable is set to 1 without conditional checks, which might not be suitable for all build configurations or might conflict with other settings. Consider adding conditions or configurations to enable/disable this based on specific needs.
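
For context, the kind of step being reviewed might look like the sketch below. This is illustrative, not the verbatim workflow; in particular, the importlib lookup for the TensorFlow path is an assumption.

```bash
# Resolve the installed package locations via Python, then expose them
# to the build; DP_ENABLE_PYTORCH=1 unconditionally enables the PT OPs.
export PYTORCH_ROOT=$(python -c 'import torch; print(torch.__path__[0])')
export TENSORFLOW_ROOT=$(python -c 'import importlib.util; print(importlib.util.find_spec("tensorflow").submodule_search_locations[0])')
export DP_ENABLE_PYTORCH=1
```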

Verification successful

The search output confirms that DP_ENABLE_PYTORCH is set to 1 without any conditional logic in the .github/workflows/test_cuda.yml file. This suggests that the variable is always enabled in this workflow.

To ensure that this is appropriate for all configurations, we should inspect the workflow file for any other relevant configurations or steps that might influence the necessity of DP_ENABLE_PYTORCH.

Let's search for other environment variables or steps in the workflow that might provide context on whether DP_ENABLE_PYTORCH should always be enabled or if there are scenarios where it should be conditional.


To understand the context and necessity of DP_ENABLE_PYTORCH, we need to extract and analyze the sections of the workflow file containing environment variables (env:) and steps (steps:).


The extracted sections from the workflow file show that DP_ENABLE_PYTORCH is set to 1 without any conditional logic. Additionally, the presence of other environment variables and steps does not indicate any scenarios where DP_ENABLE_PYTORCH should be disabled or conditionally set.

Given this, it appears that DP_ENABLE_PYTORCH is intended to be enabled for all configurations within this workflow. Therefore, the current setup seems appropriate, and no conditional logic is necessary for DP_ENABLE_PYTORCH.

  • .github/workflows/test_cuda.yml: DP_ENABLE_PYTORCH is set to 1 unconditionally.
Scripts executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Description: Ensure that DP_ENABLE_PYTORCH is set correctly based on the build configurations.
# Test: Search for conditional logic related to DP_ENABLE_PYTORCH in the workflow file. Expect: Conditional logic that sets DP_ENABLE_PYTORCH based on certain parameters.
rg --type yaml 'DP_ENABLE_PYTORCH' .github/workflows/test_cuda.yml

Length of output: 95



Script:

#!/bin/bash
# Description: Search for environment variables and steps in the workflow file to understand the context of DP_ENABLE_PYTORCH.
# Test: Look for other environment variables and steps in the workflow file that might influence the necessity of DP_ENABLE_PYTORCH.
rg --type yaml 'env:' .github/workflows/test_cuda.yml
rg --type yaml 'steps:' .github/workflows/test_cuda.yml

Length of output: 175



Script:

#!/bin/bash
# Description: Extract sections containing environment variables and steps for detailed inspection.
# Test: Use rg to extract the sections containing env: and steps: from the workflow file.
rg --type yaml -A 5 'env:' .github/workflows/test_cuda.yml
rg --type yaml -A 5 'steps:' .github/workflows/test_cuda.yml

Length of output: 1340


@njzjz njzjz (Member, Author) commented Jun 20, 2024

It seems to me that we still cannot enable PyTorch by default, as the default PyTorch wheel on PyPI requires NVCC...

CUDA_TOOLKIT_ROOT_DIR not found or specified
CMake Warning at /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:31 (message):
  Caffe2: CUDA cannot be found.  Depending on whether you are building Caffe2
  or a Caffe2 dependent library, the next warning / error will give you more
  info.
Call Stack (most recent call first):
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:185 (find_package)


CMake Error at /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:91 (message):
  Your installed Caffe2 version uses CUDA but I cannot find the CUDA
  libraries.  Please set the proper CUDA prefixes and / or install CUDA.
Call Stack (most recent call first):
  /home/runner/.cache/uv/environments-v0/.tmpgyfA4c/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:185 (find_package)

xref: pytorch/pytorch#78530

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

Outside diff range and nitpick comments (5)
source/lib/CMakeLists.txt (1)

15-58: Approve the introduction of create_library function for modularity.

The create_library function significantly improves the modularity and reusability of the library creation process. Ensure that the function is used consistently across the project to maintain uniformity.

Consider using this pattern in other parts of the project where similar functionality is required to maintain consistency and improve maintainability.
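
For illustration, a pared-down version of such a function might look like this (the argument list and LIB_SRC are assumptions; the real function in source/lib/CMakeLists.txt is more involved):

```cmake
# Build the same sources into variants whose names carry a toolkit
# suffix, so CPU and CUDA builds can coexist in a single install tree.
function(create_library _suffix)
  set(_name "deepmd${_suffix}")
  add_library(${_name} SHARED ${LIB_SRC})
  target_include_directories(
    ${_name} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>)
  install(TARGETS ${_name} DESTINATION lib)
endfunction()

create_library("") # CPU-only variant
if(USE_CUDA_TOOLKIT)
  create_library("_cuda") # variant built against the CUDA toolkit
endif()
```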

source/lib/src/gpu/CMakeLists.txt (1)

82-118: Approve the introduction of create_gpu_lib function for GPU libraries.

The create_gpu_lib function enhances the modularity and reusability for GPU library creation. This is a positive change that aligns with best practices in modern CMake usage.

Encourage the use of this pattern in other parts of the project dealing with similar library creation requirements to maintain consistency and enhance modularity.
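
In the same spirit, a sketch of the create_gpu_lib idea, with the device runtime passed in as a parameter (target and variable names are illustrative, not the actual ones):

```cmake
# One function body, parameterized by the runtime library to link.
function(create_gpu_lib _suffix _runtime)
  add_library(deepmd_op_device${_suffix} SHARED ${GPU_SRC})
  target_link_libraries(deepmd_op_device${_suffix} PRIVATE ${_runtime})
endfunction()

if(USE_CUDA_TOOLKIT)
  find_package(CUDAToolkit REQUIRED)
  create_gpu_lib("_cuda" CUDA::cudart)
elseif(USE_ROCM_TOOLKIT)
  find_package(hip REQUIRED)
  create_gpu_lib("_rocm" hip::host)
endif()
```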

doc/install/install-from-source.md (3)

Line range hint 89-89: Improve clarity in virtual environment exit instructions.

The phrase "if one wants to skip out of the virtual environment" could be simplified to enhance clarity. Consider rephrasing to make the instructions more direct.

- if one wants to skip out of the virtual environment, he/she can do
+ To exit the virtual environment, simply run:

Line range hint 145-145: Clarify installation path details for TensorFlow.

The sentence regarding the TensorFlow path could benefit from a comma for better readability.

- By default the installer only finds TensorFlow under user site-package directory...
+ By default, the installer only finds TensorFlow under user site-package directory...

Line range hint 189-189: Correct grammatical error in CPU environment setup.

There's a minor grammatical mistake in the description of setting up the environment for CPU usage.

- If your work in a CPU environment, please prepare
+ If you work in a CPU environment, please prepare

Resolved review threads on: source/CMakeLists.txt; source/op/pt/CMakeLists.txt (2 threads)
@njzjz njzjz added the Test CUDA label (Trigger test CUDA workflow) Jun 21, 2024
@github-actions github-actions bot removed the Test CUDA label (Trigger test CUDA workflow) Jun 21, 2024
@njzjz njzjz removed the Python label Jun 21, 2024
@njzjz njzjz marked this pull request as ready for review June 21, 2024 00:56
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (3)
doc/install/install-from-source.md (3)

Line range hint 146-146: Clarify installation instructions for TensorFlow.

The sentence "By default the installer only finds TensorFlow under user site-package directory" might be confusing due to the lack of a comma after "By default". Adding a comma can improve the readability.

- By default the installer only finds TensorFlow under user site-package directory
+ By default, the installer only finds TensorFlow under user site-package directory

Line range hint 190-190: Correct grammatical error to enhance clarity.

The phrase "If your work in a CPU environment" should be corrected to "If you work in a CPU environment" to avoid confusion and improve the grammatical structure.

- If your work in a CPU environment, please prepare runtime as below:
+ If you work in a CPU environment, please prepare runtime as below:

Line range hint 295-295: Clarify compatibility note regarding PyTorch.

The phrase "which may be not compatible with other libraries" is somewhat awkward. Rewording it to "which may not be compatible with other libraries" could enhance clarity.

- which may be not compatible with other libraries
+ which may not be compatible with other libraries

@njzjz njzjz requested a review from wanghan-iapcm June 21, 2024 01:29
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Jun 21, 2024
Merged via the queue into deepmodeling:devel with commit 7a0ec5d Jun 21, 2024
54 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Jun 22, 2024
`TORCH_CXX_FLAGS` on macOS and Windows doesn't have
`_GLIBCXX_USE_CXX11_ABI`. This PR sets `OP_CXX_ABI_PT` to a default
value to fix the error introduced in #3891.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Chores**
- Updated build configuration to set `OP_CXX_ABI_PT` conditionally for
improved compatibility with macOS and Windows environments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
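
A minimal sketch of the fallback that this follow-up commit describes (OP_CXX_ABI_PT follows the commit message; OP_CXX_ABI, TensorFlow's ABI, is an assumption, and the actual change may differ):

```cmake
# On macOS and Windows, TORCH_CXX_FLAGS carries no
# _GLIBCXX_USE_CXX11_ABI, so the regex match comes back empty;
# fall back to the ABI already used for TensorFlow.
string(REGEX MATCH "_GLIBCXX_USE_CXX11_ABI=([01])" _abi_match
             "${TORCH_CXX_FLAGS}")
if(_abi_match)
  set(OP_CXX_ABI_PT "${CMAKE_MATCH_1}")
else()
  set(OP_CXX_ABI_PT "${OP_CXX_ABI}") # default when the flag is absent
endif()
```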
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this pull request Sep 18, 2024
- Build PT OP libraries with a compatible CXXABI if PT has a different CXX
ABI from TF;
- Enable PT OP in test_cuda workflow;
- Update documentation.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this pull request Sep 18, 2024
`TORCH_CXX_FLAGS` on macOS and Windows doesn't have
`_GLIBCXX_USE_CXX11_ABI`. This PR sets `OP_CXX_ABI_PT` to a default
value to fix the error introduced in deepmodeling#3891.

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>