From c7fd48f20322ddc895b9adbc77a523fc72bbb6ff Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Tue, 20 Aug 2024 19:32:57 -0400 Subject: [PATCH 1/4] docs: improve docs for environment variables Signed-off-by: Jinzhe Zeng --- doc/env.md | 70 +++++++++++++++++++ doc/index.rst | 1 + doc/inference/cxx.md | 4 ++ doc/inference/nodejs.md | 4 ++ doc/inference/python.md | 4 ++ doc/install/install-from-source.md | 83 +++++++++++++++++++---- doc/third-party/ase.md | 4 ++ doc/third-party/dpdata.md | 4 ++ doc/third-party/gromacs.md | 4 ++ doc/third-party/ipi.md | 4 ++ doc/third-party/lammps-command.md | 4 ++ doc/train/training-advanced.md | 9 +-- doc/troubleshooting/howtoset_num_nodes.md | 10 +-- source/api_cc/include/common.h | 2 +- 14 files changed, 181 insertions(+), 26 deletions(-) create mode 100644 doc/env.md diff --git a/doc/env.md b/doc/env.md new file mode 100644 index 0000000000..b22a716480 --- /dev/null +++ b/doc/env.md @@ -0,0 +1,70 @@ +# Runtime environment variables + +:::{note} +For build-time environment variables, see [Install from source code](./install/install-from-source.md). +::: + +## All interfaces + +:::{envvar} DP_INTER_OP_PARALLELISM_THREADS + +**Alias**: `TF_INTER_OP_PARALLELISM_THREADS` +**Default**: `0` + +Control parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs for CPU devices. +See [How to control the parallelism of a job](./troubleshooting/howtoset_num_nodes.md) for details. +::: + +:::{envvar} DP_INTRA_OP_PARALLELISM_THREADS + +**Alias**: `TF_INTRA_OP_PARALLELISM_THREADS` +**Default**: `0` + +Control parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs. +See [How to control the parallelism of a job](./troubleshooting/howtoset_num_nodes.md) for details. 
+::: + +## Environment variables of dependencies + +- If OpenMP is used, [OpenMP environment variables](https://www.openmp.org/spec-html/5.0/openmpch6.html) can be used to control OpenMP threads, such as [`OMP_NUM_THREADS`](https://www.openmp.org/spec-html/5.0/openmpse50.html#x289-20540006.2). +- If CUDA is used, [CUDA environment variables](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables) can be used to control CUDA devices, such as `CUDA_VISIBLE_DEVICES`. +- If ROCm is used, [ROCm environment variables](https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#environment-variables) can be used to control ROCm devices. +- {{ tensorflow_icon }} If TensorFlow is used, TensorFlow environment variables can be used. +- {{ pytorch_icon }} If PyTorch is used, [PyTorch environment variables](https://pytorch.org/docs/stable/torch_environment_variables.html) can be used. + +## Python interface only + +:::{envvar} DP_INTERFACE_PREC + +**Choices**: `high`, `low`; **Default**: `high` + +Control high (double) or low (float) precision of training. +::: + +:::{envvar} DP_AUTO_PARALLELIZATION + +**Choices**: `0`, `1`; **Default**: `0` + +Enable auto parallelization for CPU operators. +::: + +:::{envvar} DP_JIT + +**Choices**: `0`, `1`; **Default**: `0` + +Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. +::: + +:::{envvar} DP_INFER_BATCH_SIZE + +**Default**: `1024` on CPUs, and as large as possible until out-of-memory on GPUs + +Inference batch size, calculated by multiplying the number of frames by the number of atoms. +::: + +:::{envvar} DP_BACKEND + +**Default**: `tensorflow` + +Default backend. 
+::: diff --git a/doc/index.rst b/doc/index.rst index fcce1e37e7..7eb1bb2f18 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -45,6 +45,7 @@ DeePMD-kit is a package written in Python/C++, designed to minimize the effort r cli third-party/index nvnmd/index + env troubleshooting/index diff --git a/doc/inference/cxx.md b/doc/inference/cxx.md index 58c74df068..ec8a3248a1 100644 --- a/doc/inference/cxx.md +++ b/doc/inference/cxx.md @@ -1,5 +1,9 @@ # C/C++ interface +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + ## C++ interface The C++ interface of DeePMD-kit is also available for the model interface, which is considered faster than the Python interface. An example `infer_water.cpp` is given below: diff --git a/doc/inference/nodejs.md b/doc/inference/nodejs.md index 8d58881898..abe9dc36ab 100644 --- a/doc/inference/nodejs.md +++ b/doc/inference/nodejs.md @@ -1,5 +1,9 @@ # Node.js interface +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + If [Node.js interface is installed](../install/install-nodejs.md), one can use the Node.js interface for model inference, which is a wrapper of [the header-only C++ API](./cxx.md). A simple example is shown below. diff --git a/doc/inference/python.md b/doc/inference/python.md index 73faa2b329..b2603c85f8 100644 --- a/doc/inference/python.md +++ b/doc/inference/python.md @@ -1,5 +1,9 @@ # Python interface +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + One may use the python interface of DeePMD-kit for model inference, an example is given as follows ```python diff --git a/doc/install/install-from-source.md b/doc/install/install-from-source.md index c0b78004d0..b0920d3b5d 100644 --- a/doc/install/install-from-source.md +++ b/doc/install/install-from-source.md @@ -136,18 +136,77 @@ pip install . 
One may set the following environment variables before executing `pip`: -| Environment variables | Allowed value | Default value | Usage | -| --------------------------------------------------- | --------------------- | ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| DP_VARIANT | `cpu`, `cuda`, `rocm` | `cpu` | Build CPU variant or GPU variant with CUDA or ROCM support. | -| CUDAToolkit_ROOT | Path | Detected automatically | The path to the CUDA toolkit directory. CUDA 9.0 or later is supported. NVCC is required. | -| ROCM_ROOT | Path | Detected automatically | The path to the ROCM toolkit directory. | -| DP_ENABLE_TENSORFLOW | 0, 1 | 1 | {{ tensorflow_icon }} Enable the TensorFlow backend. | -| DP_ENABLE_PYTORCH | 0, 1 | 0 | {{ pytorch_icon }} Enable customized C++ OPs for the PyTorch backend. PyTorch can still run without customized C++ OPs, but features will be limited. | -| TENSORFLOW_ROOT | Path | Detected automatically | {{ tensorflow_icon }} The path to TensorFlow Python library. By default the installer only finds TensorFlow under user site-package directory (`site.getusersitepackages()`) or system site-package directory (`sysconfig.get_path("purelib")`) due to limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest TensorFlow (or the environment variable `TENSORFLOW_VERSION` if given) from PyPI will be built against. | -| PYTORCH_ROOT | Path | Detected automatically | {{ pytorch_icon }} The path to PyTorch Python library. 
By default, the installer only finds PyTorch under the user site-package directory (`site.getusersitepackages()`) or the system site-package directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest PyTorch (or the environment variable `PYTORCH_VERSION` if given) from PyPI will be built against. | -| DP_ENABLE_NATIVE_OPTIMIZATION | 0, 1 | 0 | Enable compilation optimization for the native machine's CPU type. Do not enable it if generated code will run on different CPUs. | -| CMAKE_ARGS | str | - | Additional CMake arguments | -| <LANG>FLAGS (`<LANG>`=`CXX`, `CUDA` or `HIP`) | str | - | Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html). | +:::{envvar} DP_VARIANT + +**Choices**: `cpu`, `cuda`, `rocm`; **Default**: `cpu` + +Build CPU variant or GPU variant with CUDA or ROCM support. +::: + +:::{envvar} CUDAToolkit_ROOT + +**Type**: Path; **Default**: Detected automatically + +The path to the CUDA toolkit directory. CUDA 9.0 or later is supported. NVCC is required. +::: + +:::{envvar} ROCM_ROOT + +**Type**: Path; **Default**: Detected automatically + +The path to the ROCM toolkit directory. +::: + +:::{envvar} DP_ENABLE_TENSORFLOW + +**Choices**: `0`, `1`; **Default**: `1` + +{{ tensorflow_icon }} Enable the TensorFlow +::: + +:::{envvar} DP_ENABLE_PYTORCH + +**Choices**: `0`, `1`; **Default**: `0` + +{{ pytorch_icon }} Enable customized C++ OPs for the PyTorch backend. PyTorch can still run without customized C++ OPs, but features will be limited. +::: + +:::{envvar} TENSORFLOW_ROOT + +**Type**: Path; **Default**: Detected automatically + +{{ tensorflow_icon }} The path to TensorFlow Python library. 
If not given, by default the installer only finds TensorFlow under user site-package directory (`site.getusersitepackages()`) or system site-package directory (`sysconfig.get_path("purelib")`) due to limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest TensorFlow (or the environment variable `TENSORFLOW_VERSION` if given) from PyPI will be built against. +::: + +:::{envvar} PYTORCH_ROOT + +**Type**: Path; **Default**: Detected automatically + +{{ pytorch_icon }} The path to PyTorch Python library. If not given, by default, the installer only finds PyTorch under the user site-package directory (`site.getusersitepackages()`) or the system site-package directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest PyTorch (or the environment variable `PYTORCH_VERSION` if given) from PyPI will be built against. +::: + +:::{envvar} DP_ENABLE_NATIVE_OPTIMIZATION + +**Choices**: `0`, `1`; **Default**: `0` + +Enable compilation optimization for the native machine's CPU type. Do not enable it if generated code will run on different CPUs. +::: + +:::{envvar} CMAKE_ARGS + +**Type**: string + +Additional CMake arguments. +::: + +:::{envvar} <LANG>FLAGS (`<LANG>`=`CXX`, `CUDA` or `HIP`) + +**Type**: string + +Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html). +::: + +Other [CMake environment variables](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html) may also be critical. 
To test the installation, one should first jump out of the source directory diff --git a/doc/third-party/ase.md b/doc/third-party/ase.md index 76371a3197..6ede63e2f9 100644 --- a/doc/third-party/ase.md +++ b/doc/third-party/ase.md @@ -1,5 +1,9 @@ # Use deep potential with ASE +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + Deep potential can be set up as a calculator with ASE to obtain potential energies and forces. ```python diff --git a/doc/third-party/dpdata.md b/doc/third-party/dpdata.md index 05e0f6fb40..ddb8f13aad 100644 --- a/doc/third-party/dpdata.md +++ b/doc/third-party/dpdata.md @@ -1,5 +1,9 @@ # Use deep potential with dpdata +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + DeePMD-kit provides a driver for [dpdata](https://github.com/deepmodeling/dpdata) >=0.2.7 via the plugin mechanism, making it possible to call the `predict` method for `System` class: ```py diff --git a/doc/third-party/gromacs.md b/doc/third-party/gromacs.md index 5c5132feab..791caeb419 100644 --- a/doc/third-party/gromacs.md +++ b/doc/third-party/gromacs.md @@ -1,5 +1,9 @@ # Running MD with GROMACS +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + ## DP/MM Simulation This part gives a simple tutorial on how to run a DP/MM simulation for methane in water, which means using DP for methane and TIP3P for water. All relevant files can be found in `examples/methane`. diff --git a/doc/third-party/ipi.md b/doc/third-party/ipi.md index 84a972d885..117512138e 100644 --- a/doc/third-party/ipi.md +++ b/doc/third-party/ipi.md @@ -1,5 +1,9 @@ # Run path-integral MD with i-PI +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + The i-PI works in a client-server model. 
The i-PI provides the server for integrating the replica positions of atoms, while the DeePMD-kit provides a client named `dp_ipi` that computes the interactions (including energy, forces and virials). The server and client communicate via the Unix domain socket or the Internet socket. Installation instructions for i-PI can be found [here](../install/install-ipi.md). The client can be started by ```bash diff --git a/doc/third-party/lammps-command.md b/doc/third-party/lammps-command.md index 89c89b24fe..4baba00e05 100644 --- a/doc/third-party/lammps-command.md +++ b/doc/third-party/lammps-command.md @@ -1,5 +1,9 @@ # Run MD with LAMMPS +:::{note} +See [Environment variables](../env.md) for the runtime environment variables. +::: + ## units All units in LAMMPS except `lj` are supported. `lj` is not supported. diff --git a/doc/train/training-advanced.md b/doc/train/training-advanced.md index 5051d981e8..8f517273a6 100644 --- a/doc/train/training-advanced.md +++ b/doc/train/training-advanced.md @@ -161,14 +161,7 @@ optional arguments: **`--skip-neighbor-stat`** will skip calculating neighbor statistics if one is concerned about performance. Some features will be disabled. To maximize the performance, one should follow [FAQ: How to control the parallelism of a job](../troubleshooting/howtoset_num_nodes.md) to control the number of threads. - -One can set other environmental variables: - -| Environment variables | Allowed value | Default value | Usage | -| ----------------------- | ------------- | ------------- | ------------------------------------------------------------------------------------------------------------------- | -| DP_INTERFACE_PREC | `high`, `low` | `high` | Control high (double) or low (float) precision of training. | -| DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. | -| DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. 
| +See [Runtime environment variables](../env.md) for the complete list of runtime environment variables. ## Adjust `sel` of a frozen model {{ tensorflow_icon }} diff --git a/doc/troubleshooting/howtoset_num_nodes.md b/doc/troubleshooting/howtoset_num_nodes.md index d5800d380b..59de5f480a 100644 --- a/doc/troubleshooting/howtoset_num_nodes.md +++ b/doc/troubleshooting/howtoset_num_nodes.md @@ -30,12 +30,12 @@ For CPU devices, TensorFlow and PyTorch use multiple streams to run independent export DP_INTER_OP_PARALLELISM_THREADS=3 ``` -However, for GPU devices, TensorFlow uses only one compute stream and multiple copy streams. -Note that some of DeePMD-kit OPs do not have GPU support, so it is still encouraged to set environmental variables even if one has a GPU. +However, for GPU devices, TensorFlow and PyTorch use only one compute stream and multiple copy streams. +Note that some of DeePMD-kit OPs do not have GPU support, so it is still encouraged to set environment variables even if one has a GPU. ## Parallelism within an individual operators -For CPU devices, `DP_INTRA_OP_PARALLELISM_THREADS` controls parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs. +For CPU devices, {envvar}`DP_INTRA_OP_PARALLELISM_THREADS` controls parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs. ```bash export DP_INTRA_OP_PARALLELISM_THREADS=2 ``` @@ -49,7 +49,7 @@ It may also control parallelism for NumPy when NumPy is built against OpenMP, so export OMP_NUM_THREADS=2 ``` -There are several other environmental variables for OpenMP, such as `KMP_BLOCKTIME`. +There are several other environment variables for OpenMP, such as `KMP_BLOCKTIME`. ::::{tab-set} @@ -70,7 +70,7 @@ See [PyTorch documentation](https://pytorch.org/tutorials/recipes/recipes/tuning There is no one general parallel configuration that works for all situations, so you are encouraged to tune parallel configurations yourself after empirical testing. 
Here are some empirical examples. -If you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows: +If you wish to use 3 cores of 2 CPUs on one node, you may set the environment variables and run DeePMD-kit as follows: ::::{tab-set} diff --git a/source/api_cc/include/common.h b/source/api_cc/include/common.h index 6b06cac2f4..9b1adcbd62 100644 --- a/source/api_cc/include/common.h +++ b/source/api_cc/include/common.h @@ -142,7 +142,7 @@ void select_map_inv(typename std::vector::iterator out, /** * @brief Get the number of threads from the environment variable. - * @details A warning will be thrown if environmental variables are not set. + * @details A warning will be thrown if environment variables are not set. * @param[out] num_intra_nthreads The number of intra threads. Read from *DP_INTRA_OP_PARALLELISM_THREADS. * @param[out] num_inter_nthreads The number of inter threads. Read from From eefb67ff0d8f9795d15006856f6fdb8a4eb5dc03 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Tue, 20 Aug 2024 19:56:05 -0400 Subject: [PATCH 2/4] fix --- doc/env.md | 4 ++-- doc/install/install-from-source.md | 8 +++++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/doc/env.md b/doc/env.md index b22a716480..fee65578a3 100644 --- a/doc/env.md +++ b/doc/env.md @@ -45,14 +45,14 @@ Control high (double) or low (float) precision of training. **Choices**: `0`, `1`; **Default**: `0` -Enable auto parallelization for CPU operators. +{{ tensorflow_icon }} Enable auto parallelization for CPU operators. ::: :::{envvar} DP_JIT **Choices**: `0`, `1`; **Default**: `0` -Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. +{{ tensorflow_icon }} Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow to support JIT. 
::: :::{envvar} DP_INFER_BATCH_SIZE diff --git a/doc/install/install-from-source.md b/doc/install/install-from-source.md index b0920d3b5d..9320a86c22 100644 --- a/doc/install/install-from-source.md +++ b/doc/install/install-from-source.md @@ -161,7 +161,7 @@ The path to the ROCM toolkit directory. **Choices**: `0`, `1`; **Default**: `1` -{{ tensorflow_icon }} Enable the TensorFlow +{{ tensorflow_icon }} Enable the TensorFlow backend. ::: :::{envvar} DP_ENABLE_PYTORCH @@ -199,11 +199,13 @@ Enable compilation optimization for the native machine's CPU type. Do not enable Additional CMake arguments. ::: -:::{envvar} <LANG>FLAGS (`<LANG>`=`CXX`, `CUDA` or `HIP`) +:::{envvar} <LANG>FLAGS + +`<LANG>`=`CXX`, `CUDA` or `HIP` **Type**: string -Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html). +Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html) for details. ::: Other [CMake environment variables](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html) may also be critical. From 0ca4e8b26f7d52862048674d261436ba46f318d4 Mon Sep 17 00:00:00 2001 From: Jinzhe Zeng Date: Tue, 20 Aug 2024 20:51:50 -0400 Subject: [PATCH 3/4] document NUM_WORKERS --- doc/env.md | 9 +++++++++ doc/troubleshooting/howtoset_num_nodes.md | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/doc/env.md b/doc/env.md index fee65578a3..169729229c 100644 --- a/doc/env.md +++ b/doc/env.md @@ -68,3 +68,12 @@ Default backend. ::: + +:::{envvar} NUM_WORKERS + +**Default**: 8 or the number of cores (whichever is smaller) + +{{ pytorch_icon }} Number of subprocesses to use for data loading in the PyTorch backend. 
+See [PyTorch documentation](https://pytorch.org/docs/stable/data.html) for details. + +::: \ No newline at end of file diff --git a/doc/troubleshooting/howtoset_num_nodes.md b/doc/troubleshooting/howtoset_num_nodes.md index 59de5f480a..0c547650fb 100644 --- a/doc/troubleshooting/howtoset_num_nodes.md +++ b/doc/troubleshooting/howtoset_num_nodes.md @@ -33,7 +33,7 @@ export DP_INTER_OP_PARALLELISM_THREADS=3 However, for GPU devices, TensorFlow and PyTorch use only one compute stream and multiple copy streams. Note that some of DeePMD-kit OPs do not have GPU support, so it is still encouraged to set environment variables even if one has a GPU. -## Parallelism within an individual operators +## Parallelism within individual operators For CPU devices, {envvar}`DP_INTRA_OP_PARALLELISM_THREADS` controls parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs. From 3215fb24092a35fd99e9729a38f5443b17e83bd8 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 21 Aug 2024 00:52:54 +0000 Subject: [PATCH 4/4] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- doc/env.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/env.md b/doc/env.md index 169729229c..7dfd2d8827 100644 --- a/doc/env.md +++ b/doc/env.md @@ -74,6 +74,6 @@ Default backend. **Default**: 8 or the number of cores (whichever is smaller) {{ pytorch_icon }} Number of subprocesses to use for data loading in the PyTorch backend. -See [PyTorch documentation](https://pytorch.org/docs/stable/data.html) for details. +See [PyTorch documentation](https://pytorch.org/docs/stable/data.html) for details. -::: \ No newline at end of file +:::
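As a quick sanity check of the runtime thread-control variables documented in the new `doc/env.md` above, the following shell sketch can be used. The values (3 inter-op threads, 2 intra-op threads, 2 OpenMP threads) mirror the example in `doc/troubleshooting/howtoset_num_nodes.md`; the trailing `dp train` invocation is illustrative and assumes an `input.json` in the working directory.

```shell
# Thread-control settings described in doc/env.md; tune after empirical testing.
export DP_INTER_OP_PARALLELISM_THREADS=3   # parallelism between independent OPs
export DP_INTRA_OP_PARALLELISM_THREADS=2   # parallelism within a single OP
export OMP_NUM_THREADS=2                   # OpenMP threads for customized OPs
echo "inter=$DP_INTER_OP_PARALLELISM_THREADS intra=$DP_INTRA_OP_PARALLELISM_THREADS omp=$OMP_NUM_THREADS"
# prints: inter=3 intra=2 omp=2
# dp train input.json   # then launch DeePMD-kit as usual
```

The same exports apply to the C++-based interfaces (LAMMPS, i-PI, etc.), since the variables are read by the common C++ layer (`source/api_cc/include/common.h`).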