docs: improve docs for environment variables #4070

Merged · 4 commits · Aug 21, 2024
79 changes: 79 additions & 0 deletions doc/env.md
@@ -0,0 +1,79 @@
# Runtime environment variables

:::{note}
For build-time environment variables, see [Install from source code](./install/install-from-source.md).
:::

## All interfaces

:::{envvar} DP_INTER_OP_PARALLELISM_THREADS

**Alias**: `TF_INTER_OP_PARALLELISM_THREADS`
**Default**: `0`

Control inter-operator parallelism, i.e., how many independent TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs run concurrently on CPU devices.
See [How to control the parallelism of a job](./troubleshooting/howtoset_num_nodes.md) for details.
:::

:::{envvar} DP_INTRA_OP_PARALLELISM_THREADS

**Alias**: `TF_INTRA_OP_PARALLELISM_THREADS`
**Default**: `0`

Control intra-operator parallelism, i.e., the number of threads used within an individual TensorFlow (when TensorFlow is built against Eigen) or PyTorch native OP.
See [How to control the parallelism of a job](./troubleshooting/howtoset_num_nodes.md) for details.
:::
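
For example, to allow two independent OPs to run concurrently with two threads each (a minimal sketch; `input.json` stands in for your actual training input):

```bash
# Threads used inside a single operator
export DP_INTRA_OP_PARALLELISM_THREADS=2
# Independent operators executed concurrently
export DP_INTER_OP_PARALLELISM_THREADS=2
dp train input.json
```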

## Environment variables of dependencies

- If OpenMP is used, [OpenMP environment variables](https://www.openmp.org/spec-html/5.0/openmpch6.html) can be used to control OpenMP threads, such as [`OMP_NUM_THREADS`](https://www.openmp.org/spec-html/5.0/openmpse50.html#x289-20540006.2); a combined example is shown after this list.
- If CUDA is used, [CUDA environment variables](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-environment-variables) can be used to control CUDA devices, such as `CUDA_VISIBLE_DEVICES`.
- If ROCm is used, [ROCm environment variables](https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#environment-variables) can be used to control ROCm devices.
- {{ tensorflow_icon }} If TensorFlow is used, TensorFlow environment variables can be used.
- {{ pytorch_icon }} If PyTorch is used, [PyTorch environment variables](https://pytorch.org/docs/stable/torch_environment_variables.html) can be used.
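
For example, one might combine these with the DeePMD-kit variables above (a hedged sketch; the thread count and GPU index should be adapted to your hardware):

```bash
# OpenMP threads for OpenMP-enabled OPs
export OMP_NUM_THREADS=4
# Expose only the first GPU to CUDA
export CUDA_VISIBLE_DEVICES=0
dp train input.json
```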

## Python interface only

:::{envvar} DP_INTERFACE_PREC

**Choices**: `high`, `low`; **Default**: `high`

Control high (double) or low (float) precision of training.
:::

:::{envvar} DP_AUTO_PARALLELIZATION

**Choices**: `0`, `1`; **Default**: `0`

{{ tensorflow_icon }} Enable auto parallelization for CPU operators.
:::

:::{envvar} DP_JIT

**Choices**: `0`, `1`; **Default**: `0`

{{ tensorflow_icon }} Enable JIT. Note that this option may either improve or degrade performance. Requires TensorFlow to support JIT.
:::
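
As an illustration, the following sketch trains with low (float) precision, auto parallelization, and JIT enabled under the TensorFlow backend (whether JIT helps is workload-dependent; `input.json` is illustrative):

```bash
export DP_INTERFACE_PREC=low     # float instead of double precision
export DP_AUTO_PARALLELIZATION=1 # auto parallelization for CPU operators
export DP_JIT=1                  # JIT; may improve or degrade performance
dp train input.json
```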

:::{envvar} DP_INFER_BATCH_SIZE

**Default**: `1024` on CPUs; on GPUs, as large as possible until out-of-memory

Inference batch size, counted as the number of frames multiplied by the number of atoms per frame.
:::
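
For example, for frames of 192 atoms each, the default CPU batch size of `1024` corresponds to `1024 / 192 ≈ 5` frames per batch. To bound memory usage one may set an explicit value (the number below is purely illustrative, as are the model and data paths):

```bash
export DP_INFER_BATCH_SIZE=8192  # at most 8192 / 192 ≈ 42 frames per batch
dp test -m frozen_model.pb -s ./data
```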

:::{envvar} DP_BACKEND

**Default**: `tensorflow`

Default backend.
:::

:::{envvar} NUM_WORKERS

**Default**: `8` or the number of CPU cores, whichever is smaller

{{ pytorch_icon }} Number of subprocesses to use for data loading in the PyTorch backend.
See [PyTorch documentation](https://pytorch.org/docs/stable/data.html) for details.

:::
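
For instance, to select the PyTorch backend and load data in the main process instead of subprocesses (sometimes convenient for debugging; a sketch, not a recommendation):

```bash
export DP_BACKEND=pytorch
export NUM_WORKERS=0  # disable data-loading subprocesses
dp train input.json
```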
1 change: 1 addition & 0 deletions doc/index.rst
@@ -45,6 +45,7 @@ DeePMD-kit is a package written in Python/C++, designed to minimize the effort r
cli
third-party/index
nvnmd/index
env
troubleshooting/index


4 changes: 4 additions & 0 deletions doc/inference/cxx.md
@@ -1,5 +1,9 @@
# C/C++ interface

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

## C++ interface

The C++ interface of DeePMD-kit is also available for model inference and is considered faster than the Python interface. An example `infer_water.cpp` is given below:
4 changes: 4 additions & 0 deletions doc/inference/nodejs.md
@@ -1,5 +1,9 @@
# Node.js interface

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

If [Node.js interface is installed](../install/install-nodejs.md), one can use the Node.js interface for model inference, which is a wrapper of [the header-only C++ API](./cxx.md).

A simple example is shown below.
4 changes: 4 additions & 0 deletions doc/inference/python.md
@@ -1,5 +1,9 @@
# Python interface

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

One may use the Python interface of DeePMD-kit for model inference; an example is given as follows:

```python
85 changes: 73 additions & 12 deletions doc/install/install-from-source.md
@@ -136,18 +136,79 @@ pip install .

One may set the following environment variables before executing `pip`:

| Environment variables | Allowed value | Default value | Usage |
| --------------------------------------------------- | --------------------- | ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| DP_VARIANT | `cpu`, `cuda`, `rocm` | `cpu` | Build CPU variant or GPU variant with CUDA or ROCM support. |
| CUDAToolkit_ROOT | Path | Detected automatically | The path to the CUDA toolkit directory. CUDA 9.0 or later is supported. NVCC is required. |
| ROCM_ROOT | Path | Detected automatically | The path to the ROCM toolkit directory. |
| DP_ENABLE_TENSORFLOW | 0, 1 | 1 | {{ tensorflow_icon }} Enable the TensorFlow backend. |
| DP_ENABLE_PYTORCH | 0, 1 | 0 | {{ pytorch_icon }} Enable customized C++ OPs for the PyTorch backend. PyTorch can still run without customized C++ OPs, but features will be limited. |
| TENSORFLOW_ROOT | Path | Detected automatically | {{ tensorflow_icon }} The path to TensorFlow Python library. By default the installer only finds TensorFlow under user site-package directory (`site.getusersitepackages()`) or system site-package directory (`sysconfig.get_path("purelib")`) due to limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest TensorFlow (or the environment variable `TENSORFLOW_VERSION` if given) from PyPI will be built against. |
| PYTORCH_ROOT | Path | Detected automatically | {{ pytorch_icon }} The path to PyTorch Python library. By default, the installer only finds PyTorch under the user site-package directory (`site.getusersitepackages()`) or the system site-package directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest PyTorch (or the environment variable `PYTORCH_VERSION` if given) from PyPI will be built against. |
| DP_ENABLE_NATIVE_OPTIMIZATION | 0, 1 | 0 | Enable compilation optimization for the native machine's CPU type. Do not enable it if generated code will run on different CPUs. |
| CMAKE_ARGS | str | - | Additional CMake arguments |
| &lt;LANG&gt;FLAGS (`<LANG>`=`CXX`, `CUDA` or `HIP`) | str | - | Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html). |
:::{envvar} DP_VARIANT

**Choices**: `cpu`, `cuda`, `rocm`; **Default**: `cpu`

Build the CPU variant, or a GPU variant with CUDA or ROCm support.
:::

:::{envvar} CUDAToolkit_ROOT

**Type**: Path; **Default**: Detected automatically

The path to the CUDA toolkit directory. CUDA 9.0 or later is supported. NVCC is required.
:::

:::{envvar} ROCM_ROOT

**Type**: Path; **Default**: Detected automatically

The path to the ROCm toolkit directory.
:::

:::{envvar} DP_ENABLE_TENSORFLOW

**Choices**: `0`, `1`; **Default**: `1`

{{ tensorflow_icon }} Enable the TensorFlow backend.
:::

:::{envvar} DP_ENABLE_PYTORCH

**Choices**: `0`, `1`; **Default**: `0`

{{ pytorch_icon }} Enable customized C++ OPs for the PyTorch backend. PyTorch can still run without customized C++ OPs, but features will be limited.
:::

:::{envvar} TENSORFLOW_ROOT

**Type**: Path; **Default**: Detected automatically

{{ tensorflow_icon }} The path to the TensorFlow Python library. If not given, the installer by default only finds TensorFlow under the user site-packages directory (`site.getusersitepackages()`) or the system site-packages directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the package will be built against the latest TensorFlow from PyPI (or the version given by the environment variable `TENSORFLOW_VERSION`).
:::

:::{envvar} PYTORCH_ROOT

**Type**: Path; **Default**: Detected automatically

{{ pytorch_icon }} The path to the PyTorch Python library. If not given, the installer by default only finds PyTorch under the user site-packages directory (`site.getusersitepackages()`) or the system site-packages directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the package will be built against the latest PyTorch from PyPI (or the version given by the environment variable `PYTORCH_VERSION`).
:::

:::{envvar} DP_ENABLE_NATIVE_OPTIMIZATION

**Choices**: `0`, `1`; **Default**: `0`

Enable compilation optimization for the native machine's CPU type. Do not enable it if generated code will run on different CPUs.
:::

:::{envvar} CMAKE_ARGS

**Type**: string

Additional CMake arguments.
:::

:::{envvar} <LANG>FLAGS

`<LANG>` is one of `CXX`, `CUDA`, or `HIP`.

**Type**: string

Default compilation flags to be used when compiling `<LANG>` files. See [CMake documentation](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html) for details.
:::

Other [CMake environment variables](https://cmake.org/cmake/help/latest/manual/cmake-env-variables.7.html) may also be critical.
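
For example, a CUDA build from source might look as follows (a sketch only; the CUDA path and CMake arguments are illustrative and depend on your system):

```bash
export DP_VARIANT=cuda
export CUDAToolkit_ROOT=/usr/local/cuda         # adjust to your installation
export DP_ENABLE_TENSORFLOW=1
export DP_ENABLE_PYTORCH=1
export CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release"  # any extra CMake arguments
pip install .
```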

To test the installation, one should first jump out of the source directory

4 changes: 4 additions & 0 deletions doc/third-party/ase.md
@@ -1,5 +1,9 @@
# Use deep potential with ASE

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

Deep potential can be set up as a calculator with ASE to obtain potential energies and forces.

```python
4 changes: 4 additions & 0 deletions doc/third-party/dpdata.md
@@ -1,5 +1,9 @@
# Use deep potential with dpdata

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

DeePMD-kit provides a driver for [dpdata](https://github.com/deepmodeling/dpdata) >=0.2.7 via the plugin mechanism, making it possible to call the `predict` method of the `System` class:

```py
4 changes: 4 additions & 0 deletions doc/third-party/gromacs.md
@@ -1,5 +1,9 @@
# Running MD with GROMACS

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

## DP/MM Simulation

This part gives a simple tutorial on how to run a DP/MM simulation for methane in water, which means using DP for methane and TIP3P for water. All relevant files can be found in `examples/methane`.
4 changes: 4 additions & 0 deletions doc/third-party/ipi.md
@@ -1,5 +1,9 @@
# Run path-integral MD with i-PI

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

i-PI works in a client-server model: i-PI provides the server for integrating the replica positions of atoms, while DeePMD-kit provides a client named `dp_ipi` that computes the interactions (including energy, forces, and virials). The server and client communicate via a Unix domain socket or an Internet socket. Installation instructions for i-PI can be found [here](../install/install-ipi.md). The client can be started by

```bash
4 changes: 4 additions & 0 deletions doc/third-party/lammps-command.md
@@ -1,5 +1,9 @@
# Run MD with LAMMPS

:::{note}
See [Environment variables](../env.md) for the runtime environment variables.
:::

## units

All units in LAMMPS except `lj` are supported.
9 changes: 1 addition & 8 deletions doc/train/training-advanced.md
@@ -161,14 +161,7 @@ optional arguments:
**`--skip-neighbor-stat`** will skip calculating neighbor statistics if one is concerned about performance. Some features will be disabled.

To maximize the performance, one should follow [FAQ: How to control the parallelism of a job](../troubleshooting/howtoset_num_nodes.md) to control the number of threads.

One can set other environmental variables:

| Environment variables | Allowed value | Default value | Usage |
| ----------------------- | ------------- | ------------- | ------------------------------------------------------------------------------------------------------------------- |
| DP_INTERFACE_PREC | `high`, `low` | `high` | Control high (double) or low (float) precision of training. |
| DP_AUTO_PARALLELIZATION | 0, 1 | 0 | Enable auto parallelization for CPU operators. |
| DP_JIT | 0, 1 | 0 | Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow supports JIT. |
See [Runtime environment variables](../env.md) for all runtime environment variables.

## Adjust `sel` of a frozen model {{ tensorflow_icon }}

12 changes: 6 additions & 6 deletions doc/troubleshooting/howtoset_num_nodes.md
@@ -30,12 +30,12 @@ For CPU devices, TensorFlow and PyTorch use multiple streams to run independent
export DP_INTER_OP_PARALLELISM_THREADS=3
```

However, for GPU devices, TensorFlow uses only one compute stream and multiple copy streams.
Note that some of DeePMD-kit OPs do not have GPU support, so it is still encouraged to set environmental variables even if one has a GPU.
However, for GPU devices, TensorFlow and PyTorch use only one compute stream and multiple copy streams.
Note that some of DeePMD-kit OPs do not have GPU support, so it is still encouraged to set environment variables even if one has a GPU.

## Parallelism within an individual operators
## Parallelism within individual operators

For CPU devices, `DP_INTRA_OP_PARALLELISM_THREADS` controls parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs.
For CPU devices, {envvar}`DP_INTRA_OP_PARALLELISM_THREADS` controls parallelism within TensorFlow (when TensorFlow is built against Eigen) and PyTorch native OPs.

```bash
export DP_INTRA_OP_PARALLELISM_THREADS=2
@@ -49,7 +49,7 @@ It may also control parallelism for NumPy when NumPy is built against OpenMP, so
export OMP_NUM_THREADS=2
```

There are several other environmental variables for OpenMP, such as `KMP_BLOCKTIME`.
There are several other environment variables for OpenMP, such as `KMP_BLOCKTIME`.
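
For example, with Intel OpenMP one might set (an illustrative value; consult the OpenMP runtime documentation for its exact semantics):

```bash
export KMP_BLOCKTIME=0  # worker threads sleep immediately after a parallel region
```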

::::{tab-set}

@@ -70,7 +70,7 @@ See [PyTorch documentation](https://pytorch.org/tutorials/recipes/recipes/tuning
There is no one general parallel configuration that works for all situations, so you are encouraged to tune the configuration yourself after empirical testing.

Here are some empirical examples.
If you wish to use 3 cores of 2 CPUs on one node, you may set the environmental variables and run DeePMD-kit as follows:
If you wish to use 3 cores of 2 CPUs on one node, you may set the environment variables and run DeePMD-kit as follows:

::::{tab-set}

2 changes: 1 addition & 1 deletion source/api_cc/include/common.h
@@ -142,7 +142,7 @@ void select_map_inv(typename std::vector<VT>::iterator out,

/**
* @brief Get the number of threads from the environment variable.
* @details A warning will be thrown if environmental variables are not set.
* @details A warning will be thrown if environment variables are not set.
* @param[out] num_intra_nthreads The number of intra threads. Read from
*DP_INTRA_OP_PARALLELISM_THREADS.
* @param[out] num_inter_nthreads The number of inter threads. Read from