forked from gromacs/gromacs
-
Notifications
You must be signed in to change notification settings - Fork 4
/
README
214 lines (180 loc) · 10.7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
This repository is out of date.
You can find the latest version of GROMACS with OpenCL support here: http://www.gromacs.org/Downloads
Check the manual for how to build GROMACS with OpenCL support enabled: http://manual.gromacs.org/documentation/5.1/install-guide/index.html#opencl-gpu-acceleration
============================================================
This is a modified version of GROMACS 5.0 adding OpenCL support.
All CUDA accelerated features have been ported to OpenCL 1.1.
Official GROMACS website: http://www.gromacs.org/
TABLE OF CONTENTS
1. SUPPORTED OPENCL DEVICES
2. GENERAL PROJECT SETUP
3. OPENCL SETUP
4. KNOWN LIMITATIONS
5. TESTED CONFIGURATIONS
1. SUPPORTED OPENCL DEVICES
========================
The current version works with NVIDIA GPUs and GCN based AMD GPUs.
Make sure that you have the latest drivers installed.
Also check "Known Limitations" chapter.
2. GENERAL PROJECT SETUP
=====================
Check GROMACS website for how to build the project:
http://www.gromacs.org/Documentation/Installation_Instructions
You can find additional information here:
https://github.com/StreamComputing/gromacs/wiki/GROMACS-GPU-ACCELERATION-USING-OPENCL#General_Project_Setup
3. OPENCL SETUP
============
Build Gromacs with OpenCL support enabled
-----------------------------------------
To build Gromacs with OpenCL support enabled, an OpenCL SDK must be installed
and the following cmake flags must be set:
GMX_GPU
GMX_USE_OPENCL
After setting these flags, if an OpenCL implementation has been detected on
your system, the following cmake entries will be defined:
OPENCL_INCLUDE_DIR - the OpenCL include directory
OPENCL_LIBRARY - the OpenCL library directory
The two paths will be automatically used to configure the project.
Run Gromacs with OpenCL accelerations enabled
---------------------------------------------
Gromacs loads and builds at runtime the OpenCL kernels. To do so, it needs to
know the location of the OpenCL source files.
If you want to run the installed version, the path to the OpenCL files is
automatically defined.
If you do not wish to install Gromacs, but run the version built from sources,
you need to provide the path to the source tree with the OpenCL kernels like
below:
export GMX_OCL_FILE_PATH=/path-to-gromacs/src/
Caching options for building the kernels
----------------------------------------
Building an OpenCL program can take a significant amount of time. NVIDIA
implements a mechanism to cache the result of the build. As a consequence,
only the first build of the OpenCL kernels will take longer, the following
builds will be very fast. AMD drivers, on the other hand, implement no
caching and building a program can be very slow. That's why we have started
implementing our own caching. Caching for OpenCL kernel builds is by default
enabled. To disable it, set GMX_OCL_NOGENCACHE environment variable.
If you plan to modify the OpenCL kernels, you should disable any caching:
* add GMX_OCL_NOGENCACHE environment variable and set it to 1
* for NVIDIA cards: add CUDA_CACHE_DISABLE environment variable and set it to 1
OpenCL Device Selection
-----------------------
The same option used to select CUDA devices or to define a mapping of GPUs to
PP ranks can also be used for OpenCL devices: -gpu_id
Environment Variables For OpenCL
--------------------------------
Currently, several environment variables exist that help customize some aspects
of the OpenCL version of Gromacs. They are mostly related to the runtime
compilation of OpenCL kernels, but they are also used on the device selection.
GXM_OCL_FILE_PATH: Is the full path to Gromacs src folder. Useful when gmx
is called from a folder other than the installation/bin folder.
GMX_OCL_NOGENCACHE: Disable caching for OpenCL kernel builds.
GMX_OCL_NOFASTGEN: Generates and compiles all algorithm flavors, otherwise
only the flavor required for the simulation is generated and compiled.
GMX_OCL_FASTMATH: Adds the option cl-fast-relaxed-math to the compiler
options (in the CUDA version this is enabled by default, it is likely that
the same will happen with the OpenCL version soon)
GMX_OCL_DUMP_LOG: If defined, the OpenCL build log is always written to file.
The file is saved in the current directory with the name
OpenCL_kernel_file_name.build_status where OpenCL_kernel_file_name is the name
of the file containing the OpenCL source code (usually nbnxn_ocl_kernels.cl)
and build_status can be either SUCCEEDED or FAILED. If this environment
variable is not defined, the default behavior is the following:
- Debug build: build log is always written to file
- Release build: build log is written to file only in case of errors.
GMX_OCL_VERBOSE: If defined, it enables verbose mode for OpenCL kernel build.
Currently available only for NVIDIA GPUs. See GMX_OCL_DUMP_LOG for details
about how to obtain the OpenCL build log.
GMX_OCL_DUMP_INTERM_FILES: If defined, intermediate language code corresponding
to the OpenCL build process is saved to file. Caching has to be turned off in
order for this option to take effect (see GMX_OCL_NOGENCACHE).
- NVIDIA GPUs: PTX code is saved in the current directory with the name
device_name.ptx
- AMD GPUs: .IL/.ISA files will be created for each OpenCL kernel built.
For details about where these files are created check AMD documentation
for -save-temps compiler option.
GMX_OCL_DEBUG: Use in conjunction with OCL_FORCE_CPU or with an AMD device.
It adds the debug flag to the compiler options (-g).
GMX_OCL_NOOPT: Disable optimisations. Adds the option cl-opt-disable to the
compiler options.
GMX_OCL_FORCE_CPU: Force the selection of a CPU device instead of a GPU.
This exists only for debugging purposes. Do not expect Gromacs to function
properly with this option on, it is solely for the simplicity of stepping
in a kernel and see what is happening.
GMX_OCL_NB_ANA_EWALD: Forces the use of analytical Ewald kernels.
Equivalent of CUDA env var GMX_CUDA_NB_ANA_EWALD
GMX_OCL_NB_TAB_EWALD: Forces the use of tabulated Ewald kernel. Equivalent
of CUDA env var GMX_OCL_NB_TAB_EWALD
GMX_OCL_NB_EWALD_TWINCUT: Forces the use of twin-range cutoff kernel.
Equivalent of CUDA env var GMX_CUDA_NB_EWALD_TWINCUT
GMX_DISABLE_OCL_TIMING: Disables timing for OpenCL operations
4. KNOWN LIMITATIONS
=================
- OpenCL on Intel CPUs is not supported
- OpenCL on Intel GPUs is not supported
- The current implementation is not compatible with OpenCL devices that are
not using warp/wavefronts or for which the warp/wavefront size is not a
multiple of 32
- The following kernels are known to produce incorrect results:
nbnxn_kernel_ElecEwQSTab_VdwLJ_VF_prune_opencl
nbnxn_kernel_ElecEwQSTab_VdwLJ_F_prune_opencl
- Due to blocking behavior of clEnqueue functions in NVIDIA driver, there is
almost no performance gain when using NVIDIA GPUs. A bug report has already
been filled on about this issue. A possible workaround would be to have a
separate thread for issuing GPU commands. However this hasn't been implemented
yet.
5. TESTED CONFIGURATIONS
=====================
Tested devices:
NVIDIA GPUs: GeForce GTX 660M, GeForce GTX 750Ti, GeForce GTX 780
AMD GPUs: FirePro W5100, HD 7950, FirePro W9100, Radeon R7 M260, R9 290
Tested kernels:
Kernel |Benchmark test |Remarks
--------------------------------------------------------------------------------------------------------
nbnxn_kernel_ElecCut_VdwLJ_VF_prune_opencl |d.poly-ch2 |
nbnxn_kernel_ElecCut_VdwLJ_F_opencl |d.poly-ch2 |
nbnxn_kernel_ElecCut_VdwLJ_F_prune_opencl |d.poly-ch2 |
nbnxn_kernel_ElecCut_VdwLJ_VF_opencl |d.poly-ch2 |
nbnxn_kernel_ElecRF_VdwLJ_VF_prune_opencl |adh_cubic with rf_verlet.mdp |
nbnxn_kernel_ElecRF_VdwLJ_F_opencl |adh_cubic with rf_verlet.mdp |
nbnxn_kernel_ElecRF_VdwLJ_F_prune_opencl |adh_cubic with rf_verlet.mdp |
nbnxn_kernel_ElecEwQSTab_VdwLJ_VF_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |Failed
nbnxn_kernel_ElecEwQSTab_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |Failed
nbnxn_kernel_ElecEw_VdwLJ_VF_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
nbnxn_kernel_ElecEw_VdwLJ_F_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
nbnxn_kernel_ElecEw_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
nbnxn_kernel_ElecEwTwinCut_VdwLJ_F_prune_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
nbnxn_kernel_ElecEwTwinCut_VdwLJ_F_opencl |adh_cubic_vsites with pme_verlet_vsites.mdp |
Input data used for testing - Benchmark data sets available here:
ftp://ftp.gromacs.org/pub/benchmarks
* * * * *
The GROMACS OpenCL team
StreamComputing - www.streamcomputing.eu
Vincent Hindriksen - vincent@streamcomputing.eu
Teemu Virolainen - teemu@streamcomputing.eu
Anca Hamuraru - anca@streamcomputing.eu
Contributors
Dimitrios Karkoulis - dimitris.karkoulis@gmail.com
* * * * *
GROMACS is free software, distributed under the GNU Lesser General
Public License, version 2.1 However, scientific software is a little
special compared to most other programs. Both you, we, and all other
GROMACS users depend on the quality of the code, and when we find bugs
(every piece of software has them) it is crucial that we can correct
it and say that it was fixed in version X of the file or package
release. For the same reason, it is important that you can reproduce
other people's result from a certain GROMACS version.
The easiest way to avoid this kind of problems is to get your modifications
included in the main distribution. We'll be happy to consider any decent
code. If it's a separate program it can probably be included in the contrib
directory straight away (not supported by us), but for major changes in the
main code we appreciate if you first test that it works with (and without)
MPI, threads, double precision, etc.
If you still want to distribute a modified version or use part of GROMACS
in your own program, remember that the entire project must be licensed
according to the requirements of the LGPL v2.1 license under which you
received this copy of GROMACS. We request that it must clearly be labeled as
derived work. It should not use the name "official GROMACS", and make
sure support questions are directed to you instead of the GROMACS developers.
Sorry for the hard wording, but it is meant to protect YOUR reseach results!
* * * * *