Precipitation Implementation Using CUDA

Because the free energies are represented as paraboloids, we can leverage the HiPerC project to implement the equations of motion on the GPU using CUDA.

Workflow

`MMSP::generate()`

Initial conditions are generated using the existing MMSP code and written to a compressed MMSP checkpoint. No work is assigned to the GPU at this stage.

Initialization

main.cpp was adapted from MMSP, rather than HiPerC.
The initial condition checkpoint gets read back into an MMSP::grid object.
The Laplacian convolution kernel (the stencil coefficients) gets written into constant memory on the GPU.
12 device arrays get allocated on the GPU: one old and one new for each field variable.
1. x_Cr
2. x_Nb
3. phi_del
4. phi_lav
5. x_gam_Cr
6. x_gam_Nb
12 identical host arrays get allocated on the CPU.
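
The allocation pattern above can be sketched on the host side as follows. This is a minimal illustration rather than the project's code: the `Field` enum, the `FieldBuffers` struct, the `allocate_fields` helper, and the use of `double` are all assumptions made for the example. On the device, each vector would correspond to one buffer obtained from `cudaMalloc`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Field order follows the numbered list above; the enum itself is hypothetical.
enum Field { X_CR, X_NB, PHI_DEL, PHI_LAV, X_GAM_CR, X_GAM_NB, NUM_FIELDS };

// One "old" and one "new" array per field: 2 * 6 = 12 arrays in total.
struct FieldBuffers {
    std::array<std::vector<double>, NUM_FIELDS> old_data;
    std::array<std::vector<double>, NUM_FIELDS> new_data;
};

// Allocate zero-initialized storage for an nx-by-ny grid (sizes are assumed
// to include any ghost cells needed for boundary conditions).
FieldBuffers allocate_fields(std::size_t nx, std::size_t ny)
{
    assert(nx > 0 && ny > 0);
    FieldBuffers buf;
    for (int f = 0; f < NUM_FIELDS; ++f) {
        buf.old_data[f].assign(nx * ny, 0.0);
        buf.new_data[f].assign(nx * ny, 0.0);
    }
    return buf;
}
```

Keeping the "old" and "new" arrays in parallel containers makes it straightforward to treat every field the same way in later steps.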

`MMSP::update()`

Before timestepping:

Data gets read from the MMSP::grid into the host arrays.
Data gets copied from the host to device arrays.
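
Reading grid data into per-field host arrays amounts to converting node-wise storage into one flat array per field. The sketch below illustrates that conversion and its inverse (which is the shape of the read-back after timestepping). It does not use the MMSP API: the `Node` type and the `split_fields` and `merge_fields` helpers are hypothetical stand-ins, and `double` is an assumed data type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumFields = 6;

// Hypothetical per-node storage: six field values at one grid point.
using Node = std::array<double, kNumFields>;
using HostArrays = std::array<std::vector<double>, kNumFields>;

// Node-wise data -> one contiguous host array per field.
HostArrays split_fields(const std::vector<Node>& nodes)
{
    HostArrays host;
    for (auto& field : host) field.resize(nodes.size());
    for (std::size_t n = 0; n < nodes.size(); ++n)
        for (std::size_t f = 0; f < kNumFields; ++f)
            host[f][n] = nodes[n][f];
    return host;
}

// One host array per field -> node-wise data.
std::vector<Node> merge_fields(const HostArrays& host)
{
    const std::size_t n_nodes = host[0].size();
    for (const auto& field : host) assert(field.size() == n_nodes);
    std::vector<Node> nodes(n_nodes);
    for (std::size_t n = 0; n < n_nodes; ++n)
        for (std::size_t f = 0; f < kNumFields; ++f)
            nodes[n][f] = host[f][n];
    return nodes;
}
```

Contiguous per-field arrays can each be transferred to the device with a single `cudaMemcpy` call.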

For each iteration:

Boundary conditions get applied on each of the "old" device arrays.
Laplacian values get computed and recorded in each of the "new" arrays.
Boundary conditions get applied on each of the "new" device arrays.
Updated field values get computed from the "old" and "new" device arrays.
Secondary phases are stochastically inserted into the "new" device arrays.
Fictitious matrix phase compositions get computed and written into the "new" device array.
Pointers to "old" and "new" arrays are swapped on the device.
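
The core of one iteration can be sketched for a single field on the CPU, as below. Several things here are assumptions made purely for illustration: the `Grid` struct, the no-flux boundary condition, the five-point mask with unit grid spacing, and the simple diffusion update `old + dt * D * laplacian`, which stands in for the actual equations of motion. Stochastic insertion of secondary phases and the fictitious matrix compositions are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// 2D row-major grid with a one-cell ghost layer on every side (assumed layout).
struct Grid {
    std::size_t nx, ny;
    std::vector<double> v;
    Grid(std::size_t nx_, std::size_t ny_, double init = 0.0)
        : nx(nx_), ny(ny_), v(nx_ * ny_, init) { assert(nx_ >= 3 && ny_ >= 3); }
    double& at(std::size_t i, std::size_t j) { return v[j * nx + i]; }
    double at(std::size_t i, std::size_t j) const { return v[j * nx + i]; }
};

// Five-point Laplacian mask for unit grid spacing. On the GPU, the
// coefficients would be read from constant memory instead.
const double kMask[3][3] = {{0.0, 1.0, 0.0}, {1.0, -4.0, 1.0}, {0.0, 1.0, 0.0}};

// Assumed no-flux boundaries: copy the nearest interior value into each ghost cell.
void apply_boundary_conditions(Grid& g)
{
    for (std::size_t i = 0; i < g.nx; ++i) {
        g.at(i, 0) = g.at(i, 1);
        g.at(i, g.ny - 1) = g.at(i, g.ny - 2);
    }
    for (std::size_t j = 0; j < g.ny; ++j) {
        g.at(0, j) = g.at(1, j);
        g.at(g.nx - 1, j) = g.at(g.nx - 2, j);
    }
}

// Convolve the "old" field with the mask; store the Laplacian in "new".
void compute_laplacian(const Grid& old_g, Grid& new_g)
{
    for (std::size_t j = 1; j + 1 < old_g.ny; ++j)
        for (std::size_t i = 1; i + 1 < old_g.nx; ++i) {
            double sum = 0.0;
            for (std::size_t mj = 0; mj < 3; ++mj)
                for (std::size_t mi = 0; mi < 3; ++mi)
                    sum += kMask[mj][mi] * old_g.at(i + mi - 1, j + mj - 1);
            new_g.at(i, j) = sum;
        }
}

// Overwrite the Laplacian stored in "new" with the updated field value.
void update_field(const Grid& old_g, Grid& new_g, double D, double dt)
{
    for (std::size_t j = 1; j + 1 < old_g.ny; ++j)
        for (std::size_t i = 1; i + 1 < old_g.nx; ++i)
            new_g.at(i, j) = old_g.at(i, j) + dt * D * new_g.at(i, j);
}

// One iteration, in the order listed above, ending with the pointer swap.
void timestep(Grid*& old_g, Grid*& new_g, double D, double dt)
{
    apply_boundary_conditions(*old_g);
    compute_laplacian(*old_g, *new_g);
    apply_boundary_conditions(*new_g);
    update_field(*old_g, *new_g, D, dt);
    std::swap(old_g, new_g);
}
```

Because the swap is the last step, the pointer named "old" refers to the most recent values once `timestep` returns. Swapping pointers also avoids copying whole arrays between iterations.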

After timestepping:

After the final pointer swap, the 6 "old" device arrays hold the updated values. Data gets copied from them into the "new" host arrays.
Data gets read from the host arrays into the MMSP::grid object.
The MMSP::grid object gets written to a compressed MMSP checkpoint.

Cleanup

Arrays get freed from the host and device once the last checkpoint is written.