Tutorial


Before getting started, it is best to test QuEST on your hardware.

Coding

QuEST can be integrated into your C or C++ project, simply by including

#include <QuEST.h>

Your simulation code will look the same and compile with the same build system, regardless of whether it runs in multithreaded, GPU or distributed mode.

For example, here is a platform-agnostic simulation of a very simple circuit which produces and measures the state (|00> + |11>)/sqrt(2)

#include <QuEST.h>

int main() {

  // load QuEST
  QuESTEnv env = createQuESTEnv();
  
  // create a 2 qubit register in the zero state
  Qureg qubits = createQureg(2, env);
  initZeroState(qubits);
	
  // apply circuit
  hadamard(qubits, 0);
  controlledNot(qubits, 0, 1);
  measure(qubits, 1);
	
  // unload QuEST
  destroyQureg(qubits, env); 
  destroyQuESTEnv(env);
  return 0;
}

Of course, this code doesn't output anything!


Let's walk through a more sophisticated circuit.

We first construct a QuEST environment with createQuESTEnv() which abstracts away any preparation of multithreading, distribution or GPU-acceleration strategies.

QuESTEnv env = createQuESTEnv();

We then create a quantum register, in this case containing 3 qubits, via createQureg()

Qureg qubits = createQureg(3, env);

and initialise the register.

initZeroState(qubits);

We can create multiple Qureg instances, and QuEST will sort out allocating memory for the state-vectors, even over networks! If we want to simulate noise in our circuit, we can replace createQureg with createDensityQureg to create a more powerful density matrix, capable of representing mixed states and simulating decoherence.

We're now ready to apply some unitaries to our qubits, which in this case have indices 0, 1 and 2. When applying an operator, we pass along which quantum register to operate upon.

hadamard(qubits, 0);
controlledNot(qubits, 0, 1);
rotateY(qubits, 2, .1);

Some gates allow us to specify a general number of control qubits

int controls[] = {0, 1, 2};
multiControlledPhaseFlip(qubits, controls, 3);

We can specify general single-qubit unitary operations as 2x2 matrices

// sqrt(X) with a pi/4 global phase
ComplexMatrix2 u = {
    .real = {{.5, .5}, { .5,.5}},
    .imag = {{.5,-.5}, {-.5,.5}}};
unitary(qubits, 0, u);

or more compactly, forgoing the global phase factor,

Complex a = {.real = .5, .imag = .5};
Complex b = {.real = .5, .imag =-.5};
compactUnitary(qubits, 1, a, b);

or even more compactly, as a rotation around an arbitrary axis on the Bloch-sphere

Vector v = {.x=1, .y=0, .z=0};
rotateAroundAxis(qubits, 2, 3.14/2, v);

We can controlled-apply general unitaries

controlledCompactUnitary(qubits, 0, 1, a, b);

even with multiple control qubits!

multiControlledUnitary(qubits, (int[]) {0, 1}, 2, 2, u);

There are many properties of our quantum register which we can now query and calculate.

qreal prob = getProbAmp(qubits, 7);
printf("Probability amplitude of |111>: %lf\n", prob);

Here, qreal is an alias for a real floating point number, like double. This is to keep our code precision agnostic, so that we may change the numerical precision at compile time (by setting build option PRECISION) without any changes to our code. Changing the precision can be useful in verifying numerical convergences or studying rounding errors.

How probable is measuring our final qubit (with index 2) in outcome 1?

prob = calcProbOfOutcome(qubits, 2, 1);
printf("Probability of qubit 2 being in state 1: %f\n", prob);

We can also perform non-unitary gates upon the state. Let's destructively measure the first qubit, randomly collapsing into outcome 0 or 1

int outcome = measure(qubits, 0);
printf("Qubit 0 was measured in state %d\n", outcome);

and now measure our final qubit, while also learning of the probability of its outcome.

outcome = measureWithStats(qubits, 2, &prob);
printf("Qubit 2 collapsed to %d with probability %f\n", outcome, prob);

We could even apply non-physical operators to our register, to break its normalisation, which can often allow us to take computational shortcuts like this one.

At the conclusion of our circuit, we should free up the memory used by our quantum registers.

destroyQureg(qubits, env);
destroyQuESTEnv(env);

The effect of the code above is to simulate the circuit below


and after compiling (see section below) and running, gives pseudo-random output. For example, one run may print

Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 1
Qubit 2 collapsed to 1 with probability 0.998752

while another prints

Probability amplitude of |111>: 0.498751
Probability of qubit 2 being in state 1: 0.749178
Qubit 0 was measured in state 0
Qubit 2 collapsed to 1 with probability 0.499604

QuEST uses the Mersenne Twister algorithm to generate random numbers used for randomly collapsing quantum states. The user can seed this RNG using seedQuEST(), otherwise QuEST will by default create a seed from the current time and the process id.

In distributed mode (see below), all code in your source files will be executed independently on every node. To execute some code (e.g. printing) only on one node, use

QuESTEnv env = createQuESTEnv();

if (env.rank == 0)
    printf("Only one node executes this print!");

Such conditions are valid and always satisfied in code run on a single node.


Compiling

QuEST uses CMake (version 3.7 or higher) as its build system. Configure the build by supplying the below -D[VAR=VALUE] options after the cmake .. command. You can alternatively compile via GNU Make directly with the provided makefile.

Windows users should install CMake and Build Tools, and run the below commands in the Developer Command Prompt for VS.

To compile, run:

mkdir build
cd build
cmake .. -DUSER_SOURCE="[FILENAME]"
make

where [FILENAME] is the name of your source file, including the file extension, relative to the root QuEST directory (above build).

Windows users should replace the final two build commands with

cmake .. -G "NMake Makefiles"
nmake

If using MSVC and NMake in this way fails, users can forgo GPU acceleration, download MinGW-w64, and compile via

cmake .. -G "MinGW Makefiles"
make

Compiling directly with make and the provided makefile, copied to the root directory, may prove easier.

If your project contains multiple source files, separate them with semi-colons. For example,

 -DUSER_SOURCE="source1.c;source2.cpp"
  • To set the compilers used by cmake (to e.g. gcc-6), use

     -DCMAKE_C_COMPILER=gcc-6

    and similarly to set the C++ compiler (as used in GPU mode), use

     -DCMAKE_CXX_COMPILER=g++-6
  • If you wish your executable to be named something other than demo, you can set this too by adding argument:

     -DOUTPUT_EXE="myExecutable" 
  • To compile your code to use multithreading, for parallelism on multi-core or multi-CPU systems, use

    -DMULTITHREADED=1

    Before launching your executable, set the number of participating threads using OMP_NUM_THREADS. For example,

    export OMP_NUM_THREADS=16
    ./myExecutable
  • To compile your code to run on distributed or networked systems use

     -DDISTRIBUTED=1

    Depending on your MPI implementation, your executable can be launched via

    mpirun -np [NUM_NODES] [EXEC]

    where [NUM_NODES] is the number of distributed compute nodes to use, and [EXEC] is the name of your executable. Note that QuEST hybridises multithreading and distribution. Hence you should set [NUM_NODES] to equal exactly the number of distinct compute nodes (which don't share memory), and set OMP_NUM_THREADS as above to assign the number of threads used on each compute node.

  • To compile for GPU, use

     -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[CC] ..

    where [CC] is the compute capability of your GPU, written without a decimal point. This can be looked up at the NVIDIA website.

    Note that CUDA is not compatible with all compilers. To force cmake to use a compatible compiler, override CMAKE_C_COMPILER and CMAKE_CXX_COMPILER.
    For example, to compile for the Quadro P6000 with gcc-6:

    cmake .. -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=61 \
             -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6
  • You can additionally customise the floating point precision used by QuEST's qreal type, via

     -DPRECISION=1
     -DPRECISION=2
     -DPRECISION=4

    which use single (qreal = float), double (qreal = double) and quad (qreal = long double) precision respectively. Using greater precision means more precise computation, but at the expense of additional memory and runtime. Checking that results are unchanged when switching the precision can be a great test that your calculations are sufficiently precise.

After making changes to your code, you can quickly recompile using make directly, within the build/ directory.

For a full list of available configuration parameters, use

cmake -LH ..

For manual configuration (not recommended) you can change the CMakeLists.txt in the root QuEST directory. You can also directly modify makefile, and compile using GNU Make directly, by copying makefile into the root repository directory and running

make

Running

Locally

Once compiled as above, the compiled executable can be locally run from within the build directory.

./myExecutable
  • In multithreaded mode, the number of threads QuEST will use can be set by modifying OMP_NUM_THREADS, ideally to the number of available cores on your machine

    export OMP_NUM_THREADS=8
    ./myExecutable
  • In distributed mode, QuEST will uniformly divide every Qureg between a power-of-2 number of nodes, and can be launched with mpirun. For example, here using 8 nodes

    mpirun -np 8 ./myExecutable

    If multithreading is also enabled, the number of threads used by each node can be set using OMP_NUM_THREADS. For example, here using 8 nodes with 16 threads on each (a total of 128 processors):

    export OMP_NUM_THREADS=16
    mpirun -np 8 ./myExecutable
  • In GPU mode, the executable is launched directly via

    ./myExecutable

On supercomputers

There are no special requirements for running QuEST through job submission systems, like SLURM. Just call ./myExecutable as you would any other binary.

For example, the tutorial code can be run on 4 distributed nodes (each with 8 cores) on a SLURM system using the following SLURM submission script

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1

module load mvapich2

mkdir build
cd build
cmake .. -DDISTRIBUTED=1 -DMULTITHREADED=1
make

export OMP_NUM_THREADS=8
mpirun ./myExecutable

A PBS submission script is similar

#PBS -l select=4:ncpus=8

module purge
module load mvapich2

mkdir build
cd build
cmake -DDISTRIBUTED=1 ..
make

export OMP_NUM_THREADS=8
aprun -n 4 -d 8 -cc numa_node ./myExecutable

Running QuEST on a GPU is just a matter of specifying resources and the appropriate compilers

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1 

#SBATCH --partition=gpu    ## name may vary

module purge
module load cuda  ## name may vary

mkdir build
cd build
cmake -DGPUACCELERATED=1 -DGPU_COMPUTE_CAPABILITY=[Compute capability] ..
make

./myExecutable

On each platform, there is no change to our source code or our QuEST interface. We simply recompile, and QuEST will utilise the available hardware (a GPU, shared-memory or distributed CPUs) to speed up our code.


Testing

QuEST includes a comprehensive set of unit tests, to assure every function performs correctly. These are located in the tests directory (documented here), and compare QuEST's optimised routines to slower, algorithmically distinct methods (documented here). It is a good idea to run these tests on your machine to check QuEST is properly configured, and especially so in GPU mode, to check you have correctly set GPU_COMPUTE_CAPABILITY.

Tests should be compiled in a build directory within the root QuEST directory.

mkdir build 
cd build

To compile, run:

cmake .. -DTESTING=ON
make

You can include additional CMake arguments to target your desired hardware, such as -DDISTRIBUTED=1.

Next, to launch all unit tests, run:

make test

You should see each function being tested in turn; some will be very fast, and some very slow.

This is because the tests run each function with every one of its possible inputs (where feasible). Functions with more possible inputs hence take longer to test. The difference in testing time between functions can be very large, and does not indicate a testing or performance problem.

For example:

      Start   1: calcDensityInnerProduct
1/117 Test   #1: calcDensityInnerProduct .............   Passed    0.16 sec
      Start   2: calcExpecDiagonalOp
2/117 Test   #2: calcExpecDiagonalOp .................   Passed    0.07 sec
      Start   3: calcExpecPauliHamil
3/117 Test   #3: calcExpecPauliHamil .................   Passed    0.64 sec
      Start   4: calcExpecPauliProd
4/117 Test   #4: calcExpecPauliProd ..................   Passed   94.88 sec

You can also run the executable build/tests/tests directly, to see more statistics, and to make use of the Catch2 command-line

./tests/tests

===============================================================================
All tests passed (99700 assertions in 117 test cases)

This is necessary to run the tests in distributed mode:

mpirun -np 8 tests/tests

Using the command-line is especially useful for contributors to QuEST, for example to run only their new function:

./tests/tests myNewFunction

or a sub-test within:

./tests/tests myNewFunction -c "correctness" -c "density-matrix" -c "unnormalised"

Ideally, a new function should have its unit test run in every configuration of hardware (including #threads and #nodes) and precision. The below bash script automates this.

export f=myNewFunction    # function to test
export cc=30              # GPU compute-capability
export nt=16              # number of CPU threads

test() {
    cmake .. -DTESTING=ON -DPRECISION=$p \
             -DMULTITHREADED=$mt -DDISTRIBUTED=$d \
             -DGPUACCELERATED=$ga -DGPU_COMPUTE_CAPABILITY=$cc
             # insert additional cmake params here, if needed
    make
    export OMP_NUM_THREADS=$nt
    if (( $d == 1 )); then 
        mpirun -np $nn ./tests/tests $f
    else 
        ./tests/tests $f
    fi
}

# precision
for p in 1 2 4; do
    # serial
    mt=0 d=0 ga=0 test
    # multithreaded
    mt=1 d=0 ga=0 test
    # gpu 
    mt=0 d=0 ga=1 test
    # distributed (+multithreaded)
    for nn in 2 4 8 16; do
        mt=1 d=1 ga=0 test
    done
done