---
marp: true
theme: ginkgo
headingDivider: 2
---
Early-Adopter Session 27.11.2024
- Concept: Executors
- Concept: LinOps
- Hello World!
- Creating Matrices
- Concept: Factories
Numerical Sparse Linear Algebra Library https://github.com/ginkgo-project/ginkgo
Supports GPUs and manycore architectures
Focus on Portability and Composability
Written in modern C++
Comprehensive test and benchmark framework
Integrations into MFEM, deal.II, OpenFOAM, ...
Available on Spack
Represents a single device to run HPC code on:
- Singlecore CPU (Reference)
auto exec = gko::ReferenceExecutor::create();
- Multicore CPU (OpenMP)
auto exec = gko::OmpExecutor::create();
- NVIDIA GPU (CUDA)
auto host_exec = gko::OmpExecutor::create();
auto exec = gko::CudaExecutor::create(device_id, host_exec);
- AMD GPU (HIP/ROCm)
auto host_exec = gko::OmpExecutor::create();
auto exec = gko::HipExecutor::create(device_id, host_exec);
- Intel GPU (SYCL)
auto host_exec = gko::OmpExecutor::create();
auto exec = gko::DpcppExecutor::create(device_id, host_exec);
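Which executor to run on is often decided at runtime. A minimal sketch of that selection, following the pattern used in Ginkgo's own examples (the device id 0 and the name strings are assumptions here):

```cpp
#include <ginkgo/ginkgo.hpp>

#include <functional>
#include <map>
#include <memory>
#include <string>

int main(int argc, char* argv[]) {
    // pick the executor name from the command line, defaulting to "reference"
    const auto name = argc > 1 ? std::string{argv[1]} : std::string{"reference"};
    const std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
        factories{
            {"reference", [] { return gko::ReferenceExecutor::create(); }},
            {"omp", [] { return gko::OmpExecutor::create(); }},
            {"cuda", [] { return gko::CudaExecutor::create(0, gko::OmpExecutor::create()); }},
            {"hip", [] { return gko::HipExecutor::create(0, gko::OmpExecutor::create()); }},
            {"dpcpp", [] { return gko::DpcppExecutor::create(0, gko::OmpExecutor::create()); }}};
    auto exec = factories.at(name)();  // throws if the name is unknown
}
```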
Responsibilities:
- Memory allocation
- Copying data
- Executing kernels
Passed to other functions, not used directly!
void gko::func(std::shared_ptr<const gko::Executor> exec, ...);
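To make these responsibilities concrete, here is a minimal sketch that allocates memory through one executor and copies it to another via gko::array (a CUDA device with id 0 is assumed):

```cpp
#include <ginkgo/ginkgo.hpp>

int main() {
    auto host = gko::OmpExecutor::create();
    auto gpu = gko::CudaExecutor::create(0, host);
    // allocation happens through the host executor
    gko::array<double> host_data(host, 4);
    for (int i = 0; i < 4; i++) {
        host_data.get_data()[i] = i;
    }
    // constructing an array with a different executor copies the data there
    gko::array<double> device_data(gpu, host_data);
    // kernels run on `gpu` can now work on device_data.get_data()
}
```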
Linear Operator representing:
- Matrices (Dense, Sparse CSR, COO, ELL, ...) $$y = A\cdot x$$
- Solvers (Krylov CG, GMRES, ..., Direct, Triangular) $$S = A^{-1},\quad y = S\cdot x$$
- Preconditioners (Block-Jacobi, Multigrid, ILU(0), ...) $$M \approx A^{-1},\quad y = M\cdot x$$
- Compositions and linear combinations $$C = A\cdot B,\quad y = C\cdot x$$ $$C = \alpha A + \beta B,\quad y = C\cdot x$$
- Factorizations (ILU, LU, Cholesky, ...) $$L\cdot U = A,\quad y = L U \cdot x$$
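Because solvers and preconditioners are LinOps themselves, applying $S \approx A^{-1}$ looks exactly like applying a matrix. A minimal sketch (the matrix values and the iteration limit are illustrative only, not taken from the slides):

```cpp
#include <ginkgo/ginkgo.hpp>
#include <iostream>

int main() {
    using Mtx = gko::matrix::Csr<double, int>;
    using Vec = gko::matrix::Dense<double>;
    auto exec = gko::ReferenceExecutor::create();
    auto A = gko::share(gko::initialize<Mtx>({{2.0, -1.0}, {-1.0, 2.0}}, exec));
    auto b = gko::initialize<Vec>({1.0, 0.0}, exec);
    auto x = gko::initialize<Vec>({0.0, 0.0}, exec);
    // generate S ~ A^{-1} as a CG solver for A ...
    auto solver =
        gko::solver::Cg<double>::build()
            .with_criteria(gko::stop::Iteration::build().with_max_iters(10u).on(exec))
            .on(exec)
            ->generate(A);
    // ... and apply it like any other LinOp: x = A^{-1} * b (approximately)
    solver->apply(b, x);
    gko::write(std::cout, x);
}
```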
- Functionality:
- Matrix (multi-)vector products
A->apply(x, y);               // y = A * x
A->apply(alpha, x, beta, y);  // y = alpha * A * x + beta * y
- Properties:
- Executor
- Where is the data stored?
- Where are operations executed?
- Dimensions
$n \times m$
- Compatibility:
- Ensures all inputs are on the operator's executor
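A self-contained sketch tying the functionality and the properties together: the scalars alpha and beta of the advanced apply are 1x1 Dense matrices, and get_size() / get_executor() expose an operator's dimensions and executor (all values are illustrative):

```cpp
#include <ginkgo/ginkgo.hpp>
#include <iostream>

int main() {
    using Mtx = gko::matrix::Csr<double, int>;
    using Vec = gko::matrix::Dense<double>;
    auto exec = gko::ReferenceExecutor::create();
    auto A = gko::initialize<Mtx>({{0.0, 1.0}, {1.0, 0.0}}, exec);
    auto x = gko::initialize<Vec>({2.0, 3.0}, exec);
    auto y = gko::initialize<Vec>({1.0, 1.0}, exec);
    // scalars for the advanced apply are 1x1 Dense matrices
    auto alpha = gko::initialize<Vec>({2.0}, exec);
    auto beta = gko::initialize<Vec>({-1.0}, exec);
    A->apply(x, y);               // y = A * x
    A->apply(alpha, x, beta, y);  // y = 2 * A * x - y
    // properties: dimensions (n x m) and the executor holding the data
    std::cout << A->get_size()[0] << " x " << A->get_size()[1] << "\n";
    auto where = A->get_executor();
    gko::write(std::cout, y);
}
```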
Equivalent across all matrix formats:
- From C++ initializer lists (scalars, tiny matrices)
auto mtx = gko::initialize<Mtx>({{0.0, 1.0}, {1.0, 0.0}}, exec);
- From MatrixMarket files
auto mtx = gko::read<Mtx>(std::ifstream{"A.mtx"}, exec);
- From an existing matrix
auto mtx = other_mtx->clone();           // clone onto the same executor
auto mtx = gko::clone(exec, other_mtx);  // clone onto a given executor
- From a matrix in a different format
auto mtx = Mtx::create(exec);
other_mtx->convert_to(mtx);
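Putting two of these together, a minimal sketch that reads a MatrixMarket file into CSR and converts it to ELL (the file name A.mtx is an assumption):

```cpp
#include <ginkgo/ginkgo.hpp>
#include <fstream>

int main() {
    using Csr = gko::matrix::Csr<double, int>;
    using Ell = gko::matrix::Ell<double, int>;
    auto exec = gko::ReferenceExecutor::create();
    // read the MatrixMarket file into a CSR matrix ...
    auto csr = gko::read<Csr>(std::ifstream{"A.mtx"}, exec);
    // ... and convert it into ELL; the CSR matrix is left unchanged
    auto ell = Ell::create(exec);
    csr->convert_to(ell);
}
```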
Three equivalent ways of specifying matrix values:
gko::matrix_data
for sequential assembly (no duplicates):
gko::matrix_data<> data{gko::dim<2>{num_rows, num_cols}};
data.nonzeros.emplace_back(row, column, value);
mtx->read(data);
gko::matrix_assembly_data
for sequential assembly:
gko::matrix_assembly_data<> data{gko::dim<2>{num_rows, num_cols}};
data.set_value(row, column, value);
data.add_value(row, column, value);
mtx->read(data);
gko::device_matrix_data
for high-performance assembly:
gko::device_matrix_data<> data{exec, gko::dim<2>{num_rows, num_cols}, num_nonzeros};
run_assembly_kernel(data.get_row_idxs(), data.get_col_idxs(), data.get_values()); // runs on the device
data.sum_duplicates(); // optional, e.g. for FEM edge DoFs
mtx->read(data);
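For comparison, a small self-contained sketch of the gko::matrix_assembly_data path; the 2x2 matrix and its entries are illustrative only:

```cpp
#include <ginkgo/ginkgo.hpp>

int main() {
    using Mtx = gko::matrix::Csr<double, int>;
    auto exec = gko::ReferenceExecutor::create();
    gko::matrix_assembly_data<double, int> data{gko::dim<2>{2, 2}};
    data.set_value(0, 0, 2.0);  // set the entry at (0, 0)
    data.add_value(0, 0, 1.0);  // accumulate: (0, 0) now holds 3.0
    data.set_value(1, 1, 2.0);
    auto mtx = Mtx::create(exec);
    mtx->read(data);
}
```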
#include <ginkgo/ginkgo.hpp>
#include <iostream>
int main() {
using Mtx = gko::matrix::Csr<double, int>;
using Vec = gko::matrix::Dense<double>;
auto exec = gko::ReferenceExecutor::create();
auto mtx = gko::initialize<Mtx>({{0.0, 1.0}, {1.0, 0.0}}, exec);
auto vec = gko::initialize<Vec>({2.0, 3.0}, exec);
auto result = vec->clone();
mtx->apply(vec, result);
gko::write(std::cout, result);
}
#include <ginkgo/ginkgo.hpp>
#include <iostream>
int main() {
using Mtx = gko::matrix::Csr<double, int>;
using Vec = gko::matrix::Dense<double>;
auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
auto mtx = gko::initialize<Mtx>({{0.0, 1.0}, {1.0, 0.0}}, exec);
auto vec = gko::initialize<Vec>({2.0, 3.0}, exec);
auto result = vec->clone();
mtx->apply(vec, result);
gko::write(std::cout, result);
}
#include <ginkgo/ginkgo.hpp>
#include <iostream>
int main() {
using Mtx = gko::matrix::Csr<double, int>;
using Vec = gko::matrix::Dense<double>;
std::shared_ptr<gko::CudaExecutor> exec =
gko::CudaExecutor::create(0, gko::OmpExecutor::create());
std::unique_ptr<Mtx> mtx = gko::initialize<Mtx>({{0.0, 1.0}, {1.0, 0.0}},
exec);
std::unique_ptr<Vec> vec = gko::initialize<Vec>({2.0, 3.0}, exec);
std::unique_ptr<Vec> result = vec->clone();
mtx->apply(vec, result);
gko::write(std::cout, result);
}
#include <ginkgo/ginkgo.hpp>
#include <iostream>
int main() {
using Mtx = gko::matrix::Csr<double, int>;
using Vec = gko::matrix::Dense<double>;
using dim = gko::dim<2>;
using matrix_data = gko::matrix_data<double, int>;
auto exec = gko::ReferenceExecutor::create();
matrix_data data{dim{10, 10}};
auto mtx = Mtx::create(exec);
auto x = Vec::create(exec, dim{10, 1});
auto y = Vec::create(exec, dim{10, 1});
// assemble a circulant matrix: 2 on the diagonal, -1 on both cyclic neighbors
for (int row = 0; row < 10; row++) {
data.nonzeros.emplace_back(row, row, 2.0);
data.nonzeros.emplace_back(row, (row + 1) % 10, -1.0);
data.nonzeros.emplace_back(row, (row + 9) % 10, -1.0);
x->at(row, 0) = 1.0; // only works on CPU executors
}
mtx->read(data);
mtx->apply(x, y);
gko::write(std::cout, y);
}