ScaledReordered interface #1059
Conversation
I am thinking about removing the ValueType and IndexType from the ScaledReordered.
In `set_cache_to` you can use `create_with_config_of` to create the cache.
Some of the requirements on ValueType / IndexType can be handled in `run`.
Will ScaledReordered be used even when only the scaling or the reordering part is needed?
Why not separate Scaled and Reordered classes to handle them separately, with ScaledReordered inheriting from both Scaled and Reordered?
```cpp
GKO_ASSERT_EQUAL_DIMENSIONS(parameters_.row_scaling,
                            system_matrix_);
row_scaling_ = parameters_.row_scaling;
row_scaling_->apply(system_matrix_.get(), system_matrix_.get());
```
It is dangerous in my opinion. It also destroys the `__restrict__` property for GPU kernels.
You could add an in-place operation that is used when the input and output are detected to be the same pointer.
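The suggestion above can be sketched in plain C++ (a minimal stand-alone sketch, not Ginkgo's actual kernel code; the function names are illustrative): a thin dispatcher keeps the `__restrict__`-qualified out-of-place kernel intact and routes aliased calls to a dedicated in-place kernel, so the compiler's no-aliasing assumption is never violated.

```cpp
#include <cassert>
#include <vector>

// Out-of-place diagonal scaling: the __restrict__ qualifiers promise the
// compiler that diag, in and out never alias, enabling more aggressive
// optimization (a GCC/Clang extension).
void scale_out_of_place(const double* __restrict__ diag,
                        const double* __restrict__ in,
                        double* __restrict__ out, int n)
{
    for (int i = 0; i < n; ++i) out[i] = diag[i] * in[i];
}

// In-place variant: no aliasing promise, safe when input and output coincide.
void scale_in_place(const double* diag, double* x, int n)
{
    for (int i = 0; i < n; ++i) x[i] *= diag[i];
}

// Dispatcher: detect the in-place case by pointer comparison.
void scale(const double* diag, const double* in, double* out, int n)
{
    if (in == out) {
        scale_in_place(diag, out, n);
    } else {
        scale_out_of_place(diag, in, out, n);
    }
}
```

With this split, calling `scale` with `in == out` is well defined, while the fast restrict-qualified path is still used whenever the buffers differ.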
See #988. For some LinOps this is perfectly safe; in-place scaling is one of them.
Yes, at least that matches my mental model of how the compiler works. TBH, I wouldn't want to bet that the compiler actually does the same thing I imagine.
```cpp
cache_.inner_b->copy_from(b);
cache_.inner_x->copy_from(x);
cache_.intermediate->copy_from(b);
```
do you need to copy the data?
`apply_uses_initial_guess()`?
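The point of this exchange can be sketched stand-alone (illustrative names, not Ginkgo's API): the solution vector only needs to be copied into the cache when the inner operator actually uses an initial guess; otherwise the copy is wasted work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an inner solver that may or may not use an
// initial guess (mirrors the role of apply_uses_initial_guess()).
struct inner_op {
    bool uses_initial_guess;
};

// Copy x into the cache only when the inner operator will read it as a
// starting guess; otherwise just size the cache and skip the copy.
void fill_solution_cache(const inner_op& op, const std::vector<double>& x,
                         std::vector<double>& cached_x)
{
    if (op.uses_initial_guess) {
        cached_x = x;  // keep the caller's x as the initial guess
    } else {
        cached_x.assign(x.size(), 0.0);  // contents irrelevant, no copy needed
    }
}
```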
Nice work. LGTM!
```cpp
#include <ginkgo/core/reorder/reordering_base.hpp>


namespace gko {
```
I think this fits better in the `reorder/` folder rather than in `base/`.
Yes, before merging this we should agree on naming and placement.
```cpp
 * inner operator is applied to the system matrix P*R*A*C*P^T instead of A,
 * so instead of A*x = b the inner operator attempts to solve the equivalent
 * linear system P*R*A*C*P^T*y = P*R*b and retrieves the solution x = C*P^T*y.
 *
```
Maybe note that the input matrix is cloned and not modified.
```cpp
LinOp* x) const override;


/**
 * Prepares the intermediate right hand side, solution and intermediateb
```
Suggested change:
```diff
- * Prepares the intermediate right hand side, solution and intermediateb
+ * Prepares the intermediate right hand side, solution and intermediate
```
```cpp
virtual std::shared_ptr<const gko::LinOp> get_permutation() const = 0;


virtual std::shared_ptr<const gko::LinOp> get_inverse_permutation()
    const = 0;
```
Is it necessary to have these getters here?
As the scaled reordered factory takes a `ReorderingBaseFactory`, I think these are necessary here rather than only in the algorithm implementations.
AFAIR, combining scaled and reordered together allows some algorithms to use only one operation.
At the moment, not at all. So far the operations are not merged yet.
Will the merged operations be in this class, or in another class inheriting from this one?
There is an interface break at the moment (the only reason why I don't approve this PR).
Unfortunately, it will be quite difficult to find a solution that does not break the existing interface while being intuitive to use (for example, adding `get_permutation_2` would not break the interface, but it would be quite confusing for the user).
Additionally, the actual reordering inside the constructor of `ScaledReordered` needs a bit more work.
reference/test/base/reordered.cpp
Outdated
```cpp
SR::build().with_row_scaling(this->diag).on(this->exec);
auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);
```
nit (for AAA separation):

Suggested change:
```diff
  SR::build().with_row_scaling(this->diag).on(this->exec);
+
  auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);
```
reference/test/base/reordered.cpp
Outdated
```cpp
auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);


auto before_system_matrix = scaled_reordered->get_system_matrix();
```
same here:

Suggested change:
```diff
  auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);
+
  auto before_system_matrix = scaled_reordered->get_system_matrix();
```
reference/test/base/reordered.cpp
Outdated
```cpp
auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);


auto before_system_matrix = scaled_reordered->get_system_matrix();
```
and here:

Suggested change:
```diff
  auto scaled_reordered = scaled_reordered_fact->generate(this->mtx);
+
  auto before_system_matrix = scaled_reordered->get_system_matrix();
```
```cpp
auto reordering = parameters_.reordering->generate(system_matrix_);
auto perm = reordering->get_permutation();
permutation_ =
    clone(exec, as<matrix::Permutation<index_type>>(perm));
auto permutation_array =
    make_array_view(exec, permutation_->get_size()[0],
                    permutation_->get_permutation());
system_matrix_ = as<Permutable<index_type>>(system_matrix_)
                     ->permute(&permutation_array);
```
This requires your reordering to return a `matrix::Permutation`. However, this is not documented and not intuitive (your `ReorderingBaseFactory` needs `get_permutation()` to return a `matrix::Permutation`).
Is there a reason why you don't call the `apply` method on the permutation matrix? That would allow for more formats than just `matrix::Permutation` and would work a lot better with your new interface.
I went with an array now, that works for all kinds of permutations and does not assume we have a permutation matrix.
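The array-based approach can be sketched stand-alone (a minimal sketch; the convention `out[i] = in[perm[i]]` is an assumption here, Ginkgo's actual `permute` semantics may differ): a plain index array describes the row permutation without assuming any particular permutation matrix format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a row permutation stored as a plain index array:
// output row i is taken from input row perm[i].
std::vector<double> permute_rows(const std::vector<double>& v,
                                 const std::vector<int>& perm)
{
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        out[i] = v[perm[i]];
    }
    return out;
}
```

Because only the index array is needed, any reordering algorithm can supply it, regardless of how (or whether) it materializes a permutation matrix.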
```cpp
cache_struct() = default;
~cache_struct() = default;
cache_struct(const cache_struct&) {}
cache_struct(cache_struct&&) {}
cache_struct& operator=(const cache_struct&) { return *this; }
cache_struct& operator=(cache_struct&&) { return *this; }
```
You need one newline between these functions.
```cpp
void set_cache_to(const LinOp* b, const LinOp* x) const
{
    if (cache_.inner_b == nullptr) {
        const auto size = b->get_size();
```
You also need to check the size: when the size is different, create a new cache.
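The requested size check can be sketched stand-alone (illustrative types, not Ginkgo's `matrix::Dense` or cache struct): the cached vector is recreated not only when it is missing, but also when the incoming dimensions differ from the cached ones.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Minimal stand-in for a dense vector carrying its dimensions.
struct dense {
    std::pair<int, int> size;
    std::vector<double> data;
};

struct cache {
    std::unique_ptr<dense> inner_b;

    // Recreate the cache when it is missing OR when the size changed;
    // otherwise reuse the existing allocation.
    void set_to(const dense& b)
    {
        if (!inner_b || inner_b->size != b.size) {
            inner_b = std::make_unique<dense>(
                dense{b.size, std::vector<double>(b.data.size())});
        }
    }
};
```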
Thanks for the improvements. I have only minor requests; after addressing these, you can feel free to merge it.
LGTM for what we can currently change - I think there is still a lot of work to be done in the Permutation, Diagonal and ScaledPermutation area, but that needs to happen in Ginkgo 2.0
LGTM in general. I have discussed combining the permutation and scaling with Fritz. It will change the implementation, not introduce a new class, so my worry above will not materialize.
I am marking this as "request changes" because it may give an error when inner_operator is nullptr (the generated Identity).
```cpp
std::unique_ptr<matrix::Dense<value_type>> inner_b{};

std::unique_ptr<matrix::Dense<value_type>> inner_x{};

std::unique_ptr<matrix::Dense<value_type>> intermediate{};
```
nit: these are data members.

Suggested change:
```diff
  std::unique_ptr<matrix::Dense<value_type>> inner_b{};
-
  std::unique_ptr<matrix::Dense<value_type>> inner_x{};
-
  std::unique_ptr<matrix::Dense<value_type>> intermediate{};
```
```cpp
    inner_operator_ =
        parameters_.inner_operator->generate(system_matrix_);
} else {
    inner_operator_ = gko::matrix::Identity<value_type>::create(exec);
```
You need to pass the size; apply will check the size against b and x.
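The concern can be illustrated with a stand-alone sketch (not Ginkgo's actual `Identity` implementation): an identity operator must record its size at creation so that `apply` can validate the dimensions of `b` and `x`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Minimal identity operator that carries its size so apply() can
// validate dimensions, as the review requests.
struct identity {
    std::size_t n;  // system size recorded at creation

    void apply(const std::vector<double>& b, std::vector<double>& x) const
    {
        if (b.size() != n || x.size() != n) {
            throw std::invalid_argument("dimension mismatch");
        }
        x = b;  // identity: x = I * b
    }
};
```

An identity created without the proper size (the default-constructed case the review worries about) would reject every apply, so the size must be supplied when the inner operator is generated.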
LGTM in general. Two concerns: 1. the public interface comes from a using alias; 2. move get_permutation_array into ReorderBase.
```cpp
public:
    using index_type = IndexType;


    virtual const array<index_type>& get_permutation_array() const = 0;
```
You could also provide a protected setter for permutation_array and keep the permutation data private, without implementing it in each class.
```cpp
/**
 * Declares an Abstract Factory specialized for ReorderingBases
 */
using ReorderingBaseFactory =
```
public interface?
I added the `IndexType` to these two. But I think technically, the second one is still an interface break, since in case someone used the `PolymorphicBase`, this would no longer work...
I'm fine if you break the interface at that point, as I don't think many people (my assumption is zero outside of us) were using it. If it is compiled with C++17, the interface is no longer broken, so I would consider it acceptable.
Depends on #1171 now.
Kudos, SonarCloud Quality Gate passed!
Ready to merge after #1171.
Co-authored-by: fritzgoebel <fritzgoebel@users.noreply.github.com>
Note: This PR changes the Ginkgo ABI. For details check the full ABI diff under Artifacts here.
Advertise release 1.5.0 and last changes:
+ Add changelog
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
- MPI-based multi-node support for all matrix formats and most solvers;
- full DPC++/SYCL support;
- functionality and interface for GPU-resident sparse direct solvers;
- an interface for wrapping solvers with scaling and reordering applied;
- a new algebraic Multigrid solver/preconditioner;
- improved mixed-precision support;
- support for device matrix assembly;
and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR) ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))

Deprecations and important changes:
+ Deprecate AmgxPgm in favor of the new Pgm name ([#1149](#1149))
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184))
+ Drop official support for old CUDA < 9.2 ([#887](#887))

Improved performance additions:
+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967))
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))

Fixes:
+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))

Other additions:
+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136))
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initialization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1175))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))
This PR adds an interface for wrapping an inner operator (like a direct or iterative solver) into a (for now symmetric, e.g. RCM) reordering via a reordering factory, and diagonal scaling via one row and one column scaling diagonal matrix.

The `ScaledReordered` LinOp applies the given scaling and reordering to the system matrix and generates the inner operator on the modified problem. It manages three vectors: a right-hand side and a solution vector for the inner operator, and an intermediate vector used for the permutations.

With a permutation matrix `P`, a row scaling matrix `R`, and a column scaling matrix `C`, the system matrix and vectors are scaled and permuted such that instead of the original system of equations `A*x = b` the inner operator solves `P*R*A*C*P^T*y = P*R*b`.

Hence, when generating a `ScaledReordered` LinOp, the new system matrix `P*R*A*C*P^T` is computed. When it is applied to vectors `b` and `x`, the right-hand side vector is modified to `P*R*b`, and the solution `y` of the inner system gives the solution `x` of the original system via `x = C*P^T*y`.

When using an inner operator that uses an initial guess, currently the initial guess for the inner system is computed by `y_0 = P*C^(-1)*x_0`, which is the same as using `x_0` for the system `P*R*A*x = P*R*b`. For the inverse diagonal scaling here, I added an `inverse_apply` to `matrix::Diagonal`. I still need to add tests for that.

Any feedback is welcome, especially ideas on how to handle algorithms like RCM that generate only a reordering, while also supporting MC64, which generates both a reordering and scaling coefficients, and simpler equilibration techniques that generate only scaling coefficients.
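The transformation described above can be checked with a small stand-alone numerical example (plain C++ with illustrative 2x2 data instead of Ginkgo types; the permutation convention `(P*v)[i] = v[p[i]]` is an assumption of this sketch): if `y` solves the scaled and permuted system `P*R*A*C*P^T*y = P*R*b`, then `x = C*P^T*y` solves the original system `A*x = b`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Dense matrix-vector product for a small row-major matrix.
vec matvec(const std::vector<vec>& M, const vec& v)
{
    vec out(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j) out[i] += M[i][j] * v[j];
    return out;
}

// Apply P, stored as an index array: (P*v)[i] = v[p[i]].
vec permute(const vec& v, const std::vector<int>& p)
{
    vec out(v.size());
    for (std::size_t i = 0; i < p.size(); ++i) out[i] = v[p[i]];
    return out;
}

// Apply P^T (the inverse permutation): (P^T*v)[p[i]] = v[i].
vec inverse_permute(const vec& v, const std::vector<int>& p)
{
    vec out(v.size());
    for (std::size_t i = 0; i < p.size(); ++i) out[p[i]] = v[i];
    return out;
}

// Apply a diagonal scaling matrix stored as its diagonal entries.
vec scale(const vec& d, const vec& v)
{
    vec out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = d[i] * v[i];
    return out;
}
```

For a chosen true solution `x_true` with `b = A*x_true`, the vector `y = P*C^(-1)*x_true` satisfies the transformed system by construction, and applying `x = C*P^T*y` recovers `x_true`.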