L1 norm implementation #900

greole · 2021-10-06T18:09:43Z

This pull request implements an l1 norm computation for dense matrices.
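For readers unfamiliar with the operation, here is a minimal sketch of a column-wise L1 norm on a row-major dense matrix. It assumes the norm is taken per column (one result per column, summing absolute values down the rows). The function name `column_norm1` and the flat `std::vector` storage are illustrative assumptions, not code from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical name, not the PR's kernel).
// Column-wise L1 norm of a row-major dense matrix:
//   result[j] = sum over i of |a(i, j)|
// `stride` is the distance in elements between consecutive rows
// (stride >= num_cols).
template <typename ValueType>
std::vector<decltype(std::abs(ValueType{}))> column_norm1(
    const std::vector<ValueType>& values, std::size_t num_rows,
    std::size_t num_cols, std::size_t stride)
{
    // For complex value types the norm type is the underlying real type.
    using norm_type = decltype(std::abs(ValueType{}));
    std::vector<norm_type> result(num_cols, norm_type{});
    for (std::size_t row = 0; row < num_rows; ++row) {
        for (std::size_t col = 0; col < num_cols; ++col) {
            result[col] += std::abs(values[row * stride + col]);
        }
    }
    return result;
}
```

For the 2x2 matrix `{1, -2; -3, 4}` this yields one value per column, `{4, 6}`.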

greole · 2021-10-06T18:15:47Z

format!

MarcelKoch

Just some comments, not a full review yet.

cuda/matrix/dense_kernels.cu

reference/matrix/dense_kernels.cpp

reference/test/matrix/dense_kernels.cpp

omp/test/matrix/dense_kernels.cpp

hip/test/matrix/dense_kernels.hip.cpp

upsj · 2021-10-13T08:37:05Z

you can use #833 now to significantly simplify the kernel implementation :)
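To illustrate the general idea behind a lambda-based reduction, here is a rough sketch in plain C++. The helper `reduce_columns` is hypothetical and is not the interface introduced in #833; it only shows how a norm kernel can shrink to a "map" lambda and a "reduce" lambda once the loop structure lives in a shared helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT the API from #833: applies `map(row, col)` to every
// entry and combines the results per column with `reduce`, starting from
// `identity`.
template <typename ResultType, typename MapFn, typename ReduceFn>
std::vector<ResultType> reduce_columns(MapFn map, ReduceFn reduce,
                                       ResultType identity,
                                       std::size_t num_rows,
                                       std::size_t num_cols)
{
    std::vector<ResultType> result(num_cols, identity);
    for (std::size_t row = 0; row < num_rows; ++row) {
        for (std::size_t col = 0; col < num_cols; ++col) {
            result[col] = reduce(result[col], map(row, col));
        }
    }
    return result;
}

// With the helper in place, the L1 norm is just two small lambdas.
// Assumes row-major storage without padding (stride == num_cols).
inline std::vector<double> norm1_via_reduction(
    const std::vector<double>& values, std::size_t num_rows,
    std::size_t num_cols)
{
    return reduce_columns(
        [&](std::size_t row, std::size_t col) {
            return std::abs(values[row * num_cols + col]);
        },
        [](double x, double y) { return x + y; }, 0.0, num_rows, num_cols);
}
```

In a design like this, a backend-specific version of the helper could parallelize the loops while the lambdas stay the same, which is presumably where the simplification comes from.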

pratikvn

LGTM! Very minor nits about empty lines.

pratikvn · 2021-10-22T19:00:31Z

common/cuda_hip/matrix/dense_kernels.hpp.inc

@@ -142,7 +142,6 @@ __global__ __launch_bounds__(block_size) void compute_partial_norm2(
        [](const norm_type& x, const norm_type& y) { return x + y; });
 }

-


I think we always leave 2 empty lines between functions.

pratikvn · 2021-10-22T19:03:28Z

dpcpp/matrix/dense_kernels.dp.cpp

@@ -83,7 +83,6 @@ constexpr int default_block_size = 256;

 namespace kernel {

-


Again, I think this empty line should not be removed.

pratikvn · 2021-10-22T19:04:12Z

hip/matrix/dense_kernels.hip.cpp

@@ -119,7 +119,6 @@ void apply(std::shared_ptr<const HipExecutor> exec,

 GKO_INSTANTIATE_FOR_EACH_VALUE_TYPE(GKO_DECLARE_DENSE_APPLY_KERNEL);

-


hip/test/matrix/dense_kernels.hip.cpp

omp/test/matrix/dense_kernels.cpp

reference/matrix/dense_kernels.cpp

reference/test/matrix/dense_kernels.cpp

upsj

LGTM! Since all of these device executors tests are identical, you could also move them to test/matrix/dense_kernels.cpp

greole · 2021-10-25T08:05:30Z

> LGTM! Since all of these device executors tests are identical, you could also move them to test/matrix/dense_kernels.cpp

OK, with 8c63099 I moved the tests to test/matrix/dense_kernels.cpp.
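As a rough illustration of why identical per-backend tests can collapse into one shared test, the sketch below runs a single check against any implementation passed in as a callable. All names here (`norm1_fn`, `norm1_reference`, `norm1_column_order`, `agrees_with_reference`) are hypothetical stand-ins and do not reflect Ginkgo's actual test infrastructure.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical signature shared by every "backend" in this sketch.
using norm1_fn = std::function<std::vector<double>(
    const std::vector<double>&, std::size_t, std::size_t)>;

// Serial reference: walks the row-major data row by row.
inline std::vector<double> norm1_reference(const std::vector<double>& v,
                                           std::size_t rows, std::size_t cols)
{
    std::vector<double> result(cols, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            result[j] += std::abs(v[i * cols + j]);
        }
    }
    return result;
}

// Stand-in for a device backend: same result, different traversal order.
inline std::vector<double> norm1_column_order(const std::vector<double>& v,
                                              std::size_t rows,
                                              std::size_t cols)
{
    std::vector<double> result(cols, 0.0);
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            result[j] += std::abs(v[i * cols + j]);
        }
    }
    return result;
}

// One shared check: any backend must agree with the reference up to a
// relative tolerance.
inline bool agrees_with_reference(const norm1_fn& backend,
                                  const std::vector<double>& v,
                                  std::size_t rows, std::size_t cols,
                                  double rel_tol = 1e-14)
{
    const auto expected = norm1_reference(v, rows, cols);
    const auto actual = backend(v, rows, cols);
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t j = 0; j < expected.size(); ++j) {
        const auto scale = std::max(1.0, expected[j]);
        if (std::abs(actual[j] - expected[j]) > rel_tol * scale) {
            return false;
        }
    }
    return true;
}
```

The same `agrees_with_reference` call can then be repeated for each backend, so the test body is written once.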

pratikvn · 2021-10-25T08:11:56Z

@greole, if possible, could you please combine the one-line-change commits into a single commit?

greole · 2021-10-25T08:13:21Z

> @greole, if possible can you please combine the one line change commits into one single commit ?

Yes, I'll rebase them into a single commit.

codecov · 2021-10-25T12:37:53Z

Codecov Report

Merging #900 (8c63099) into develop (8123284) will decrease coverage by 0.06%.
The diff coverage is 97.05%.

❗ Current head 8c63099 differs from pull request most recent head f27ad36. Consider uploading reports for the commit f27ad36 to get more accurate results

@@             Coverage Diff             @@
##           develop     #900      +/-   ##
===========================================
- Coverage    94.80%   94.73%   -0.07%     
===========================================
  Files          436      434       -2     
  Lines        36007    35742     -265     
===========================================
- Hits         34136    33860     -276     
- Misses        1871     1882      +11

| Impacted Files | Coverage Δ | |
|---|---|---|
| core/device_hooks/common_kernels.inc.cpp | `0.00% <0.00%> (ø)` | |
| omp/matrix/dense_kernels.cpp | `97.41% <ø> (ø)` | |
| common/unified/matrix/dense_kernels.cpp | `100.00% <100.00%> (ø)` | |
| core/matrix/dense.cpp | `98.69% <100.00%> (+0.01%)` | ⬆️ |
| include/ginkgo/core/matrix/dense.hpp | `95.58% <100.00%> (+0.09%)` | ⬆️ |
| reference/matrix/dense_kernels.cpp | `100.00% <100.00%> (ø)` | |
| reference/test/matrix/dense_kernels.cpp | `99.81% <100.00%> (+<0.01%)` | ⬆️ |
| test/matrix/dense_kernels.cpp | `99.47% <100.00%> (+<0.01%)` | ⬆️ |
| omp/matrix/fbcsr_kernels.cpp | `0.00% <0.00%> (-93.07%)` | ⬇️ |
| reference/components/format_conversion.hpp | `81.81% <0.00%> (-18.19%)` | ⬇️ |

... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8123284...f27ad36.

Co-authored-by: greole <greole@users.noreply.github.com>

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>

sonarqubecloud · 2021-11-03T12:37:34Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
3 Code Smells

79.2% Coverage
0.0% Duplication

Advertise release 1.5.0 and last changes + Add changelog, + Update third party libraries + A small fix to a CMake file See PR: #1195 The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as: - MPI-based multi-node support for all matrix formats and most solvers; - full DPC++/SYCL support, - functionality and interface for GPU-resident sparse direct solvers, - an interface for wrapping solvers with scaling and reordering applied, - a new algebraic Multigrid solver/preconditioner, - improved mixed-precision support, - support for device matrix assembly, and much more. If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions). Supported systems and requirements: + For all platforms, CMake 3.13+ + C++14 compliant compiler + Linux and macOS + GCC: 5.5+ + clang: 3.9+ + Intel compiler: 2018+ + Apple LLVM: 8.0+ + NVHPC: 22.7+ + Cray Compiler: 14.0.1+ + CUDA module: CUDA 9.2+ or NVHPC 22.7+ + HIP module: ROCm 4.0+ + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`. + Windows + MinGW and Cygwin: GCC 5.5+ + Microsoft Visual Studio: VS 2019 + CUDA module: CUDA 9.2+, Microsoft Visual Studio + OpenMP module: MinGW or Cygwin. Algorithm and important feature additions: + Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). 
([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148)) + Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142)) + Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082)) + Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059)) + Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986)) + Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926)) + Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117)) + Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965)) + Add support for mixed real/complex BLAS operations ([#864](#864)) + Add a FFT LinOp for all but DPC++/SYCL ([#701](#701)) + Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775)) + Add CSR scaling ([#848](#848)) + Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890)) + Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901)) + Add mixed precision SparsityCsr SpMV support ([#970](#970)) + Allow creating CSR submatrix including from (possibly discontinuous) index 
sets ([#885](#885), [#964](#964)) + Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942)) Deprecations and important changes: + Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)). + Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101)) + Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052)) + Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020)) + Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)). + Drop official support for old CUDA < 9.2 ([#887](#887)) Improved performance additions: + Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028)) + Add HIP unsafe atomic option for AMD ([#1091](#1091)) + Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)). 
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809)) Fixes: + Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189)) + Fix issues with hwloc-related tests ([#1074](#1074)) + Fix include headers for GCC 12 ([#1071](#1071)) + Fix for simple-solver-logging example ([#1066](#1066)) + Fix for potential memory leak in Logger ([#1056](#1056)) + Fix logging of mixin classes ([#1037](#1037)) + Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753)) + Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978)) + Fix uninitialized data ([#958](#958)) + Fix CUDA version requirement for cusparseSpSM ([#953](#953)) + Fix several issues within bash-script ([#1016](#1016)) + Fixes for `NVHPC` compiler support ([#1194](#1194)) Other additions: + Simplify and properly name GMRES kernels ([#861](#861)) + Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109)) + Improve gdb pretty printer ([#987](#987), [#1114](#1114)) + Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035)) + Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032)) + Better CSR strategy defaults ([#969](#969)) + Add `move_from` to `PolymorphicObject` ([#997](#997)) + Remove unnecessary device_guard usage ([#956](#956)) + Improvements to the generic accessor for mixed-precision ([#727](#727)) + Add a naive lower triangular solver implementation for CUDA ([#764](#764)) + Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897)) + Add a L1 norm implementation ([#900](#900)) + Add reduce_add for arrays ([#831](#831)) + Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)). 
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123)) + Make IDR random initilization deterministic ([#1116](#1116)) + Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088)) + Update CUDA archCoresPerSM ([#1175](#1116)) + Add kernels for Csr sparsity pattern lookup ([#994](#994)) + Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027)) + Add a binary IO format for matrix data ([#984](#984)) + Add a tuple zip_iterator implementation ([#966](#966)) + Simplify kernel stubs and declarations ([#888](#888)) + Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859)) + Simplify copy to device in tests and examples ([#863](#863)) + More verbose output to array assertions ([#858](#858)) + Allow parallel compilation for Jacobi kernels ([#871](#871)) + Change clang-format pointer alignment to left ([#872](#872)) + Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137)) + Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154)) + Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))

greole added is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Oct 6, 2021

ginkgo-bot added reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats labels Oct 6, 2021

MarcelKoch reviewed Oct 7, 2021


greole force-pushed the l1_norm branch 2 times, most recently from 6897aa3 to bc68533 Compare October 21, 2021 19:22

yhmtsai mentioned this pull request Oct 22, 2021

Feature requests #639

Open

9 tasks

pratikvn approved these changes Oct 22, 2021


upsj approved these changes Oct 22, 2021


upsj assigned greole Oct 22, 2021

greole force-pushed the l1_norm branch from 8c63099 to e70f70a Compare October 25, 2021 08:44

greole force-pushed the l1_norm branch from e70f70a to 0090235 Compare October 30, 2021 08:57

greole added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Nov 2, 2021

greole and others added 8 commits November 3, 2021 09:12

Add COMPUTE_NORM1 declare macro

ebbdcd6

Add initial compute_norm1 implementation

7ab0be3

fix common kernels macro

a74665e

Add l1 norm header definition

61f478d

Add norm1 unit tests

6adf333

fix hip/cuda norm1 test assertion

43b594f

add compute_norm1_impl virtual member

83ec6f1

add omp l1 kernel

015467c

greole and others added 7 commits November 3, 2021 09:12

add hip l1 norm kernel

cb3e0a3

Implement dpcpp norm1

133c601

Format files

995e4df

Co-authored-by: greole <greole@users.noreply.github.com>

Apply suggestions from code review

ebda6d6

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

Base l1 norm on #833

da7d75e

Fix empty lines

662976a

Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>

Move l1_norm tests to test/matrix/dense_kernels

f27ad36

greole force-pushed the l1_norm branch from 0090235 to f27ad36 Compare November 3, 2021 08:13

greole merged commit e972cf9 into develop Nov 3, 2021

greole deleted the l1_norm branch November 3, 2021 12:12

greole mentioned this pull request Nov 10, 2021

Feature request: Adding an L1 norm calculation #798

Closed
