add icpx support #1350

yhmtsai · 2023-06-14T14:55:19Z

This PR adds support for the icpx compiler.
It adds -fsycl when compiling any device function or queue operation, and adds the corresponding include path throughout Ginkgo.

upsj · 2023-06-14T15:33:02Z

I think the canonical way to enable the SYCL flags would be add_sycl_to_target, as it's supported by hipSYCL, ComputeCpp and hopefully also Intel's SYCL support

tcojean

Generally LGTM. Some suggestions inline.

tcojean · 2023-08-16T12:23:48Z

dpcpp/CMakeLists.txt

@@ -82,8 +82,10 @@ configure_file(preconditioner/jacobi_common.hpp.in preconditioner/jacobi_common.
 ginkgo_compile_features(ginkgo_dpcpp)
 target_compile_definitions(ginkgo_dpcpp PRIVATE GKO_COMPILING_DPCPP _ONEDPL_COMPILE_KERNEL=0)

+set(GINKGO_DPCPP_FLAGS "-fsycl")


Does this flag also work with the dpcpp compiler?

Also, we could try rewriting this in this form:
https://github.com/oneapi-src/oneAPI-samples/pull/1721/files#diff-cbe919c8af7559493969dbc2c78bf849e49a594bd82fc018982f948266dbee24R1

Note: this requires a newer CMake version and probably also a recent oneAPI version. I see IntelSYCLConfig provided in 2023.1; before that it was IntelDPCPP. See:
2023.0 instructions: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-0/use-cmake-with-the-compiler.html
2023.1 instructions: https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/use-cmake-with-the-compiler.html

Yes, dpcpp has the flag, too.
I went through IntelDpcpp.config; there is no add_sycl_to_target in this module.

CMakeLists.txt

sonarqubecloud · 2023-08-22T10:40:34Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

upsj

LGTM!

upsj · 2023-08-22T11:18:53Z

cmake/rename.cmake

+                # They are different
+                set(${deprecated}_copy ${${deprecated}})
+                unset(${deprecated} CACHE)
+                message(FATAL_ERROR "Both ${deprecated} and ${actual} were specified, please use ${actual} instead.  "


This error message would vanish after rerunning CMake. Is that intended? People might miss it in the large amount of output and just rerun CMake, silently losing their configuration option.

I currently do not unset the deprecated variable.
It will keep showing the warning or error as long as the deprecated variable is set.

upsj · 2023-08-22T11:19:38Z

cmake/rename.cmake

+            # Only set `deprecated`, move it to `actual`.
+            message(WARNING "${deprecated} was deprecated, please use ${actual} instead.  "
+                "We copy ${${deprecated}} to ${actual} and unset ${deprecated}.")
+            set(${actual} ${${deprecated}} CACHE ${type} "")


This way we have no description for the variable, maybe we can find a way to keep it?

It can be passed through the doc_string now.
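The behavior discussed in the two cmake/rename.cmake threads above (keep the deprecated variable, warn on every run, fail when both variables disagree, and keep a description on the new cache entry) could be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the function name my_rename_cache, its argument order, and the doc string in the usage line are hypothetical. Only the variable names GINKGO_BUILD_DPCPP and GINKGO_BUILD_SYCL come from this PR.

```cmake
# Hypothetical helper for illustration only; not the contents of cmake/rename.cmake.
# deprecated: name of the old cache variable
# actual:     name of the new cache variable
# type:       cache type, e.g. BOOL or STRING
# doc_string: description stored with the new cache entry
function(my_rename_cache deprecated actual type doc_string)
    if(DEFINED ${deprecated})
        # Assumption: values are compared as plain strings, so ON and TRUE count as different.
        if(DEFINED ${actual} AND NOT "${${deprecated}}" STREQUAL "${${actual}}")
            message(FATAL_ERROR "Both ${deprecated} and ${actual} were specified, "
                "please use ${actual} instead.")
        endif()
        # The deprecated variable is not unset, so this warning shows up on every run.
        message(WARNING "${deprecated} is deprecated, please use ${actual} instead.")
        # Copy the value and keep a description for the new variable.
        set(${actual} "${${deprecated}}" CACHE ${type} "${doc_string}" FORCE)
    endif()
endfunction()

# Usage; the doc string is a made-up example.
my_rename_cache(GINKGO_BUILD_DPCPP GINKGO_BUILD_SYCL BOOL "Compile SYCL kernels")
```

Because the deprecated entry stays in the cache, rerunning CMake repeats the warning or error instead of silently dropping the user's setting.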

upsj · 2023-08-22T11:20:49Z

cmake/sycl.cmake

+if(CMAKE_CXX_COMPILER MATCHES "dpcpp|icpx")
+    if(CMAKE_HOST_WIN32 AND CMAKE_VERSION VERSION_GREATER_EQUAL 3.25)
+        find_package(IntelSYCL QUIET)
+    elseif(CMAKE_VERSION VERSION_GREATER_EQUAL 3.20.5)


I think we should require 3.20 for SYCL if that is necessary to make IntelLLVM work correctly.

No, it is optional. If the config is not available or the CMake version is too old, simply adding -fsycl in the following code is currently enough.

upsj · 2023-08-22T11:21:16Z

cmake/sycl.cmake

+# IntelSYCL for dpcpp and icpx if the config is existed and cmake reaches the requirement
+if(CMAKE_CXX_COMPILER MATCHES "dpcpp|icpx")
+    if(CMAKE_HOST_WIN32 AND CMAKE_VERSION VERSION_GREATER_EQUAL 3.25)
+        find_package(IntelSYCL QUIET)


Should this be optional?

Suggested change

-        find_package(IntelSYCL QUIET)
+        find_package(IntelSYCL QUIET REQUIRED)

Yes, it is optional. If IntelSYCL is not found, we fall back to -fsycl.
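The optional lookup with a -fsycl fallback described in these threads could look roughly like the sketch below. It is not the merged cmake/sycl.cmake: the wrapper name my_add_sycl_to_target is hypothetical, the CMake version checks from the diff are left out for brevity, and the add_sycl_to_target(TARGET ... SOURCES ...) call assumes the signature documented for Intel's IntelSYCL package.

```cmake
# Sketch only; simplified compared to the diff above (no CMake version checks).
if(CMAKE_CXX_COMPILER MATCHES "dpcpp|icpx")
    # Optional lookup: no REQUIRED, so configuration continues if the package is missing.
    find_package(IntelSYCL QUIET)
endif()

# Hypothetical wrapper name, used here for illustration.
function(my_add_sycl_to_target)
    cmake_parse_arguments(ARG "" "TARGET" "SOURCES" ${ARGN})
    if(COMMAND add_sycl_to_target)
        # Provided by the IntelSYCL package when it was found.
        add_sycl_to_target(TARGET ${ARG_TARGET} SOURCES ${ARG_SOURCES})
    else()
        # Fallback: pass -fsycl manually for both compiling and linking.
        target_compile_options(${ARG_TARGET} PRIVATE -fsycl)
        target_link_options(${ARG_TARGET} PRIVATE -fsycl)
    endif()
endfunction()

# Usage with placeholder names:
# my_add_sycl_to_target(TARGET my_kernels SOURCES kernel.cpp)
```

Note that in this sketch the fallback branch applies -fsycl to the whole target rather than to the listed sources only.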

tcojean

LGTM

sonarqubecloud · 2023-10-13T22:15:35Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.

codecov · 2023-10-13T23:29:21Z

Codecov Report

All modified lines are covered by tests ✅

see 4 files with indirect coverage changes

Release 1.7.0 to master

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
- Complete GPU-resident sparse direct solvers feature set and interfaces,
- Improved Cholesky factorization performance,
- A new MC64 reordering,
- Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
- MPI support for the SYCL backend,
- Improved ParILU(T)/ParIC(T) preconditioner convergence,

and more!

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.16+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2019+
  + Apple Clang: 14.0 is tested. Earlier versions might also work.
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
  + HIP module: ROCm 4.5+
  + DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
  + MPI: standard version 3.1+, ideally GPU Aware, for best performance
+ Windows
  + MinGW: GCC 5.5+
  + Microsoft Visual Studio: VS 2019+
  + CUDA module: CUDA 10.1+, Microsoft Visual Studio
  + OpenMP module: MinGW.

### Version support changes

+ CUDA 9.2 is no longer supported and 10.0 is untested [#1382](#1382)
+ Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) [#1368](#1368)

### Interface changes

+ `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness [#1336](#1336) [#1439](#1439)

### New Deprecations

+ The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and its `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. [#1315](#1315)
+ The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. [#1350](#1350)
+ The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. [#1418](#1418)
+ The Permutation class' `permute_mask` functionality. [#1415](#1415)
+ Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). [#1348](#1348)

### Summary of previous deprecations

+ `gko::lend()` is not necessary anymore.
+ The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
+ The class `AmgxPgm` is deprecated in favor of `Pgm`.
+ Default constructors for the CSR `load_balance` and `automatical` strategies
+ The PolymorphicObject's move-semantic `copy_from` variant
+ The templated `SolverBase` class.
+ The class `MachineTopology` is deprecated in favor of `machine_topology`.
+ Logger constructors and create functions with the `executor` parameter.
+ The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
+ Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
+ The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.

### Added features

+ Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners [#1379](#1379)
+ Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors [#1371](#1371)
+ Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. [#1413](#1413)
+ Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. [#1416](#1416) [#1437](#1437)
+ Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems [#1438](#1438).
+ Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. [#1443](#1443).
+ New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibriation [#1120](#1120)
+ New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices [#1415](#1415)
+ LU and Cholesky Factorizations can now be separated into their factors [#1432](#1432)
+ New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern [#1445](#1445)
+ Sorting kernels for SparsityCsr on all backends [#1343](#1343)
+ Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner [#1426](#1426)
+ Add DPCPP kernels for Partition [#1034](#1034), and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality [#1436](#1436)
+ Adds a helper function to create a partition based on either local sizes, or local ranges [#1227](#1227)
+ Add function to compute arithmetic mean of dense and distributed vectors [#1275](#1275)
+ Adds `icpx` compiler supports [#1350](#1350)
+ All backends can be built simultaneously [#1333](#1333)
+ Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo [#1372](#1372)
+ Reordering algorithms in sparse_blas benchmark [#1354](#1354)
+ Benchmarks gained an `-allocator` parameter to specify device allocators [#1385](#1385)
+ Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename [#1387](#1387)
+ Benchmark inputs can now be reordered as a preprocessing step [#1408](#1408)

### Improvements

+ Significantly improve Cholesky factorization performance [#1366](#1366)
+ Improve parallel build performance [#1378](#1378)
+ Allow constrained parallel test execution using CTest resources [#1373](#1373)
+ Use arithmetic type more inside mixed precision ELL [#1414](#1414)
+ Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` [#1336](#1336) [#1439](#1439)
+ Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations [#1434](#1434)

### Fixes

+ Fix an over-allocation for OpenMP reductions [#1369](#1369)
+ Fix DPCPP's common-kernel reduction for empty input sizes [#1362](#1362)
+ Fix several typos in the API and documentation [#1348](#1348)
+ Fix inconsistent `Threads` between generations [#1388](#1388)
+ Fix benchmark median condition [#1398](#1398)
+ Fix HIP 5.6.0 compilation [#1411](#1411)
+ Fix missing destruction of rand_generator from cuda/hip [#1417](#1417)
+ Fix PAPI logger destruction order [#1419](#1419)
+ Fix TAU logger compilation [#1422](#1422)
+ Fix relative criterion to not iterate if the residual is already zero [#1079](#1079)
+ Fix memory_order invocations with C++20 changes [#1402](#1402)
+ Fix `check_diagonal_entries_exist` report correctly when only missing diagonal value in the last rows. [#1440](#1440)
+ Fix checking OpenMPI version in cross-compilation settings [#1446](#1446)
+ Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](#1444)

### Related PR: #1451

yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Jun 21, 2023

yhmtsai self-assigned this Jun 21, 2023

yhmtsai requested a review from a team July 27, 2023 09:18

tcojean reviewed Aug 16, 2023

yhmtsai force-pushed the icpx_compilation branch 7 times, most recently from 0fd9584 to 0da88a1 Compare August 21, 2023 07:41

yhmtsai requested a review from a team August 21, 2023 07:44

yhmtsai force-pushed the icpx_compilation branch from 93a817e to 9b4779a Compare August 21, 2023 20:56

upsj approved these changes Aug 22, 2023

yhmtsai mentioned this pull request Aug 22, 2023

Depercated dpcpp by sycl #1397

Open

4 tasks

thoasm self-requested a review September 18, 2023 12:33

tcojean approved these changes Oct 12, 2023

yhmtsai force-pushed the icpx_compilation branch 4 times, most recently from d59a8f6 to 26d266c Compare October 13, 2023 11:32

yhmtsai removed the 1:ST:ready-for-review This PR is ready for review label Oct 13, 2023

yhmtsai added the 1:ST:ready-to-merge This PR is ready to merge. label Oct 13, 2023

yhmtsai and others added 5 commits October 13, 2023 14:52

add icpx support

164617e

add gko_add_sycl_to_target

c00d20e

rename GINKGO_BUILD_DPCPP to GINKGO_BUILD_SYCL

bde2a8a

Co-authored-by: Terry Cojean <terry.cojean@kit.edu> Co-authored-by: Tobias Ribizel <ribizel@kit.edu>

do not delete deprecated var from CMake, keep doc

c7bfeb4

Co-authored-by: Tobias Ribizel <ribizel@kit.edu>

adapt MKL and oneDPL env

acf7a82

yhmtsai force-pushed the icpx_compilation branch from 26d266c to acf7a82 Compare October 13, 2023 12:52

yhmtsai merged commit 2f3720f into develop Oct 14, 2023

yhmtsai deleted the icpx_compilation branch October 14, 2023 08:55

tcojean mentioned this pull request Nov 6, 2023

Release 1.7.0 to master #1451

Merged

yhmtsai mentioned this pull request Nov 27, 2023

Remove old variable in CMake gko_rename_cache #1471

Merged
