
[SYCL][Matrix] Add joint matrix query for CUDA and HIP backends #12075

Merged (22 commits) on Feb 15, 2024

Conversation

@konradkusiak97 (Contributor, author) commented Dec 5, 2023:

This PR adds the joint matrix query for the CUDA and HIP backends, as described in sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc.

else
return false;
}

Contributor:

I'd use only one instance of ((sM == 32 && sN == 32 && sK == 8) || (sM == 16 && sN == 16 && sK == 16)) and && it with the ORed std::is_same_v checks.

Contributor (author):

I've just tried it this way, but the code now looks quite unreadable due to the extra OR in that case. It would take the shape of the above conditions ORed with the extra case for double:

  if ((((sM == 32 && sN == 32 && sK == 8) ||
        (sM == 16 && sN == 16 && sK == 16)) &&
           (std::is_same_v<Ta, half> && std::is_same_v<Tc, float>) ||
       (std::is_same_v<Ta, int8_t> && std::is_same_v<Tc, int32_t>) ||
       (std::is_same_v<Ta, bfloat16> && std::is_same_v<Tc, float>)) ||
      ((sM == 16 && sN == 16 && sK == 4) &&
       (std::is_same_v<Ta, double> && std::is_same_v<Tc, double>)))

By the way, this is already after applying clang-format. For the sake of readability, I think this should be left as is.
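(For comparison, one hypothetical way to flatten the nesting, not what was merged, would be to name the sub-conditions inside the same function:)

  // Sketch only: inside is_combination_valid_amd_gfx90a<Ta, Tc>(sM, sN, sK).
  const bool common_shape = (sM == 32 && sN == 32 && sK == 8) ||
                            (sM == 16 && sN == 16 && sK == 16);
  const bool common_types =
      (std::is_same_v<Ta, half> && std::is_same_v<Tc, float>) ||
      (std::is_same_v<Ta, int8_t> && std::is_same_v<Tc, int32_t>) ||
      (std::is_same_v<Ta, bfloat16> && std::is_same_v<Tc, float>);
  const bool double_case = std::is_same_v<Ta, double> &&
                           std::is_same_v<Tc, double> && sM == 16 &&
                           sN == 16 && sK == 4;
  return (common_shape && common_types) || double_case;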

((sM == 32 && sN == 32 && sK == 8) ||
(sM == 16 && sN == 16 && sK == 16))) ||
(std::is_same_v<Ta, unsigned short> && std::is_same_v<Tc, float> &&
((sM == 32 && sN == 32 && sK == 8) ||
@mmoadeli (Contributor) commented Dec 5, 2023:

unsigned short is not a supported input type; it seems bfloat16 is missing here. bfloat16 is used in the joint_matrix_hip_gfx90a.cpp test.

Contributor (author):

done

else
return false;
}

Contributor:

You could return the expression directly, without the if / else.
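(A minimal sketch of the suggestion, with the full condition abbreviated as combination_matches, a hypothetical name:)

  // Instead of:
  //   if (combination_matches)
  //     return true;
  //   else
  //     return false;
  // simply write:
  return combination_matches;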

!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
!std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td>)>::type> {

Contributor:

I'd replace std::enable_if<..>::type with std::enable_if_t<..>. I'd also try to avoid the static_assert below by bringing the required logic into the enable_if above.

Contributor (author):

Thanks for pointing that out! I switched to using std::enable_if_t. Is there a reason to avoid the static_assert here? I think it gives a more informative error message, giving the user more context as to why such a combination could be wrong.
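For reference, a sketch of the alias-template form; matrix_params and its parameter list are simplified stand-ins here, not the exact declaration from the PR:

  #include <type_traits>

  // Primary template; left undefined for invalid combinations.
  template <typename Ta, typename Tb, typename Tc, typename Td,
            typename Enable = void>
  struct matrix_params;

  // Partial specialization enabled only for valid type combinations.
  // std::enable_if_t<B> is shorthand for typename std::enable_if<B>::type.
  template <typename Ta, typename Tb, typename Tc, typename Td>
  struct matrix_params<
      Ta, Tb, Tc, Td,
      std::enable_if_t<!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
                       !std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
                       std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td>>> {
    static constexpr bool valid = true;
  };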

"Invalid types for AMD gfx90a, supported types are half, float, "
"int8_t, int32_t, double and bf16 (Note that unsigned short"
"should be used in the DPC++ code to implement bf16) ");

@mmoadeli (Contributor) commented Dec 5, 2023:

bfloat16 is used in DPC++ code, for instance in the joint_matrix_hip_gfx90a.cpp test.

Contributor (author):

Changed to bfloat16


template <typename Ta, typename Tc>
constexpr bool is_combination_valid_amd_gfx90a(size_t sM, size_t sN,
                                               size_t sK) {
@mmoadeli (Contributor) commented Dec 5, 2023:

Not sure why sM, sN and sK are used to represent the dimensions. I appreciate that you followed them for consistency, though.

!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
!std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td> && sM != 0 &&
sN != 0 && sK != 0)>::type> {
Contributor:

Another instance where std::enable_if_t could be used, and where the required logic could be folded into the enable_if so that the static_assert below becomes unnecessary.

@@ -718,6 +722,8 @@ struct get_device_info_impl<
get(const DeviceImplPtr &Dev) {
using namespace ext::oneapi::experimental::matrix;
using namespace ext::oneapi::experimental;
using oneapi_exp_arch = sycl::ext::oneapi::experimental::architecture;
Contributor:

This seems to be unused.

Contributor (author):

Ah, this is actually used at line 814. The macro NVIDIA_AMD_ARCHES, defined a few lines above, needs it:

auto GetArchNum = [](const architecture &arch) {
        NVIDIA_AMD_ARCHES(CMP_NVIDIA_AMD_ARCH);
...

Contributor (author):

I removed that line after incorporating the newest changes.

// RUN: %{run} %t.out
//
// This tests the joint matrix runtime query for the cuda backend.
// This test must be compiled with -Xsycl-target-backend --cuda-gpu-arch=sm_xx,
Contributor:

I don't think this statement is actually true: if you compile with the default sm_50, the test will pass even if you run it on e.g. sm_80.

Contributor (author):

I removed the statement

Contributor:

Please add "nvidia" to the name of the test.

Contributor:

I see that other cuda tests use the "_tensorcores" suffix as well. I think we should keep the name as is; no need to add "nvidia".

Comment on lines 816 to 819
throw sycl::exception(
make_error_code(errc::runtime),
"The current device architecture is not supported by "
"sycl_ext_oneapi_device_architecture.");
Contributor:

Please avoid this duplication. The matrix_combinations query, which is part of one extension, should not re-implement anything from a separate extension; the extension should re-use the other extension.

Contributor (author):

I removed the error message completely and left only throw;, since this part of the lambda will never be executed: the matching arch number will always be found for a given DeviceArch; otherwise, the error would have been thrown earlier while querying for the DeviceArch. Let me know if that sounds plausible.


template <typename Ta, typename Tc, typename Td>
constexpr bool is_combination_valid_cuda_sm70(size_t sM, size_t sN, size_t sK) {
return (((std::is_same_v<Ta, half> && std::is_same_v<Tc, float> &&
Contributor:

nit: I think it would be better to just call are_types_valid_cuda_sm70 here instead of repeating the logic.

Contributor (author):

Yep, that could definitely make use of are_types_valid. Changed it now.
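For reference, a sketch of the de-duplicated form; the shape list is illustrative, and are_types_valid_cuda_sm70 is assumed to hold the type checks quoted above:

  template <typename Ta, typename Tc, typename Td>
  constexpr bool is_combination_valid_cuda_sm70(size_t sM, size_t sN,
                                                size_t sK) {
    // Reuse the type predicate instead of repeating the is_same_v chains,
    // then check the supported shapes (illustrative values; the merged
    // list may differ).
    return are_types_valid_cuda_sm70<Ta, Tc, Td>() &&
           ((sM == 16 && sN == 16 && sK == 16) ||
            (sM == 8 && sN == 32 && sK == 16) ||
            (sM == 32 && sN == 8 && sK == 16));
  }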

@@ -0,0 +1,118 @@
// REQUIRES: cuda
// RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
@JackAKirk (Contributor) commented Jan 5, 2024:

nit: this arch flag isn't necessary for this test; you can use the default, which means the test will work on all supported devices.

Suggested change:
- // RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
+ // RUN: %{build} -o %t.out

(See also the related suggested change below.)

Contributor (author):

done

std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
          std::back_inserter(expected_combinations));
}

Contributor:

Suggested change:
else {
  return 0;
}

@JackAKirk (Contributor) left a comment:

CUDA part LGTM.

@JackAKirk (Contributor):

> No description provided.

Could you write a short description? It acts as the commit message.

@konradkusiak97 (Contributor, author) commented Jan 15, 2024:

Pinging @intel/llvm-reviewers-runtime, is this good to go?

@ldrumm (Contributor) left a comment:

Parens can help to clarify grouping, but at this level they actually make things harder to read. Apart from that, things look sane.

Review threads (outdated, resolved) on:
sycl/include/sycl/ext/oneapi/matrix/static-query-use.hpp (several threads)
sycl/source/detail/device_info.hpp
sycl/test-e2e/Matrix/runtime_query_hip_gfx90a.cpp
sycl/test-e2e/Matrix/runtime_query_tensorcores.cpp
@konradkusiak97 (Contributor, author):

> Parens can help to clarify grouping, but at this level they actually make things harder to read. Apart from that, things look sane.

Thanks for the review; the changes have been applied.

@ldrumm (Contributor) commented Feb 6, 2024:

@intel/llvm-reviewers-runtime, can we get a review for this, please?

if (Item.second == arch)
return Item.first;
}
throw;
Contributor:

What are we throwing here? It's not immediately obvious in this wall of similar patterns.

Contributor (author):

I changed it to throw a sycl::exception with an appropriate error message.
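A sketch of what that replacement could look like (the exact wording in the merged code may differ):

  // Reached only if no architecture entry matches the queried device.
  throw sycl::exception(make_error_code(errc::runtime),
                        "Unknown device architecture in the "
                        "matrix_combinations query.");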

Comment on lines 890 to 893
std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
          std::back_inserter(sm_80_combinations));
std::move(sm_72_combinations.begin(), sm_72_combinations.end(),
          std::back_inserter(sm_80_combinations));
Contributor:

If we were using C++20, I would have requested relying on constexpr creation of the vectors instead.

@konradkusiak97 (Contributor, author) commented Feb 7, 2024:

done

Contributor:

That's not what I meant, sorry for the confusion. What I was thinking about is that maybe we could avoid the std::move at runtime altogether in C++20/C++23, and even then I wasn't sure.

Do you know how std::back_inserter on a constexpr vector would behave? I think I'd prefer the constexpr to be dropped for now, as it might be unclear to the average reader what happens here.

Contributor (author):

Yes, I've just realized this is not the way to go. We don't have a usable constexpr std::vector here, but I removed std::move and used vec.insert() instead.
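A sketch of the insert()-based merge, using the vector names from the snippet above:

  // Append the sm_70 and sm_72 combinations to the sm_80 list by copying,
  // replacing the earlier std::move / std::back_inserter pattern.
  sm_80_combinations.insert(sm_80_combinations.end(),
                            sm_70_combinations.begin(),
                            sm_70_combinations.end());
  sm_80_combinations.insert(sm_80_combinations.end(),
                            sm_72_combinations.begin(),
                            sm_72_combinations.end());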

Contributor (author):

I applied all the changes and all checks are passing. Does the last solution with vec.insert() sound okay to you, @aelovikov-intel? If so, could I get an approval on this, please?

@@ -0,0 +1,33 @@
// REQUIRES: cuda
Contributor:

Can you update CODEOWNERS for the new cuda/matrix and hip/matrix directories?

Contributor (author):

I added @intel/llvm-reviewers-cuda as the owner of those directories.

sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
Contributor:

I think we can drop this; I wasn't aware that its parent directory is already covered here.

sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
sycl/test/check_device_code/hip/matrix @intel/llvm-reviewers-cuda
Contributor:

We should probably limit this to the parent hip directory.
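A hypothetical sketch of the trimmed CODEOWNERS entries, with the paths taken from the diff above (the parent directories already cover their matrix subdirectories):

  sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
  sycl/test/check_device_code/hip/ @intel/llvm-reviewers-cuda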


@konradkusiak97 (Contributor, author):

Friendly ping @intel/llvm-gatekeepers, this is ready to be merged now.

@martygrant merged commit 00eebe1 into intel:sycl on Feb 15, 2024 (11 checks passed).