
[SYCL][Matrix] Add joint matrix query for CUDA and HIP backends #12075

Merged (22 commits) on Feb 15, 2024

Conversation

@konradkusiak97 (Contributor, author) commented Dec 5, 2023:

This PR adds the joint matrix query for the CUDA and HIP backends, as described in sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc.

else
return false;
}

Contributor:

I'd use only one instance of ((sM == 32 && sN == 32 && sK == 8) || (sM == 16 && sN == 16 && sK == 16)) and && it with the ORed std::is_same_v checks.

Contributor (author):

I've just tried it this way, but the code now looks quite unreadable due to the extra OR in that case. It would take the shape of the above conditions ORed with the extra case for double:

  if ((((sM == 32 && sN == 32 && sK == 8) ||
        (sM == 16 && sN == 16 && sK == 16)) &&
           (std::is_same_v<Ta, half> && std::is_same_v<Tc, float>) ||
       (std::is_same_v<Ta, int8_t> && std::is_same_v<Tc, int32_t>) ||
       (std::is_same_v<Ta, bfloat16> && std::is_same_v<Tc, float>)) ||
      ((sM == 16 && sN == 16 && sK == 4) &&
       (std::is_same_v<Ta, double> && std::is_same_v<Tc, double>)))

By the way, this is already after applying clang-format. For the sake of readability, I think this should be left as is.
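(For comparison, one hypothetical way to flatten the nesting, not what was merged, would be to name the sub-conditions inside the same function:)

  // Sketch only: inside is_combination_valid_amd_gfx90a<Ta, Tc>(sM, sN, sK).
  const bool common_shape = (sM == 32 && sN == 32 && sK == 8) ||
                            (sM == 16 && sN == 16 && sK == 16);
  const bool common_types =
      (std::is_same_v<Ta, half> && std::is_same_v<Tc, float>) ||
      (std::is_same_v<Ta, int8_t> && std::is_same_v<Tc, int32_t>) ||
      (std::is_same_v<Ta, bfloat16> && std::is_same_v<Tc, float>);
  const bool double_case = std::is_same_v<Ta, double> &&
                           std::is_same_v<Tc, double> && sM == 16 &&
                           sN == 16 && sK == 4;
  return (common_shape && common_types) || double_case;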

((sM == 32 && sN == 32 && sK == 8) ||
(sM == 16 && sN == 16 && sK == 16))) ||
(std::is_same_v<Ta, unsigned short> && std::is_same_v<Tc, float> &&
((sM == 32 && sN == 32 && sK == 8) ||
@mmoadeli (Contributor) commented Dec 5, 2023:

unsigned short is not a supported input type; it seems bfloat16 is missing here. bfloat16 is used in the joint_matrix_hip_gfx90a.cpp test.

Contributor (author):

done

else
return false;
}

Contributor:

You could return the expression directly, without the if / else.
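(A minimal sketch of the suggestion, with the full condition abbreviated as combination_matches, a hypothetical name:)

  // Instead of:
  //   if (combination_matches)
  //     return true;
  //   else
  //     return false;
  // simply write:
  return combination_matches;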

!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
!std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td>)>::type> {

Contributor:

I'd replace std::enable_if<..>::type with std::enable_if_t<..>. I'd also try to avoid the static_assert below by bringing the required logic into the enable_if above.

Contributor (author):

Thanks for pointing that out! I switched to using std::enable_if_t. Is there a reason to avoid the static_assert here? I think it gives a more informative error message, giving the user more context as to why such a combination could be wrong.
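For reference, a sketch of the alias-template form; matrix_params and its parameter list are simplified stand-ins here, not the exact declaration from the PR:

  #include <type_traits>

  // Primary template; left undefined for invalid combinations.
  template <typename Ta, typename Tb, typename Tc, typename Td,
            typename Enable = void>
  struct matrix_params;

  // Partial specialization enabled only for valid type combinations.
  // std::enable_if_t<B> is shorthand for typename std::enable_if<B>::type.
  template <typename Ta, typename Tb, typename Tc, typename Td>
  struct matrix_params<
      Ta, Tb, Tc, Td,
      std::enable_if_t<!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
                       !std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
                       std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td>>> {
    static constexpr bool valid = true;
  };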

"Invalid types for AMD gfx90a, supported types are half, float, "
"int8_t, int32_t, double and bf16 (Note that unsigned short"
"should be used in the DPC++ code to implement bf16) ");

@mmoadeli (Contributor) commented Dec 5, 2023:

bfloat16 is used in DPC++ code, for instance in the joint_matrix_hip_gfx90a.cpp test.

Contributor (author):

Changed to bfloat16


template <typename Ta, typename Tc>
constexpr bool is_combination_valid_amd_gfx90a(size_t sM, size_t sN,
                                               size_t sK) {
@mmoadeli (Contributor) commented Dec 5, 2023:

Not sure why sM, sN and sK are used to represent the dimensions. I appreciate that you followed them for consistency, though.

!std::is_same_v<Ta, void> && !std::is_same_v<Tb, void> &&
!std::is_same_v<Tc, void> && !std::is_same_v<Td, void> &&
std::is_same_v<Ta, Tb> && std::is_same_v<Tc, Td> && sM != 0 &&
sN != 0 && sK != 0)>::type> {
Contributor:

Another instance where std::enable_if_t could be used, and where the required logic could be folded into the enable_if so that the static_assert below becomes unnecessary.

@@ -718,6 +722,8 @@ struct get_device_info_impl<
get(const DeviceImplPtr &Dev) {
using namespace ext::oneapi::experimental::matrix;
using namespace ext::oneapi::experimental;
using oneapi_exp_arch = sycl::ext::oneapi::experimental::architecture;
Contributor:

This seems to be unused.

Contributor (author):

Ah, this is actually used at line 814. The macro NVIDIA_AMD_ARCHES, defined a few lines above, needs it:

auto GetArchNum = [](const architecture &arch) {
        NVIDIA_AMD_ARCHES(CMP_NVIDIA_AMD_ARCH);
...

Contributor (author):

I removed that line after incorporating the newest changes.

// RUN: %{run} %t.out
//
// This tests the joint matrix runtime query for the cuda backend.
// This test must be compiled with -Xsycl-target-backend --cuda-gpu-arch=sm_xx,
Contributor:

I don't think this statement is actually true: if you compile with the default sm_50, the test will pass even if you run it on e.g. sm_80.

Contributor (author):

I removed the statement

Contributor:

Please add "nvidia" to the name of the test.

Contributor:

I see that other cuda tests use the "_tensorcores" suffix as well. I think we should keep the name as is; no need to add "nvidia".

Comment on lines 816 to 819
throw sycl::exception(
make_error_code(errc::runtime),
"The current device architecture is not supported by "
"sycl_ext_oneapi_device_architecture.");
Contributor:

Please avoid this duplication. The matrix_combinations query, which is part of one extension, should not re-implement anything from a separate extension; the extension should re-use the other extension.

Contributor (author):

I removed the error message completely and left only throw;, since this part of the lambda will never be executed: the matching arch number will always be found for a given DeviceArch; otherwise, the error would have been thrown earlier while querying for the DeviceArch. Let me know if that sounds plausible.


template <typename Ta, typename Tc, typename Td>
constexpr bool is_combination_valid_cuda_sm70(size_t sM, size_t sN, size_t sK) {
return (((std::is_same_v<Ta, half> && std::is_same_v<Tc, float> &&
Contributor:

nit: I think it would be better to just call are_types_valid_cuda_sm70 here instead of repeating the logic.

Contributor (author):

Yep, that could definitely make use of are_types_valid. Changed it now.
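For reference, a sketch of the de-duplicated form; the shape list is illustrative, and are_types_valid_cuda_sm70 is assumed to hold the type checks quoted above:

  template <typename Ta, typename Tc, typename Td>
  constexpr bool is_combination_valid_cuda_sm70(size_t sM, size_t sN,
                                                size_t sK) {
    // Reuse the type predicate instead of repeating the is_same_v chains,
    // then check the supported shapes (illustrative values; the merged
    // list may differ).
    return are_types_valid_cuda_sm70<Ta, Tc, Td>() &&
           ((sM == 16 && sN == 16 && sK == 16) ||
            (sM == 8 && sN == 32 && sK == 16) ||
            (sM == 32 && sN == 8 && sK == 16));
  }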

@@ -0,0 +1,118 @@
// REQUIRES: cuda
// RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
@JackAKirk (Contributor) commented Jan 5, 2024:

nit: this arch flag isn't necessary for this test; you can use the default, which means the test will work on all supported devices.

Suggested change:
- // RUN: %{build} -Xsycl-target-backend --cuda-gpu-arch=sm_70 -o %t.out
+ // RUN: %{build} -o %t.out

(See also the related suggested change below.)

Contributor (author):

done

std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
          std::back_inserter(expected_combinations));
}

Contributor:

Suggested change:
else {
  return 0;
}

@JackAKirk (Contributor) left a comment:

CUDA part LGTM.

@JackAKirk (Contributor):

> No description provided.

Could you write a short description? It acts as the commit message.

@konradkusiak97 (Contributor, author) commented Jan 15, 2024:

Pinging @intel/llvm-reviewers-runtime, is this good to go?

@ldrumm (Contributor) left a comment:

Parens can help to clarify grouping, but at this level they actually make things harder to read. Apart from that, things look sane.

Review threads (outdated, resolved) on:
sycl/include/sycl/ext/oneapi/matrix/static-query-use.hpp (several threads)
sycl/source/detail/device_info.hpp
sycl/test-e2e/Matrix/runtime_query_hip_gfx90a.cpp
sycl/test-e2e/Matrix/runtime_query_tensorcores.cpp
@konradkusiak97 (Contributor, author):

> Parens can help to clarify grouping, but at this level they actually make things harder to read. Apart from that, things look sane.

Thanks for the review; the changes have been applied.

@ldrumm (Contributor) commented Feb 6, 2024:

@intel/llvm-reviewers-runtime, can we get a review for this, please?

if (Item.second == arch)
return Item.first;
}
throw;
Contributor:

What are we throwing here? It's not immediately obvious in this wall of similar patterns.

Contributor (author):

I changed it to throw a sycl::exception with an appropriate error message.
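A sketch of what that replacement could look like (the exact wording in the merged code may differ):

  // Reached only if no architecture entry matches the queried device.
  throw sycl::exception(make_error_code(errc::runtime),
                        "Unknown device architecture in the "
                        "matrix_combinations query.");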

Comment on lines 890 to 893
std::move(sm_70_combinations.begin(), sm_70_combinations.end(),
          std::back_inserter(sm_80_combinations));
std::move(sm_72_combinations.begin(), sm_72_combinations.end(),
          std::back_inserter(sm_80_combinations));
Contributor:

If we were using C++20, I would have requested relying on constexpr creation of the vectors instead.

@konradkusiak97 (Contributor, author) commented Feb 7, 2024:

done

Contributor:

That's not what I meant, sorry for the confusion. What I was thinking about is that maybe we could avoid the std::move at runtime altogether in C++20/C++23, and even then I wasn't sure.

Do you know how std::back_inserter on a constexpr vector would behave? I think I'd prefer the constexpr to be dropped for now, as it might be unclear to the average reader what happens here.

Contributor (author):

Yes, I've just realized this is not the way to go. We don't have a usable constexpr std::vector here, but I removed std::move and used vec.insert() instead.
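A sketch of the insert()-based merge, using the vector names from the snippet above:

  // Append the sm_70 and sm_72 combinations to the sm_80 list by copying,
  // replacing the earlier std::move / std::back_inserter pattern.
  sm_80_combinations.insert(sm_80_combinations.end(),
                            sm_70_combinations.begin(),
                            sm_70_combinations.end());
  sm_80_combinations.insert(sm_80_combinations.end(),
                            sm_72_combinations.begin(),
                            sm_72_combinations.end());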

Contributor (author):

I applied all the changes and all checks are passing. Does the last solution with vec.insert() sound okay to you, @aelovikov-intel? If so, could I get an approval on this, please?

@@ -0,0 +1,33 @@
// REQUIRES: cuda
Contributor:

Can you update CODEOWNERS for the new cuda/matrix and hip/matrix directories?

Contributor (author):

I added @intel/llvm-reviewers-cuda as the owner of those directories.

sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
Contributor:

I think we can drop this; I wasn't aware that its parent directory is already covered here.

sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
sycl/test/check_device_code/cuda/matrix @intel/llvm-reviewers-cuda
sycl/test/check_device_code/hip/matrix @intel/llvm-reviewers-cuda
Contributor:

We should probably limit this to the parent hip directory.
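A hypothetical sketch of the trimmed CODEOWNERS entries, with the paths taken from the diff above (the parent directories already cover their matrix subdirectories):

  sycl/test/check_device_code/cuda/ @intel/llvm-reviewers-cuda
  sycl/test/check_device_code/hip/ @intel/llvm-reviewers-cuda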


@konradkusiak97 (Contributor, author):

Friendly ping @intel/llvm-gatekeepers, this is ready to be merged now.

@martygrant merged commit 00eebe1 into intel:sycl on Feb 15, 2024 (11 checks passed).