-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CUDA early exit on empty grid #1729
Changes from 4 commits
aafed48
923fd89
140f6d2
db5dd93
670777e
f9750b7
30ef04b
5c9e6e5
be5679a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
*/ | ||
#pragma once | ||
|
||
#include <cassert> | ||
#include <memory> | ||
#include <sstream> | ||
#include <stdexcept> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ | |
#include <gtest/gtest.h> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer having this test as a regression test, running for all available backends. You could even just put a new variant of the copy stencil test into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. …or add a separate one as you also check the scan. I don’t care much, but enabling it for all backends makes sense for sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I made a separate regression test (and removed the unit test). |
||
|
||
#include <gridtools/fn/column_stage.hpp> | ||
#include <gridtools/fn/unstructured.hpp> | ||
#include <gridtools/sid/composite.hpp> | ||
#include <gridtools/sid/synthetic.hpp> | ||
|
||
|
@@ -207,5 +208,64 @@ namespace gridtools::fn::backend { | |
bool success = on_device::exec(global_tmp_check_fun(), ptr_holder, strides); | ||
EXPECT_TRUE(success); | ||
} | ||
|
||
struct empty_stencil { | ||
GT_FUNCTION constexpr auto operator()() const { | ||
return []() { return 0.0f; }; | ||
} | ||
}; | ||
|
||
TEST(backend_gpu, empty_domain_stencil) { | ||
constexpr size_t nvertices = 0; | ||
constexpr size_t nlevels = 0; | ||
|
||
using block_sizes_t = meta::list<meta::list<unstructured::dim::horizontal, int_t<32>>, | ||
meta::list<unstructured::dim::vertical, int_t<1>>>; | ||
|
||
auto out = cuda_util::cuda_malloc<int>(nvertices * nlevels); | ||
auto as_synthetic = [](int *x) { | ||
return sid::synthetic() | ||
.set<property::origin>(sid::host_device::simple_ptr_holder(x)) | ||
.set<property::strides>( | ||
hymap::keys<unstructured::dim::horizontal, unstructured::dim::vertical>::make_values(5_c, 1_c)); | ||
}; | ||
auto out_s = as_synthetic(out.get()); | ||
|
||
auto domain = unstructured_domain(tuple{nvertices, nlevels}, tuple{0, 0}); | ||
auto backend = make_backend(backend::gpu<block_sizes_t>(), domain); | ||
backend.stencil_executor()().arg(out_s).assign(0_c, empty_stencil{}).execute(); | ||
} | ||
|
||
struct empty_column : fwd { | ||
static GT_FUNCTION constexpr auto prologue() { | ||
return tuple(scan_pass([](auto acc) { return acc; }, host_device::identity())); | ||
} | ||
|
||
static GT_FUNCTION constexpr auto body() { | ||
return scan_pass([](auto acc) { return acc; }, host_device::identity()); | ||
} | ||
}; | ||
|
||
TEST(backend_gpu, empty_domain_column) { | ||
constexpr size_t nvertices = 0; | ||
constexpr size_t nlevels = 0; | ||
|
||
using block_sizes_t = meta::list<meta::list<unstructured::dim::horizontal, int_t<32>>, | ||
meta::list<unstructured::dim::vertical, int_t<1>>>; | ||
|
||
auto out = cuda_util::cuda_malloc<int>(nvertices * nlevels); | ||
auto as_synthetic = [](int *x) { | ||
return sid::synthetic() | ||
.set<property::origin>(sid::host_device::simple_ptr_holder(x)) | ||
.set<property::strides>( | ||
hymap::keys<unstructured::dim::horizontal, unstructured::dim::vertical>::make_values(5_c, 1_c)); | ||
}; | ||
auto out_s = as_synthetic(out.get()); | ||
|
||
auto domain = unstructured_domain(tuple{nvertices, nlevels}, tuple{0, 0}); | ||
auto backend = make_backend(backend::gpu<block_sizes_t>(), domain); | ||
backend.vertical_executor()().arg(out_s).assign(0_c, empty_column{}, 0.0f).execute(); | ||
} | ||
|
||
} // namespace | ||
} // namespace gridtools::fn::backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like the following (as the first thing in the function) would be more readable. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe even better to move it outside of gpu.hpp to the common part where we are dispatching to the backends (but didn't check where that is and if it really makes sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created an
is_domain_empty
function. I usually use anif
block instead of an early return when the code is simple, but I'm happy with either.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer Hannes early return in this case as I my eyes don’t have to track any braces opening and closing. And my proposed changes would be even a bit more minimalistic, just before the kernel launch:
(of course this does not include the optimization of skipping unnecessary computation of ptr_holder, strides, etc., but I don’t think we need any optimization of such trivial computations for an empty domain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, uhh, it was with the
blocks
before Hannes suggested to make it a function :) Now that I wrote the empty domain function, I'd rather keep it, because that satisfies the requirements independent of how you calculate the blocks.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also put an early return