Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA early exit on empty grid #1729

Merged
merged 9 commits into from
Aug 17, 2022
Merged
17 changes: 17 additions & 0 deletions include/gridtools/fn/backend/gpu.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -139,16 +139,27 @@ namespace gridtools::fn::backend {
}
};

template <class Sizes>
bool is_domain_empty(const Sizes &sizes) {
return tuple_util::host::apply([](auto... sizes) { return ((sizes == 0) || ...); }, sizes);
}

template <class BlockSizes, class Sizes, class StencilStage, class MakeIterator, class Composite>
void apply_stencil_stage(gpu<BlockSizes> const &g,
Sizes const &sizes,
StencilStage,
MakeIterator make_iterator,
Composite &&composite) {

if (is_domain_empty(sizes)) {
return;
}

auto ptr_holder = sid::get_origin(std::forward<Composite>(composite));
auto strides = sid::get_strides(std::forward<Composite>(composite));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like the following (as the first thing in the function) would be more readable. What do you think?

Suggested change
if empty_domain(sizes)
return;

Copy link
Contributor

@havogt havogt Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe even better to move it outside of gpu.hpp to the common part where we are dispatching to the backends (but didn't check where that is and if it really makes sense).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an is_domain_empty function. I usually use an if block instead of an early return when the code is simple, but I'm happy with either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer Hannes early return in this case as I my eyes don’t have to track any braces opening and closing. And my proposed changes would be even a bit more minimalistic, just before the kernel launch:

if (blocks.x ==0 || blocks.y == 0 || blocks.z == 0)
    return;

(of course this does not include the optimization of skipping unnecessary computation of ptr_holder, strides, etc., but I don’t think we need any optimization of such trivial computations for an empty domain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, uhh, it was with the blocks before Hannes suggested to make it a function :) Now that I wrote the empty domain function, I'd rather keep it, because that satisfies the requirements independent of how you calculate the blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also put an early return

auto [blocks, threads] = blocks_and_threads<BlockSizes>(sizes);
assert(threads.x > 0 && threads.y > 0 && threads.z > 0);
cuda_util::launch(blocks,
threads,
0,
Expand Down Expand Up @@ -190,12 +201,18 @@ namespace gridtools::fn::backend {
Composite &&composite,
Vertical,
Seed seed) {

if (is_domain_empty(sizes)) {
return;
}

auto ptr_holder = sid::get_origin(std::forward<Composite>(composite));
auto strides = sid::get_strides(std::forward<Composite>(composite));
auto h_sizes = hymap::canonicalize_and_remove_key<Vertical>(sizes);
int v_size = at_key<Vertical>(sizes);

auto [blocks, threads] = blocks_and_threads<BlockSizes>(h_sizes);
assert(threads.x > 0 && threads.y > 0 && threads.z > 0);
cuda_util::launch(blocks,
threads,
0,
Expand Down
1 change: 1 addition & 0 deletions tests/regression/fn/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,5 @@ gridtools_add_fn_regression_test(fn_copy SOURCES fn_copy.cpp PERFTEST)
gridtools_add_fn_regression_test(fn_unstructured_nabla SOURCES fn_unstructured_nabla.cpp PERFTEST)
gridtools_add_fn_regression_test(fn_tridiagonal_solve SOURCES fn_tridiagonal_solve.cpp PERFTEST)
gridtools_add_fn_regression_test(fn_cartesian_vertical_advection SOURCES fn_cartesian_vertical_advection.cpp PERFTEST)
gridtools_add_fn_regression_test(fn_empty_domain SOURCES fn_empty_domain.cpp)
gridtools_add_fn_regression_test(fn_vertical_indirection SOURCES fn_vertical_indirection.cpp)
55 changes: 55 additions & 0 deletions tests/regression/fn/fn_empty_domain.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
/*
* GridTools
*
* Copyright (c) 2014-2021, ETH Zurich
* All rights reserved.
*
* Please, refer to the LICENSE file in the root directory.
* SPDX-License-Identifier: BSD-3-Clause
*/

#include <gtest/gtest.h>

#include <gridtools/fn/unstructured.hpp>

#include <fn_select.hpp>
#include <test_environment.hpp>

namespace {
using namespace gridtools;
using namespace fn;
using namespace literals;

struct empty_stencil {
GT_FUNCTION constexpr auto operator()() const {
return []() { return 0.0f; };
}
};

GT_REGRESSION_TEST(empty_domain_stencil, test_environment<>, fn_backend_t) {
auto out = TypeParam::make_storage([](...) { return 0; });

auto domain = unstructured_domain(tuple{0, 0}, tuple{0, 0});
auto backend = make_backend(fn_backend_t{}, domain);
backend.stencil_executor()().arg(out).assign(0_c, empty_stencil{}).execute();
}

struct empty_column : fwd {
static GT_FUNCTION constexpr auto prologue() {
return tuple(scan_pass([](auto acc) { return acc; }, host_device::identity()));
}

static GT_FUNCTION constexpr auto body() {
return scan_pass([](auto acc) { return acc; }, host_device::identity());
}
};

GT_REGRESSION_TEST(empty_domain_column, vertical_test_environment<>, fn_backend_t) {
auto out = TypeParam::make_storage([](...) { return 0; });

auto domain = unstructured_domain(tuple{0, 0}, tuple{0, 0});
auto backend = make_backend(fn_backend_t{}, domain);
backend.vertical_executor()().arg(out).assign(0_c, empty_column{}, 0.0f).execute();
}

} // namespace