Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA early exit on empty grid #1729

Merged
merged 9 commits into from
Aug 17, 2022
Merged

CUDA early exit on empty grid #1729

merged 9 commits into from
Aug 17, 2022

Conversation

petiaccja
Copy link
Contributor

@petiaccja petiaccja commented Aug 12, 2022

Launching a kernel with a zero gridSize is not allowed by CUDA. This PR adds a check to exit early and not even attempt to launch the kernel if any of the domain sizes are zero.

@petiaccja
Copy link
Contributor Author

launch jenkins

@gridtoolsjenkins
Copy link
Collaborator

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

@petiaccja
Copy link
Contributor Author

launch jenkins

@@ -149,19 +149,22 @@ namespace gridtools::fn::backend {
auto strides = sid::get_strides(std::forward<Composite>(composite));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like the following (as the first thing in the function) would be more readable. What do you think?

Suggested change
if empty_domain(sizes)
return;

Copy link
Contributor

@havogt havogt Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe even better to move it outside of gpu.hpp to the common part where we are dispatching to the backends (but didn't check where that is and if it really makes sense).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created an is_domain_empty function. I usually use an if block instead of an early return when the code is simple, but I'm happy with either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer Hannes early return in this case as I my eyes don’t have to track any braces opening and closing. And my proposed changes would be even a bit more minimalistic, just before the kernel launch:

if (blocks.x ==0 || blocks.y == 0 || blocks.z == 0)
    return;

(of course this does not include the optimization of skipping unnecessary computation of ptr_holder, strides, etc., but I don’t think we need any optimization of such trivial computations for an empty domain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, uhh, it was with the blocks before Hannes suggested to make it a function :) Now that I wrote the empty domain function, I'd rather keep it, because that satisfies the requirements independent of how you calculate the blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also put an early return

@petiaccja
Copy link
Contributor Author

launch jenkins

@petiaccja
Copy link
Contributor Author

jenkins launch

@petiaccja
Copy link
Contributor Author

launch jenkins

@@ -12,6 +12,7 @@
#include <gtest/gtest.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer having this test as a regression test, running for all available backends. You could even just put a new variant of the copy stencil test into tests/regression/fn/fn_copy.cpp I guess. There are already versions with and without domain offsets, so the one with zero domain size would actually fit well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…or add a separate one as you also check the scan. I don’t care much, but enabling it for all backends makes sense for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a separate regression test (and removed the unit test).

@petiaccja
Copy link
Contributor Author

launch jenkins

@havogt
Copy link
Contributor

havogt commented Aug 16, 2022

good to test

@havogt
Copy link
Contributor

havogt commented Aug 16, 2022

launch perftests

@havogt
Copy link
Contributor

havogt commented Aug 16, 2022

launch perftest

@petiaccja
Copy link
Contributor Author

launch perftest

@petiaccja
Copy link
Contributor Author

launch jenkins

@petiaccja
Copy link
Contributor Author

launch jenkins

@petiaccja
Copy link
Contributor Author

launch perftest

@petiaccja
Copy link
Contributor Author

launch perftest

@petiaccja
Copy link
Contributor Author

launch jenkins

@petiaccja petiaccja merged commit e0202af into GridTools:master Aug 17, 2022
havogt pushed a commit to havogt/gridtools that referenced this pull request Dec 12, 2022
Launching a kernel with a zero gridSize is not allowed by CUDA. This PR adds a check to exit early and not even attempt to launch the kernel if any of the domain sizes are zero.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants