Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle overflows #1112

Closed
upsj opened this issue Sep 7, 2022 · 4 comments · Fixed by #1303
Closed

Gracefully handle overflows #1112

upsj opened this issue Sep 7, 2022 · 4 comments · Fixed by #1303
Labels
is:enhancement An improvement of an existing feature. is:good-first-issue Good for newcomers.

Comments

@upsj
Copy link
Member

upsj commented Sep 7, 2022

There are many places where integer computations might overflow in Ginkgo. I think the most important one is the prefix_sum kernel. So I would like to suggest that we add an overflow guard to it, by slightly modifying the addition operation to return -1 on overflow or if one of the operands is also -1. Then we can check the resulting value after the kernel has finished and throw an appropriate exception.
This gives a nicer error message than the cuda: failed to allocate memory block of 18446744066997193496B we usually get.

@upsj upsj added is:enhancement An improvement of an existing feature. is:good-first-issue Good for newcomers. labels Sep 7, 2022
@upsj upsj added this to the Ginkgo 1.5.0 milestone Sep 7, 2022
@yhmtsai
Copy link
Member

yhmtsai commented Sep 8, 2022

it will give wrong result when input is indeed -1?

@upsj
Copy link
Member Author

upsj commented Sep 8, 2022

Can you think of a place where we use prefix_sum and negative inputs make sense? None come to mind for me. If we need it, we can add a kernel without the overflow check later.

@yhmtsai
Copy link
Member

yhmtsai commented Sep 9, 2022

From prefix sum name, I would expect it allows the negative input if the type is signed type.
I would say adding the new postive_prefix_sum for the purpose, not changing the prefix_sum and adding another for the original meaning.
Is the overflow more than the GPU memory or the integer range?

@upsj
Copy link
Member Author

upsj commented Sep 9, 2022

Good point - either add a parameter bool catch_overflows or rename it. Though I don't agree that the original should remain, unless we really need it. This is related to arithmetic overflows, not out-of-memory issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:enhancement An improvement of an existing feature. is:good-first-issue Good for newcomers.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants