-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Batch jacobi device kernels #1600
Conversation
96c2138
to
5d26c0e
Compare
810bc89
to
c5c918e
Compare
a2e3afa
to
4a6e2ef
Compare
32de29b
to
fcff54f
Compare
c5c918e
to
f966541
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Mostly minor comments regarding consistency left.
One thing I've noticed is that some of the functions in the device preconditioner classes explicitly synchronize at the end, while others don't. Maybe it would be best to find a consistent behavior. I think I would prefer to not synchronize in the functions, but instead require synchronization after the function call.
// constexpr int subwarp_size = | ||
// gko::kernels::dpcpp::jacobi::get_larger_power(compiled_max_block_size); | ||
// TODO: Find a way to allow smaller block_sizes (<16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this sounds like the whole kernel selection approach is unnecessary here. So maybe remove the template parameter for now and add it later.
5504e62
to
83a9280
Compare
05eda68
to
2911666
Compare
dab3875
to
92c6ba0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one major thing left: dealing with d
value in invert_dense_block
are different between sycl and cuda.
common_generate_for_all_system_matrix_types(batch_id); | ||
item_ct1.barrier(sycl::access::fence_space::local_space); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the barrier should take quite some overhead.
If it is unnecessary, maybe remove it.
@@ -257,7 +249,7 @@ TEST_F(BatchJacobi, CanSolveLargeMatrixSizeHpdSystemWithScalarJacobi) | |||
for (size_t i = 0; i < num_batch_items; i++) { | |||
auto comp_res_norm = res.host_res_norm->get_const_values()[i] / | |||
linear_system.host_rhs_norm->get_const_values()[i]; | |||
ASSERT_LE(iter_counts->get_const_data()[i], max_iters); | |||
EXPECT_LT(iter_counts->get_const_data()[i], max_iters); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it possbly equal, maybe use EXPECT_LE
not EXPECT_LT
?
Will merge this only once #1396 has been merged. |
973c77b
to
37e71d8
Compare
@pratikvn I will check if I can find the issue with the SYCL backend. If I'm not successful, we can just not merge that. |
37e71d8
to
d7bc39e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1600 +/- ##
===========================================
- Coverage 89.97% 89.28% -0.70%
===========================================
Files 752 752
Lines 60445 60467 +22
===========================================
- Hits 54387 53985 -402
- Misses 6058 6482 +424 ☔ View full report in Codecov by Sentry. |
Co-authored-by: Isha Aggarwal <isha.aggarwal2@kit.edu> Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
The failure is fairly minimal as it happens only on some integrated GPUs (some Gen 11). Nonetheless, similarly to the DPC++ non-batch Jacobi kernels, this indicates a bigger investigation and fix of these kernels is necessary.
d7bc39e
to
691a479
Compare
|
This PR adds the device kernels for batch Jacobi preconditioner.