Release 4.0.01 #1808
* Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES
  The cmake options KokkosKernels_ENABLE_PERFTESTS and KokkosKernels_ENABLE_EXAMPLES were not actually used, both perf_test/ and example/ were always built as long as KokkosKernels_ENABLE_ALL_COMPONENTS=ON. This makes these options have an effect again. If perftests or examples are enabled but ENABLE_ALL_COMPONENTS=OFF, print a message about why they can't actually be enabled.
* From e10harvey: fix typo in perf_test cmake
* Add feedback about cmake
  - Turn ENABLE_PERFTESTS off by default
  - since both examples and perf tests are off by default, warn if those are ON but can't be enabled because ENABLE_ALL_COMPONENTS=OFF
  - use ELSE to simplify logic where ENABLE_ALL_COMPONENTS=OFF
(cherry picked from commit 834a85e)
Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable (cherry picked from commit 76968d3)
Kokkos Kernels version: need to use upper case variables (cherry picked from commit d63de38)
CUSPARSE_MM_ALG_DEFAULT deprecated by cuSparse 11.1 (cherry picked from commit 4f39a18)
blas/blas1: Fix a couple documentation typos. (cherry picked from commit 3a20643)
CUDA 11.4: fixing some failing build while trying to reproduce issue kokkos#1725 (cherry picked from commit 26332ed)
(cherry picked from commit 6088147)
Reduce BatchedGemm test coverage (cherry picked from commit aec946c)
* Fix kk_generate_diagonally_dominant_sparse_matrix hang
  Use bandwidth to cap the max entries per row, so that the row-filling loop doesn't run forever looking for a column that isn't already present.
* Diag-dominant matrix generator: error if bandwidth too small
  If bandwidth is too small for the requested nnz and row_size_variance, error out with a detailed message.
(cherry picked from commit 664bfc4)
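The capping idea in the commit above can be illustrated with a small standalone sketch. This is not the Kokkos Kernels generator: `pick_row_columns` and its parameters are hypothetical names, and the band is assumed to be centered on the diagonal. The point it shows is that once the requested row size is capped at the number of columns the band actually contains, the "draw until distinct" loop always has a free column left to find.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper (illustration only): choose distinct column indices
// for one row, restricted to a band around the diagonal.
std::vector<int> pick_row_columns(int row, int ncols, int bandwidth,
                                  int requested, std::mt19937& rng) {
  assert(row >= 0 && row < ncols);
  // Columns reachable from this row, clipped to the matrix.
  const int lo = std::max(0, row - bandwidth / 2);
  const int hi = std::min(ncols - 1, row + bandwidth / 2);
  const int available = hi - lo + 1;
  // Cap the row size: asking for more distinct columns than the band
  // holds is what would make the loop below spin forever.
  const int target = std::min(requested, available);

  std::set<int> cols;
  cols.insert(row);  // always keep the diagonal entry
  std::uniform_int_distribution<int> dist(lo, hi);
  while (static_cast<int>(cols.size()) < target) {
    cols.insert(dist(rng));  // duplicates are ignored by std::set
  }
  return std::vector<int>(cols.begin(), cols.end());
}
```

A caller that prefers a hard failure, as the second commit describes, could compare `requested` against `available` up front and throw with a descriptive message instead of silently capping.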
This was intended to be a temporary patch, but it will need to stay until 4.1. This means it has to be included in 4.0.1.
MDF: Minor changes to interface for ifpack2 impl (cherry picked from commit 30bd681)
Roc tpls upgrade (cherry picked from commit e35ed21)
For BLAS routines producing a complex scalar result (like zdotc), prefer to get the result via a pointer argument, rather than as a direct return value. Directly returning a std::complex from an "extern C" function is technically not allowed and Clang warns about it. (cherry picked from commit 53599f4)
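A minimal sketch of the pointer-argument pattern described above. `demo_zdotc_sub` is a hypothetical local stand-in written for this example, not a vendor BLAS symbol; its shape loosely mirrors CBLAS-style `_sub` routines such as `cblas_zdotc_sub`. The C-linkage function only sees `void*`, so no `std::complex` crosses the `extern "C"` boundary by value.

```cpp
#include <cassert>
#include <complex>

// Hypothetical stand-in for an external routine: the complex result is
// written through `result` instead of being returned by value.
extern "C" void demo_zdotc_sub(int n, const void* x, const void* y,
                               void* result) {
  assert(n >= 0);
  const auto* xs = static_cast<const std::complex<double>*>(x);
  const auto* ys = static_cast<const std::complex<double>*>(y);
  std::complex<double> acc(0.0, 0.0);
  for (int i = 0; i < n; ++i) {
    acc += std::conj(xs[i]) * ys[i];  // conjugated dot product
  }
  *static_cast<std::complex<double>*>(result) = acc;
}

// C++-side wrapper: callers still get a value-returning function.
inline std::complex<double> dotc(int n, const std::complex<double>* x,
                                 const std::complex<double>* y) {
  std::complex<double> r;
  demo_zdotc_sub(n, x, y, &r);
  return r;
}
```

The wrapper keeps the calling code unchanged while the boundary itself stays within what C linkage is guaranteed to handle.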
Adds a better parilut test with gmres (cherry picked from commit 747bb93)
Basically, one wants to be very careful about instantiating a View or other object with only an execution space, as it might generate a memory type mismatch down the road (cherry picked from commit 1ae5b7d)
Conflicts: sparse/src/KokkosSparse_MatrixPrec.hpp
Resolved conflict with variable naming "A" vs "_A" in spmv call
* ParIlut: create and destroy spgemm handle for each usage
  This fixes memory errors on Cuda
* Formatting
* Remove deterministic from par_ilut precond test
  Now that spgemm memory errors have been fixed, it appears to work
* Add verbose mode to par_ilut
* Fixes for GPU
* Fix end_rel_res type to work when scalar is complex
* Turn off async fixed point on GPU
* Reorganize par_ilut handle, group conceptually similar members
* Refactor par_ilut deterministic setting
  Change to async_update and move it to the handle. compute_l_u_factors does not need to run in a serial execution space; that is way overkill. Simply turning async_updates off should allow for deterministic results. Now that we are iterating more than once, it looks like even the hardcoded fixture test works fine with async_updates on, since the multiple iterations correct any "bad" results.
* Add comment for par_ilut_precond test settings
* Fix ordering warning
* par_ilut: add test for nrows=0
(cherry picked from commit aa96a83)
* par_ilut: make Ut_values view atomic in compute_l_u_factors
  ... to fix the race issues when async updates are on.
* With Ut atomic, no need to avoid async updates on GPU
* Remove unnecessary header
* Update comments
* Fixes for complex scalars
* Adjust async update views; default it to off
* Fix UtValuesSafeType
* Update sparse/impl/KokkosSparse_par_ilut_numeric_impl.hpp
* Remove UtViewType in favor of std::conditional
(cherry picked from commit 507c29f)
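The type-selection idea in the commit above can be sketched in plain standard C++. This is an analogy only: the actual change concerns Kokkos views, whereas the sketch below uses `std::atomic<double>` and raw arrays, and the names `UtValue`, `add_to`, and `scatter_add` are made up for the example. It shows `std::conditional` choosing an atomic element type only when asynchronous updates are enabled, with the update code staying the same in both cases.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical alias: atomic storage only when async updates are on.
template <bool AsyncUpdates>
using UtValue =
    std::conditional_t<AsyncUpdates, std::atomic<double>, double>;

// Plain update: fine when no two threads touch the same slot.
inline void add_to(double& slot, double delta) { slot += delta; }

// Atomic update: compare-exchange loop, so concurrent adds are not lost.
inline void add_to(std::atomic<double>& slot, double delta) {
  double old = slot.load();
  while (!slot.compare_exchange_weak(old, old + delta)) {
    // `old` was refreshed with the current value; retry.
  }
}

// Same call site for both storage types; overload resolution picks the
// matching add_to.
template <class Slot>
void scatter_add(Slot* ut, std::size_t n, std::size_t idx, double delta) {
  assert(idx < n);
  add_to(ut[idx], delta);
}
```

Keeping the non-atomic type as the default avoids paying for atomic operations when updates are already serialized.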
Add and reorder parilut entries
Fix broken 4.0.0 changelog url
Please hold merge until successful testing of trilinos/Trilinos#11817
GMRES: fixing some type issues related to memory space instantiation (cherry picked from commit f41ff47)
Part of Kokkos C++ Performance Portability Programming EcoSystem 4.0
Thanks Nathan, this looks good to me
Thanks, Nathan!
One final update pushed with c208dac to modify the |
I should have mentioned that the snapshot to Trilinos is also updated with the corrected patch version convention.
trilinos/Trilinos#11817 has merged and is no longer a blocker.
Thanks Nathan, I am merging this now and creating the tag and release artifact as well.
Patch release