Build and install downstream Trilinos packages against pre-installed Kokkos using native CMake build system #11545
Comments
…fig files (trilinos#11545) Just need to move where tribits_package_decl() is called and call tribits_pkg_export_cache_var() inside of kokkos_option().
…iles (trilinos#11545) Just need to move the tribits_package_decl() call so that it comes before Kokkos defines its options, and then call tribits_pkg_export_cache_var() inside of kokkos_option(). NOTE: We don't export the variables Kokkos_ENABLE_TESTS or Kokkos_ENABLE_EXAMPLES because those are special variables defined by TriBITS where the project-level variable value may differ from the cache variable value (which is on purpose). We also don't want to export these variables because downstream packages should not need to know this info. ToDo: Kokkos really should differentiate which option values it exports and which it does not, to provide a better-defined API (so that downstream customers don't need to grep the installed Kokkos_config.h file to figure out this info).
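The pattern described in this commit can be sketched roughly as follows. This is only an illustration: `my_pkg_option()` and its argument order are hypothetical stand-ins for `kokkos_option()`, whose real definition has more logic. The only TriBITS call used is `tribits_pkg_export_cache_var()`, the function named in the commit message, and the sketch assumes `tribits_package_decl()` has already run.

```cmake
# Hypothetical, simplified stand-in for kokkos_option().
# Assumes a TriBITS build in which tribits_package_decl() was already called.
macro(my_pkg_option SUFFIX DEFAULT TYPE DOCSTRING)
  # Define the option as an ordinary cache variable.
  set(Kokkos_${SUFFIX} ${DEFAULT} CACHE ${TYPE} "${DOCSTRING}")

  # Export the cache variable to the generated package config file,
  # skipping the two TriBITS-special variables called out in the note.
  if(NOT "${SUFFIX}" STREQUAL "ENABLE_TESTS"
     AND NOT "${SUFFIX}" STREQUAL "ENABLE_EXAMPLES")
    tribits_pkg_export_cache_var(Kokkos_${SUFFIX})
  endif()
endmacro()

# Example use with a placeholder option name.
my_pkg_option(ENABLE_SOMETHING OFF BOOL "Placeholder option for illustration")
```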
When Kokkos is supplied as an external package, the modern CMake imported targets from the Kokkos<Subpkg>Config.cmake files also provide the needed flags. Therefore, there should be no special mention of Kokkos in the Trilinos configure logic.
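For illustration, a downstream CMake project that consumes a pre-installed Kokkos through its imported target might look like the sketch below. The project name, executable name, and `main.cpp` are placeholders; `find_package(Kokkos)` and the `Kokkos::kokkos` target are what the Kokkos install provides.

```cmake
cmake_minimum_required(VERSION 3.16)
project(MyApp CXX)  # placeholder project name

# Locate the pre-installed Kokkos, for example by configuring with
#   -DCMAKE_PREFIX_PATH=<kokkos-install-prefix>
find_package(Kokkos REQUIRED)

add_executable(my_app main.cpp)  # main.cpp is a placeholder source file

# Include directories, compile options, and link flags all arrive through
# the imported target, so no Kokkos-specific flag handling is needed here.
target_link_libraries(my_app PRIVATE Kokkos::kokkos)
```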
…rilinos#11545) With the TriBITS modernization refactoring (TriBITSPub/TriBITS#299) and the generalized handling of internal and external packages (TriBITSPub/TriBITS#63), we need packages like Kokkos to set critical compiler options as target properties so that they will be exported in the generated IMPORTED targets of the Kokkos<Subpkg>Targets.cmake file. This is needed, for example, to pass some critical compiler flags from the pre-installed Kokkos to downstream CMake configures of KokkosKernels and the rest of Trilinos (see trilinos#11545).
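As a generic sketch of why target properties matter here (not the actual Kokkos code): options attached to a target with `PUBLIC` or `INTERFACE` scope are recorded in the exported targets file, whereas flags placed only in `CMAKE_CXX_FLAGS` are not. The library name, source file, export name, and flag below are all placeholders.

```cmake
cmake_minimum_required(VERSION 3.16)
project(MyLib CXX)  # placeholder project

add_library(mylib src/mylib.cpp)  # placeholder library and source

# PUBLIC options become INTERFACE_COMPILE_OPTIONS on the target, so they
# are written into the generated MyLibTargets.cmake and reach consumers.
target_compile_options(mylib PUBLIC -fopenmp)  # placeholder flag

install(TARGETS mylib EXPORT MyLibTargets
        ARCHIVE DESTINATION lib
        LIBRARY DESTINATION lib)
install(EXPORT MyLibTargets
        NAMESPACE MyLib::
        DESTINATION lib/cmake/MyLib)
```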
…s-tribits (trilinos#11545) I resolved the one-line doc conflict in the file: * cmake/tribits/core/package_arch/TribitsPackageMacros.cmake by going with what is on tribits_github_snapshot.
…#11545) I resolved the one-line doc conflict in the file: * cmake/tribits/core/package_arch/TribitsPackageMacros.cmake by going with what is on tribits_github_snapshot.
Thanks for working on this! It will greatly help our Sandia production code when this is done. I'd be happy to help with making the native Kokkos build system export everything TriBITS expects.
…03-29 Automatically Merged using Trilinos Pull Request AutoTester PR Title: TriBITS snapshot update 2023-03-29 (#11545) PR Author: bartlettroscoe
…alled-kokkos-tribits Automatically Merged using Trilinos Pull Request AutoTester PR Title: Tweaks to Trilinos to get working with pre-installed Kokkos and KokkosKernels (with TriBITS updates) (#11545) PR Author: bartlettroscoe
…s:develop' (28a7b37). * trilinos-develop: Have Kokkos TriBITS build set compiler options as target properties (trilinos#11545) Update logic for TPL_ENABLE_Kokkos=ON (trilinos#11545) TrilinosInstallTests_find_package_Trilinos: Run in own subdir Move check for ParMETS version for Zoltan2 to Zoltan2 (trilinos#63) Have Kokkos TriBITS build properly export options to package config files (trilinos#11545)
… packages (trilinos#11545) This script removes the usage of Kokkos subpackages from downstream TriBITS packages.
FYI: I am working to remove subpackages from Kokkos and their usage in all downstream TriBITS packages. I am developing this refactoring as a script that you run on downstream TriBITS packages, and it will make all of the needed changes automatically (so TriBITS packages outside of Trilinos can run this single script and absorb the changes). NOTE: Removing Kokkos subpackages definitely breaks backward compatibility for both downstream TriBITS and non-TriBITS CMake packages and for users that are configuring Trilinos (as it changes the names of some of the enable vars). It should be easy to absorb the changes in both cases given the scripts I am producing, but this refactoring is not like falling off a log. Update (4/18/2023): While this change can break backward compatibility for some customers, it may not for many/most customers. If they are just depending on Kokkos then they may not need to change anything.
* Removed the listing of subpackages from kokkos/cmake/Dependencies.cmake
* Removed the now-unused files kokkos/[core,containers,algorithms,simd]/cmake/Dependencies.cmake
* Removed TriBITS macros for a package with subpackages and replaced them with those for a package with no subpackages. Also removed all subpackage macros.
* Changed kokkos_process_subpackage() to just call add_subdirectory().
* Changed the name of old KokkosCore tests "XXX" to "CoreXXX" because the prefix for all tests is now "Kokkos_" instead of "KokkosCore_".
* Changed the name of the containers/unit_test/CMakeLists.txt file test 'TestCompileOnly' to 'ContainersTestCompileOnly' because there is now a 'CoreTestCompileOnly' test (all prefixed with 'Kokkos_').
* Removed the usage of tribits_configure_file() and its wrapper kokkos_configure_file() and just call configure_file(). The location of PACKAGE_SOURCE_DIR changed, so the calls to tribits_configure_file() no longer worked. (Also, these X_config.h.in files were not using any of the TriBITS-supported features that require tribits_configure_file(), so there was no reason not to call raw configure_file().)
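Two of the changes listed above can be sketched in plain CMake. This is a hedged illustration, not the Kokkos source: `my_process_subdir()` is a hypothetical stand-in for `kokkos_process_subpackage()`, and `MyPkg_config.h.in` is a placeholder file name.

```cmake
# Hypothetical stand-in for kokkos_process_subpackage(): with no TriBITS
# subpackage bookkeeping left, it reduces to a plain add_subdirectory().
macro(my_process_subdir NAME)
  add_subdirectory(${NAME})
endmacro()

# Raw configure_file() in place of a TriBITS wrapper: the input and output
# paths are given explicitly, so nothing depends on a package source dir.
configure_file(
  ${CMAKE_CURRENT_SOURCE_DIR}/MyPkg_config.h.in   # placeholder template
  ${CMAKE_CURRENT_BINARY_DIR}/MyPkg_config.h
)
```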
…ages (trilinos#11545) This script removes the usage of Kokkos subpackages from downstream TriBITS packages CMake files.
This is the result of running the script remove_kokkos_subpackages_r.sh to absorb the refactoring of Kokkos to remove the usage of TriBITS subpackages. Manual changes may need to be made after this.
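As a rough picture of the kind of edit such a script makes, a downstream TriBITS package's `cmake/Dependencies.cmake` might change as sketched below. The "before" state is an assumption for illustration; `KokkosCore` is inferred from the old test prefix mentioned in this thread, and the actual script may touch other files as well.

```cmake
# Before (assumed): the package depended on a Kokkos subpackage.
# tribits_package_define_dependencies(
#   LIB_REQUIRED_PACKAGES KokkosCore
#   )

# After: depend on the single Kokkos package instead.
tribits_package_define_dependencies(
  LIB_REQUIRED_PACKAGES Kokkos
  )
```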
… downstream packages (trilinos#11545)
…nos#11545) This makes it a little more robust and reproducible to refactor the packages downstream from Kokkos. This script can also be used by other TriBITS and non-TriBITS CMake packages/projects that depend on Kokkos to adjust to this refactoring.
…ilinos#11545) TODO: Change this to remove_kokkos_subpackages_from_trilinos_packages.sh
…s:develop' (ab899a0). * trilinos-develop: (22 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function Tpetra: Fixing missing HIP tesT ...
…s:develop' (ab899a0). * trilinos-develop: (23 commits) Remove non-existant subdir kokkos-kernels/common/common (trilinos#11921, trilinos#11863) Teuchos: Fixing cmake logic Teuchos: Fixing catch() issues with C++ language drift fastilu: Fix memory leak. TrilinosSS: include <omp.h> (Fix trilinos#11867) MueLu hierarchical: Fix build error Tpetra: Changes to StaticView for Kokkos PTHREAD to THREADS change Teuchos: Automatically enabling Tecuhos_ENABLE_THREAD_SAFE if you have Kokkos THREADS or OPENMP for the host Stokhos: Add missing KOKKOS_INLINE_FUNCTION to fix build errors on HIP Phalanx: Remove usage of undefined var Kokkos_INCLUDE_DIRS (trilinos#11545) Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos#11938) Update for removal of Kokkos subpackages and Kokkos test renamings (trilinos#11545, trilinos#11808) KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos#11545) Add test simpleBuildAgainstTrilinos_by_package_build_tree_name (trilinos#11545) Pass in and define compilers before calling find_package(Trilinos) (trilinos#11545) Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (trilinos#6157) Export Kokkos_ENABLE_<OPTION> that are relevant Do not append to Kokkos_OPTIONS variables those in the do not export list Expand list of kokkos options not to export with cmake Tpetra: Don't use std::binary_function ...
…okkos#6104)
* Kokkos: Remove TriBITS subpackages (#11545)
* Removed the listing of subpackages from kokkos/cmake/Dependencies.cmake
* Removed the now-unused files kokkos/[core,containers,algorithms,simd]/cmake/Dependencies.cmake
* Removed TriBITS macros for a package with subpackages and replaced them with those for a package with no subpackages. Also removed all subpackage macros.
* Changed kokkos_process_subpackage() to just call add_subdirectory().
* Added prefix 'Core' to several tests in kokkos/Core/unit_tests/CMakeLists.txt now that the prefix is 'Kokkos_'
* Added prefix 'Containers' to several tests in kokkos/containers/unit_tests/CMakeLists.txt and kokkos/containers/performance_tests/CMakeLists.txt now that the prefix is 'Kokkos_'
* Changed the name of the kokkos/containers/performance_tests/CMakeLists.txt file test 'PerformanceTest_XXX' to 'ContainersPerformanceTest_XXX'.
* Added prefix 'Algorithms' to several tests in kokkos/algorithms/unit_tests/CMakeLists.txt now that the prefix is 'Kokkos_'
* Removed the usage of tribits_configure_file() and its wrapper kokkos_configure_file() and just call configure_file(). The location of PACKAGE_SOURCE_DIR changed, so the calls to tribits_configure_file() no longer worked. (Also, these X_config.h.in files were not using any of the TriBITS-supported features that require tribits_configure_file(), so there was no reason not to call raw configure_file().)
SQUASH AGAINST: Kokkos: Remove TriBITS subpackages (#11545)
* Fix native build of Kokkos after removing subpackages (trilinos/Trilinos#11545) This restores the building of the raw CMake build of Kokkos after the refactoring to remove TriBITS subpackages.
* Kokkos: Remove last of subpackage stuff, fix for tests enable (trilinos/Trilinos#11545) This gives a full passing build and tests with the Trilinos PR GenConfig clang-11.0.1 build configuration.
* Fixup update target name in python test script that gets configured
---------
Co-authored-by: Damien L-G <dalg24@gmail.com>
…os#11545, trilinos#11808) This duplication resulted from running a simple automated script that created a commit in PR trilinos#11808.
With the merge of PR: This story is not fully complete
FYI: Here is the Spack PR that has the Trilinos Spack package using the Kokkos Spack package:
FYI: Note that there have been some issues in the Spack packages downstream from Trilinos for this change in: You can see these in:
These are different issues but there was some hard-coded logic that expected Kokkos to be installed along with Trilinos. There was no reason to do that (and that has been the case for at least 10 years). The possible solutions for the problems are described in: In summary, my recommendation was/is:
and @jwillenbring agreed in xsdk-project/xsdk-issues#214 (comment). In my opinion, that is the direction that Trilinos and the package ecosystem should be going.
I agree with this approach. I had no idea that worked ( |
@sebrowne, this has been supported for at least 10 years (support for which was not written by me). I have added more explicit documentation for this as part of this TriBITS/Trilinos CMake modernization work. For example, see: and see a (tested) example specific to Trilinos at:
1a3ea28 Merge pull request #6231 from ndellingwood/master 3e85bd9 Fix windows symlink configure issue (#6241) ea7b124 CHANGELOG fixup following merge 25592c5 Update master_history.txt adde1e6 Merge branch 'release-candidate-4.1.00' for 4.1.00 9e84430 Merge pull request #6228 from masterleinad/cherry_pick_6223 dd81ecb Merge pull request #6223 from masterleinad/fix_simd_on_gpus 5c3e683 [4.1.00] Changelog for 4.1.00 (#6226) cd96a74 Merge pull request #6219 from masterleinad/fix_sycl_makefile_4_1_00 23aadf4 Fix compiling SYCL with KOKKOS_IMPL_DO_NOT_USE_PRINTF_USAGE afc1929 Update version to 4.1.00 6ca60c3 Improve OpenMP affinity warning to include MPI concerns (#6185) e200ba1 [HIP] Improve heuristic deciding the number of blocks used in parallel_reduce (#6160) 43a797b Left align demangled stacktrace output. (#6191) a406372 Fix global fence in Kokkos::resize(DynRankView) (#6184) 8661773 Merge pull request #6195 from fnrizzi/is_trait_v 98f9b4c add trait and test e30f040 shortcut value for is_dynamic_view 789b62c Weed out verbose output from `dynamic_view` container unit test (#6173) e2a7f08 Merge pull request #6171 from rgayatri23/openmptarget_nvhpc 8266abd Merge pull request #6183 from ldh4/simd_replace_unavailable_loadu_storeu_instr ad966bd OpenMPTarget: include desul changes. 
c72615a Merge remote-tracking branch 'upstream/develop' into openmptarget_nvhpc 7b0e378 Replace _mm512_loadu_epi64 and _mm512_storeu_epi64 with _mm512_loadu_si512 and _mm512_storeu_si512 18c5395 Merge pull request #5982 from masterleinad/cleanup_functor_analysis 6c134af Merge pull request #6172 from masterleinad/remove_desul_sycl_extended_namespace 0b7bed5 Allow passing a temporary std::vector to partition_space (#6167) 65ffe4c Also create symlinks for CMake configuration files to cmake_packages/Kokkos for TriBITS (#6163) 915c174 SIMD: make binary op tests to test against all data types (#5913) 62ba94c Merge pull request #6175 from dalg24/changelog_372 502dc03 Merge pull request #6176 from bartlettroscoe/tril-11938-tribits-hwloc 2bc7b96 Clean up FunctorAnalysis 9df5a01 Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos/Trilinos#11938) 1af1379 Cherry-pick v3.7.02 changelog into develop [ci skip] bf34573 OpenMPTarget: Restore desul changes. 925aca1 OpenMPTarget: Replace kokkos macros in desul. 538d18d OpenMPTarget: update fixme comment. e832781 Remove extended_namespace template paramter for SYCLMemoryOrder/Scope c23cfb8 Update Makefile.kokkos d1ecf9a OpenMPTarget: Add a fixme. bbd9a78 OpenMPTarget: Changes for OpenMPTarget backend with nvhpc compiler. 
ab6f756 Implement `HPX::in_parallel` (#6143) e88537f Allow linking against build tree (#6078) b3f9f78 sorting: add to binsort support for strided views and reorg tests (#6081) 2a5c949 Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (#6157) 2a382b4 Merge pull request #6126 from masterleinad/fix_uninitialized_value_in_combined_reducer 461310d Merge pull request #6156 from masterleinad/fix_cuda_lambda_trilinos 12e9645 KokkosTools: Don't call callbacks before backends are initialized (#6114) f8a2a80 `BinSort`, `BinOp1D`, `BinOp3D`: mark default constructor as deleted (#6131) d92158c Fix bogus warnings in nested CUDA parallel_reduce 31a5f21 Merge pull request #6136 from masterleinad/fix_nd_builtin_reductions_with_loc 5d81422 Merge pull request #6155 from dalg24/fixup_dual_view 85b014b Fix Kokkos_ENABLE_CUDA_LAMBDA for Trilinos 131503d Revert to `DualView<class,class=void,class=void,class=void>` when deprecated code 4 is enabled 382f0be Merge pull request #6150 from dalg24/drop_profiling_load_print_option b2645f8 OpenMPTarget: Enable Cray compiler for the OpenMPTarget backend. 
(#5889) 6c0adb5 Merge pull request #6149 from dalg24/fixup_cuda_lambda d74df9b [ci skip] Add nightly ci for spack (#6135) 8ede4a4 Merge pull request #6142 from dalg24/cleanup_exported_kokkos_options d92988f Suppress bogus warning about CUDA_LAMBDA being ON 57226c9 Drop Kokkos_ENABLE_PROFILING_LOAD_PRINT option 87c7be9 Merge pull request #6047 from masterleinad/simplify_sycl_reductions 3f565bb Export Kokkos_ENABLE_<OPTION> that are relevant 3c0f9a1 Merge pull request #6148 from dalg24/drop_kokkos_enable_launch_compiler 6b18c2a Drop Kokkos_ENABLE_LAUNCH_COMPILER option c935774 Do not append to Kokkos_OPTIONS variables those in the do not export list 2bcfa51 Expand list of kokkos options not to export with cmake 8f4fb72 Merge pull request #6137 from masterleinad/fix_sycl_bit_cast 3329989 Merge pull request #6123 from e10harvey/floating_point_wrapper ee43d2a Add guards for Cuda c67ddea Try running for other execution spaces bf9c242 Allow deprecated declarations in SYCL+Cuda CI e8dba15 Improve indentation of comments 99161e0 Disable tests for OpenMPTarget cbc7e88 Fix bit_cast for SYCL again f8ed850 Disable tests failing with NVHPC 4197fa8 Merge pull request #6120 from uliegecsm/kokkos-dual-view-template-types 02fb8d4 core/src: Move floating_point_wrapper to private header b86d73a sorting an empty view should exit early and not fail (#6130) 1767bfe dual view: update template types (#6085) df5681d Don't restrict index type in builtin reducers 766f00d Merge pull request #6133 from msimberg/hpx-post-apply-compat 336473d Merge pull request #6132 from msimberg/hpx-version-requirement-1.8.0 d13cc09 Conditionally use hpx::post instead of hpx::apply based on HPX version 12b0c80 Increase minimum required HPX version to 1.8.0 8a541b5 Move half traits to private header and add half/bhalf infinity trait (#6055) 3f602b6 Merge pull request #6129 from masterleinad/remove_unused_attach_texture_object 6422681 Merge pull request #6121 from masterleinad/use_sycl_bit_cast 0018848 Cuda: 
Remove unused attach_texture_object e94b5dd Kokkos_BitManipulation: KOKKOS_COMPILER_GCC->KOKKOS_COMPILER_GNU (#6119) 7009a28 Merge pull request #6122 from masterleinad/ambiguous_bit_cast 6b2459c Fix nightlies -- workaround compiler bug in GCC 9.1 and 9.2 (#6118) 5f45c30 Qualify calls possibly ambiguous calls to bit_cast 1bc1a51 Import sycl::bit_cast into the Kokkos namespace c62a42e Allow templated functors in parallel_for, parallel_reduce and parallel_scan (#5976) fb0c1b8 Merge pull request #6106 from crtrott/fix-nvhpc-compilation f15b5ab Merge pull request #6116 from rbberger/hpcbind_slurm_bugfix a85923d Merge pull request #6110 from dalg24/fixup_cuda_lambda 531b01d Fix macro guards in test for NVC++ as the CUDA compiler aa7ab5f hpcbind: check for correct Slurm variable b26ee87 Merge pull request #6113 from fnrizzi/use_assert_eq_for_std_algo_tests 6ede773 Merge pull request #6064 from masterleinad/sycl_improve_parallel_scan_new 41d9d06 Reintroduce test skip for nvhpc < 23.3 ce0b78f Merge pull request #6111 from dalg24/drop_unused_cmake_macros 81ce338 use ASSERT_EQ in all std algorithms tests ef5d447 Fixup cmake style b82161b Drop unused cmake macros 417a6ee Work around NVHPC 23.x not dealing with __isGlobal 0954a1b Drop CUDA_LAMBDA guards in Cuda headers cfbaf28 Reorganize ZeroMemset (#6087) 798efc5 Always pass -extended-lambda option to NVCC and force Kokkos_ENABLE_CUDA_LAMBDA ON 1c0e3bf Update the OpenACC parallel_reduce() constructs with Range/MDRange/Team (#6072) cf82edc Merge pull request #6108 from dalg24/drop_algorithms_and_containers_config_files d7c06c4 Revert "Merge pull request #5964 from PhilMiller/cuda-lambda-default" 7ef7d02 Drop pointless Kokkos{Algorithms,Containers}_config.h files 5fa72b5 Kokkos: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) (#6104) 60b982a Work around NVHPC 23.x issues ea134de Work around NVHPC issue with enum types edf63b3 Merge pull request #6101 from dalg24/bit_cast e247508 Added multiple reducers support for 
team-level parallel reduce (#5727) 8dc8f49 Fix typo and remove accidentally committed assertions 26ae798 change impl of `is_sorted_until` to use reduce (#6097) 7533cb4 Disable tests that fail at runtime with NVHPC (likely not liking the class declaration within the body of the functor) d6944df Merge pull request #6008 from uliegecsm/cuda-uvm-space-instance-fence 5c2d948 view(uvm): fence if need in allocation (#6005) 432988b Clang-format glitch eff2716 Use Kokkos::bit_cast in SIMD instead of rolling its own e8a44e5 Add runtime tests for bit_cast ddf55c1 Add the Experimental:: builtin variant (just defer to regular bit_cast) 71ee48f Add compile time tests for the constraints on the bit_cast function template ab41ef8 Add implementation of bit_cast in <Kokkos_BitManipulation.hpp> 945281a Merge pull request #5964 from PhilMiller/cuda-lambda-default a45cc1e fix ternary op in subset of std algorithms not working with nvhpc (#6095) 7a166d2 Enable OpenMP in CUDA-11.0-NVCC-RDC to test DEPRECATED_CODE_3=ON (#5978) 4b6d971 OpenMPTarget: Update hierarchical parallelism. 
(#6043) d251954 Work around nvcc issue for view_mapping and add FIXME_NVCC comment 5b1f341 Merge pull request #6098 from ndellingwood/update-changelog-4.0.01 e8067d4 [ci skip] Fixup changelog c28472a Update changelog 4407f7b Remove various test exclusions based on KOKKOS_ENABLE_CUDA_LAMBDA 7e32999 Always expect KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA to be set 51d7c72 Don't fail to define broader 'lambdas are available' macro 4470284 Fix definitions and docs to remove CUDA Lambda option ddded0e Implement CMake messages per team decision ca9fd21 Change Makefile.kokkos too a906356 Tentative arguments switch for nvcc 12+ 4846d47 Unconditionally enable CUDA extended lambda support 62d2b6c Merge pull request #6080 from ndellingwood/master e5490e1 Add support for Darwin 32-bit and PPC (#5916) 56ef02c Disable failed bit manipulation tests when compiled by NVHPC (#6088) bdaa12c Compiling with auto deduction of workgroup sizes 3cc9915 Improve SYCL parallel_scan d30b04d Merge pull request #6065 from masterleinad/fix_join_value_wrapper_for_neutral_element de5c017 Update OpenACC FunctorAdapter (#6077) 55bbd9f Converted a shared_ptr to a host view in UnorderedMap (#6073) 7793406 Merge pull request #6086 from masterleinad/fix_sycl_execution_space d5fa56e Fix up SYCL execution space instance creation for Intel GPUs 0ab1f11 Update master_history.txt 5893754 Update version to 4.0.1 15776f9 Merge pull request #6046 from ajpowelsnl/CHANGELOG-4.0.0/team_thread_sort b3bb4a6 Update changelog (#6058) 24c62bf Merge pull request #6074 from masterleinad/fix_sycl_cuda 220495f Merge pull request #5906 from masterleinad/define_kokkos_compiler_intel_llvm 4b27b7d Fix Kokkos_SIMD with AVX2 on 64-bit architectures (#6075) b72984a Merge branch 'release-candidate-4.0.01' for 4.0.01 2e51c67 Explicitly cast to CombinedFunctorReducerType a92e091 Only pass one wrapper object in SYCL reductions c083089 Merge pull request #6057 from cz4rs/changelog-4.0.01 06dbc15 Fix PerfTests by limiting GramSchmidt 9444634 
perf_test is still not working 0c681ed SYCL: Use in-order queue for SYCL+Cuda c09dd1c Merge pull request #6059 from Rombur/fix_ci_host b21b1e4 Merge pull request #6068 from crtrott/fix-makefile-4.0.01 2ac576a Update changelog 6d2e899 Fix typo in Makefile.kokkos 57413bb Merge pull request #6063 from stanmoore1/makefile_typo db802ac Fix join for ValueWrapperForNoNeutralElement 5680563 Fix bug in Makefile.kokkos 4feae9e Reduce size of ScatterView test when using OpenMP 9004274 Merge pull request #6056 from masterleinad/partially_reverse_5504 3eaf13e fix based on comments 318d84c Update changelog to 4.0.01 [ci skip] 079268c Partially reverse #5504 0d96f88 OpenMPTarget: Changes to Makefile.kokkos (#6053) 7645d6c Merge pull request #6052 from masterleinad/fix_unordered_map_shared_space d26f88c Don't create a shared state for size() in UnorderedMap's deep_copy 3b1afb5 Remove libnuma (#6048) bb845f2 Merge pull request #6016 from masterleinad/use_wextra 72687d8 Merge pull request #6049 from dalg24/build_md d0f5777 Remove (outdated) license information [ci skip] 83873a6 Remove Kokkos Keyword Listing section from BUILD.md and refer to the wiki instead bb7ae99 CHANGELOG.md: add threads sort 48b34de Desul atomics: let relocatable device code mode be part of the configuration (#5991) 0702062 Merge pull request #5504 from masterleinad/sycl_remove_enqueue_barrier_memcpy_workaround 8352a11 Merge pull request #5855 from dalg24/num_threads_and_device_id b24dcb4 Merge pull request #5990 from jczhang07/patch-1 3f6a854 Merge pull request #6041 from ldh4/remove_unused_thread_vector_range_ctors d80f580 Define KOKKOS_COMPILER_INTEL_LLVM 140cbd7 Define at most one KOKKOS_COMPILER* macro b6d1dba Merge pull request #6038 from masterleinad/pgi_compiler 38c6476 Merge pull request #6036 from masterleinad/cherry_pick_trilinos 59124ce Merge pull request #6037 from masterleinad/cherry_pick_6036 0126dcb Remove unused constructors for ThreadVectorRangeBoundairesStruct that are not taking in 
TeamMemberType as an argument. 29826df Try removing _kokkos_pgi_compiler_bug_workaround 39c35a8 KOKKOS_COMPILER_PGI -> KOKKOS_COMPILER_NVHPC c9a9ee0 Cherry-pick TriBITS update from Trilinos 7aabd2d Cherry-pick TriBITS update from Trilinos b57c17b Add -Wextra 9b644e0 Fix OMPT size compare warnings be65fe4 Fix enum warnings 715a6ff Merge pull request #6030 from masterleinad/fix_missing_field_initializers 0ce3895 Fix -Wmissing-field-initializers warning 5e57438 Relax scratch space limits for HIP reductions (#6029) ef1ea93 Add -Wdeprecated-copy warning and fix OMPT scan bug related to assignment operators (#6026) 9b06259 #6027: replace remaining instances of ALL_t with Kokkos::ALL_t (#6028) 8b5881f Merge pull request #6022 from crtrott/4001-cp-support-ada e86c8ea Merge pull request #6021 from dalg24/rc_4_0_01_support_for_amd_gpu_gfx1100 fdb089b Add UnorderedMapInsertOps for coo2crs (#5877) 0476985 Add half_t and bhalf_t limits (#5778) e6b8548 Merge pull request #6018 from dalg24/rc_4_0_01_bug_desul_atomics_numeric_limits_max 82b3905 Merge pull request #6023 from dalg24/rc_4_0_01_fix_changelog eb93bbd Merge pull request #6019 from dalg24/rc_4_0_01_warning_hip_std_memcpy a556f49 Merge pull request #6020 from crtrott/4001-cp-nvcc12-cpp20 ffa4f03 Fixup 4.0 change log (#6015) [ci skip] 89bdbaa Fixup 4.0 change log (#6015) f36c9ae Add KOKKOS_ARCH_ADA89 to print_configuration 3639121 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9) 991901b add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0) 6050076 Merge pull request #5986 from masterleinad/cherry_pick_5981 e275a77 Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100 8207b2e Allow c++20 in nvcc_wrapper for nvcc 12 and above eec1a53 Allow that C++20 is passed to nvcc 1b28263 Merge pull request #6000 from Rombur/fix_memcpy 3cbd2ec Desul atomics: fix bug in `desul::Impl::numeric_limits_max<uint64_t>` value 981d9c3 Merge pull request #6017 from 
masterleinad/fix_sycl_device_copyable 6f16f41 Fix namespace for is_device_copyable 8270db3 Merge pull request #6003 from masterleinad/fix_team_scratch_1_queues_sycl_cuda 54da8a2 Merge pull request #6000 from Rombur/fix_memcpy 8c3d97e Merge pull request #6013 from masterleinad/cherry_pick_6012 9b6a80f Merge pull request #6012 from aprokop/fix_version 7c7ae9a desul: Move lock_array_copied from global scope (#5999) a7a2d71 SYCL: Make is_device_copyable future-proof (#6009) 4e0d9c7 CMake: update package compatibility mode when building within Trilinos 33b905b CMake: update package compatibility mode when building within Trilinos 79b824e Merge pull request #6010 from masterleinad/fix_sycl_decorated_local_pointers 904fb32 Fix warning in some user code when using std::memcpy dc876ea Merge pull request #6011 from ldh4/release-candidate-4.0.01 69bd7bd Merge pull request #5995 from masterleinad/cleanup_ompy bd69243 Merge pull request #5996 from dalg24/desul_atomics_nvcc_warning 86b70c1 Merge pull request #6001 from dalg24/desul_atomics_warning_numeric_limits_max a6f27bf Pass local_accessor directly instead 8400cbf simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004) b0cc5a0 simd: Fixed an incorrectly returning size for uint64_t in avx2 (#6004) 3fc7789 Merge pull request #5948 from dalg24/kokkos_arch_nvidia_gpu_macro b097f74 Drive-by fix typos "fix {to -> too} many" f011970 Move Cuda/Kokkos_Cuda_NvidiaGpuArchitectures.hpp -> impl/Kokkos_NvidiaGpuArchitectures.hpp a798ac7 Explain acquire_team_scratch_space c5d2c3d m_team_scratch_pool -> m_team_scratch_event 33a5d60 Fix team_scratch_1_queues for SYCL+Cuda 19a43a6 Fix warning with NVC++ 106a4a3 Fixup NVIDIA GPU arch must be defined potentially for other backends as well 762e3ce Desul atomics: Fix NVCC warning integer conversion resulted in a change of sign 48640d7 Fix compiling OpenMPTarget for AMD GPUs d5244e1 Cleanup OpenMPTaget ParallelReduce 65aa95e Merge pull request #5965 from 
dalg24/desul_numeric_limits_max 4fde4b0 Support --compiler-options in nvcc_wrapper be14872 Remove workaround for submit_barrier not being enqueued properly 9480cb5 Merge pull request #5962 from masterleinad/host_iterate_tile_combined_functor_reducer 70f6d34 Fix sycl.large_team_scratch_size b04b46a Merge pull request #5984 from uliegecsm/kokkos-graph-hip fc3f7fc Merge pull request #5892 from aprokop/use_std_sort_within_a_bin 65bf47c Merge pull request #5983 from masterleinad/fix_unordered_map_m_size bb5ef8f graph(hip): enable test 0eeb3a4 Merge pull request #5971 from masterleinad/fix_reducer_check_serial_hpx 3c629be Merge pull request #5774 from tcclevenger/refactor_scan_policy_tests 3cb200c Add another test case 6a8e923 Use (non-mutable) std::shared_ptr instead 74e2fe9 UnorderedMap: Ensure size() working in case of copies 260886d Merge pull request #5981 from masterleinad/fix_sycl_large_team_scratch_size 42991f1 Bit manipulation: implement `byteswap` (#5967) 22cc433 Add to HIP tests in Makefile bb8a96b Fix sycl.large_team_scratch_size ee75763 #5641: Fix HIP & CUDA MDRange reduce for sizeof(value_type) < sizeof(int) (#5745) 9786d57 Merge pull request #5977 from j8asic/patch-1 43b0245 Print Kokkos version at configuration time (#5979) 067f74a Allow c++20 in nvcc_wrapper for nvcc 12 and above b000df5 Allow that C++20 is passed to nvcc 82bd4e6 Merge pull request #5963 from masterleinad/fix_partition_master_test 05f644a Merge pull request #5966 from dalg24/cuda_bhalf_conversions_ampere_plus 9f5f762 Merge pull request #5973 from cz4rs/benchmark-add-git-info 63966c1 Merge pull request #5970 from mhalk/feature/add_support_gfx1100 00a24a4 Merge pull request #5972 from aprokop/rename_scoped_profile 3707be7 Merge pull request #5954 from masterleinad/pass_functor_analysis_to_parallel_reduce_ompt 9fe93d4 Merge pull request #5867 from akohlmey/add_cuda_ada_support b10f35e Improve macro name KOKKOS_IMPL_{ARCH_NVIDIA_GPU_AMPERE_PLUS -> NVIDIA_GPU_ARCH_SUPPORT_BHALF} b4de0ac 
Rename KOKKOS_{ -> IMPL_}ARCH_NVIDIA_GPU 72d39a7 Rename ScopedProfileRegion -> ScopedRegion 9798993 [ci skip] Add a comment 488ff10 Bring back git info to benchmarks output 651ba78 Merge pull request #5968 from kokkos/PhilMiller-patch-1 85ab1bc Add support for AMDGPU target NAVI31 / RX 7900 XT(X): gfx1100 42abe36 Convert OpenMPTarget ParallelScan 6e29e92 Convert OpenMPTarget ParallelReduce f670cae Let KOKKOS_ARCH_NVIDIA_GPU provide the Compute Capability a7ac045 Drop native from performance benchmark build e0eacdd Drop native from macOS build f46889d Drop native from HPX builds 0e302f6 Drop Kokkos_ARCH_NATIVE=ON because it breaks with ccache 1d26ca8 Make CUDA bhalf conversion code more forward compatible 4f18b19 Desul atomics: fix bug max uint64_t value 0b2a956 Merge pull request #5959 from aprokop/scope_guard 2b035de Use CombinedReducer in HostIterateTile 2e667d8 Fix partition_master test 9b18550 Address review comments 0f7b7eb Merge pull request #5953 from masterleinad/pass_functor_analysis_to_parallel_reduce_sycl 787f940 Merge pull request #5894 from masterleinad/pass_functor_analysis_to_parallel_reduce_threads 7fa5a75 Merge pull request #5910 from masterleinad/fix_scan_serial_cuda d7896e6 Add ParallelScanRangePolicy test bab74b0 Merge pull request #5947 from dalg24/desul_hip_rdc 543e971 Merge pull request #5958 from dalg24/fixup_openmptarget_concurrency 62fa442 Add [[nodiscard]] qualifiers 73de258 Add ScopedProfileRegion fb0b94c Fix OpenMPTarget::concurrency() 3c77f6f Also convert SYCL ParallelScan 90836d2 Convert SYCL ParallelReduce a75aa23 Merge pull request #5949 from masterleinad/pass_functor_analysis_to_parallel_reduce_openacc b2ec19d Merge pull request #5952 from dalg24/unused_work_range 51fbd42 Drop unused ParallelX::WorkRange member types 5b1a0e3 Merge pull request #5950 from dalg24/4.0-changelog 952b841 Fix Kokkos_Threads_Parallel_MDRange.hpp 6d24bc0 Update changelog to 4.0.0 4dcb294 Use KOKKOS_ARCH_NVIDIA_GPU macro in SYCL, OpenACC, and OpenMPTarget 
backends where appropriate 7227127 Convert OpenACC ParallelReduce f967fa9 Provide another constructor in Test16_ParallelScan 5d3bcb1 Define KOKKOS_ARCH_NVIDIA_GPU macro when targeting an NVIDIA GPU architecture fc4a9ce Merge pull request #5942 from dalg24/print_config_disabled_atomics 65a6f9a Add comments testing for non-device-callable destructors 7b598eb Fix reducer result check for Threads ParallelReduce 9a33347 Use local "reducer" variable 1bfd0cc Convert Threads ParallelReduce implementations 79f8144 Fix reducer result check for Serial+HPX ParallelReduce 659baf6 Drop DESUL_HIP_RDC compile definition 554032e Desul atomics: prefer __CLANG_RDC__ macro 7e4665d Merge pull request #5944 from dalg24/drop_kokkos_enable_rfo_prefetch_macro ee2ddae Drop KOKKOS_ENABLE_RFO_PREFETCH macro 40c40a7 Convert OpenMP ParallelReduce (#5893) 5c5ac72 Tell when Kokkos atomics are disabled in print_configuration d9fc6cb Merge pull request #5940 from dalg24/drop_kokkos_enable_atomics_macros 4bf2c5c RangePolicyRequire was not using require e98766b Merge pull request #5936 from dalg24/drop_kokkos_arch_turing_macro e69b796 Remove mention of the KOKKOS_ENABLE_*_ATOMICS macros in <Kokkos_Macros.hpp> header 6d10edc Drop KOKKOS_ENABLE_CUDA_ASM* macros aafe20c Drop `KOKKOS_ENABLE_*_ATOMICS` macros when printing configuration 08dc180 Merge pull request #5923 from dalg24/drop_kokkos_memory_order d303e40 Merge pull request #5935 from PhilMiller/intel-macro-cleanup 1528cd4 Merge pull request #5932 from shaomeng/improve_vector 32868fa Merge pull request #5931 from tcclevenger/cleanup_unit_test_cmake 537f62e Do not define KOKKOS_ARCH_TURING macro with generated GNU makefiles 6dd4800 Add KOKKOS_ARCH_ADA89 to print_configuration 0e99902 Do not define KOKKOS_ARCH_AMPERE with Ada (compute capability 8.9) 61620e8 Revert "Revert "Fix intel hang"" f419b73 Add missing <atomic> header include b132b9b add cbegin() and cend() to Kokkos::Vector 2bbe1df Cleanup unit_test/CMakeLists.txt 5f8d0e3 Update 
clang-format CI build (#5930) 8b80bd0 Merge pull request #5925 from dalg24/kokkos_hip_architectures 310812b Remove extra double quote in CUDA and HIP allocation error messages (#5926) b9e423e Export Kokkos_HIP_ARCHITECTURES variable with CMake 569a609 Export `Kokkos_CUDA_ARCHITECTURES` variable with CMake (#5919) 1abf653 Drop Kokkos memory oder classes 3bcf389 Use directly memory order from desul in Impl:: atomic funtion templates 2a7629d Prefer non Impl:: atomic_{load,store} in AtomicDataElement since using relaxed memory order 416d7b7 New OpenACC backend implementation for parallel_scan with a range policy (#5876) 1cf8907 Use std::sort for sorting within a bin when possible db890c9 Add test case f93e48a Don't call the functor's destructor on the device for Serial and Cuda d177f61 Merge pull request #5918 from PhilMiller/intel-macro-cleanup 12708a1 Use insertion sort for sort within a bin in BinSort (#5890) 90286ca Merge pull request #5911 from masterleinad/pass_functor_analysis_to_parallel_reduce_hip 33e5ef6 Revert "Fix intel hang" 54e4396 containers: Remove workaround for Intel older than the required 19.0.5 and GCC < 5 6a3b1d6 algorithms: Remove workaround for Intel older than the required 19.0.5 1d08f6f Merge pull request #5915 from dalg24/drop_host_lock_arrays 4f871be Convert HIP ParallelScan 70a0af5 Convert HIP ParallelReduce cba99e8 Remove misplaced and commented host lock array code in OpenMPTarget backend 63879db Drop host lock array b4655f9 Drop (unused) HBW lock array c5fe10e Merge pull request #5817 from dalg24/drop_kokkos_lock_arrays 3c06ffe Merge pull request #5907 from dalg24/bit_rotate fcdedf7 Do not bother with sycl::rotate 771c956 Merge pull request #5895 from masterleinad/pass_functor_analysis_to_parallel_reduce_hpx bc1138f Merge pull request #5884 from rbberger/amd_rocm_hpcbind a2181fc Merge pull request #5901 from etiennemlb/fix/cmake-deduplication-issue 0691619 Convert HPX ParallelReduce ba19572 Use CombinedFunctorReducerType in 
ParallelReduce (#5874) 22ee14e Implement `rot{l,r}` function templates 4ec9fb6 Add AMD ROCm support to hpcbind 0f51821 Merge pull request #5905 from crtrott/fix_msvc_cuda 8921317 Silence unused parameter warning 66e1437 Apply clang-format c74aa41 Split math function test further, to work around compilation issue with MSVC/CUDA 43ec33e Work around a bug in MSVC/CUDA in a function. 4075009 Work around a failing CTAD occurance on MSVC/CUDA 47844ce Fix more rank style changes in MSVC/CUDA build 5b9f300 Fix another error with MSVC where we need to use rank() e53f224 Cleanup prefer {traits:: -> }rank[_dynamic] fb3d754 Merge pull request #4577 from dalg24/bit_manip e146fc9 Merge pull request #5870 from dalg24/view_rank_member_function 75a3e80 Disable uchar test to work around broken sycl::ctz on NVIDIA GPUs b166d77 Add `Experimental::*_builtin` counterpart to the bit manipulation template functions c256a98 Merge pull request #5881 from msimberg/update-hpx-print-configuration dff272f Fix CMake deduplication issue when linking with hip::device c4a5ad0 Update HPX::print_configuration b3a8182 Backport function templates from <bit> standard library header 03aae9a Merge branch 'develop' into view_rank_member_function 3be7ae2 Add compile-only test for View::rank[_dynamic] 948c6c6 Merge pull request #5620 from cz4rs/core-perf-tests-benchmark-conversion af89aa7 Merge pull request #5878 from masterleinad/aligned_subview 25ff05b Fix warning pointless comparison of unsigned integer with zero 3d2dc6a Merge pull request #5887 from msimberg/nvhpc-version-macro-more-digits 2969679 Fix MSVC CI build d3eac2b Cleanup prefer {traits:: -> }rank[_dynamic] 60ba1e1 Add one more digit for KOKKOS_COMPILER_NVHPC version components 4ca0340 Add comment in test c4b81ec Try fixing Cuda 11 CI 86bbae3 Deprecate subview overload taking a template argument for MemoryTraits 0b94343 MemoryTraits::value -> MemoryTraits::impl_value 6e36acf Add comment in test c43e45e Remove Aligned memory trait when creating 
subviews 4286774 Fix warning comparison of integers of different signs a7daa59 Fix printing extents and rank in error message when copying views 10fae1f Fixup update Kokkos::rank(View) free function and drop outdated comment 314b966 Add View::rank[_dynamic] static constexpr data members 2e53f1c Add Impl::integral_constant 15989dd Merge pull request #5882 from dalg24/deprecate_view_rank_uppercase_r e348b69 Deprecate View::Rank 05416c9 View::{R -> r}ank in perf tests 8487a96 View::{R -> r}ank in unit tests 2840e8d View::{R -> r}ank in algorithms and containers 9fb2bbc Prefer View::{R -> r}ank 2b532d1 Fix cache configuration in CI (#5871) d39885a Merge pull request #5873 from masterleinad/fix_version_macro_develop b6cdada Also test the KOKKOS_VERSION_{LESS,GREATER,EQUAL} 3175011 Add compile-only test to make sure version macros are defined 1d228fa Fix version macros 2caf641 add support to compile Kokkos for Ada generation (sm_89) consumer GPUs (RTX40x0) 2272d3b Merge pull request #5865 from msimberg/hpx-concurrency-non-static-member-function b6c49a9 Make HPX::concurrency() a non-static member function b9d405a Fix unused function warning (SYCL) d25b94b Remove unused variable 204b085 Remove obsolete warning pragmas b4bd01d Use double quotes instead of <angled> include eb18f1d Port Atomic tests 6ab2791 Clean up perf_test CMakeLists 1b9a67f Port Mempool performance test aa20b2b Avoid multiple `main()` definitions ab55654 Disable unsupported benchmarks in OpenMPTarget e3324b3 Port ExecSpacePartitionig tests a45165d Merge pull request #5861 from msimberg/hpx-header-to-subdir 4da9dd9 Move Kokkos_HPX.hpp header into HPX subdirectory cd8e67f Merge pull request #5857 from dalg24/rm_unsused_files cd107dd Merge pull request #5856 from dalg24/destruct_delete 36bc91e Port GramSchmidt tests 62b8421 Remove duplicated helper 90b71cb Use correct license headers b6c619a Add missing tests to Atomic minmax benchmark e250ce3 Move command line helpers implementation into a header 7dd33f8 
Remove ported benchmarks from Makefile e0b5846 Measure only allocation time 5534a8f Remove redundant include 076d931 Use named constants 924600b Reduce repetition in ViewFill benchmarks b1a3135 Reduce repetition in ViewResize benchmarks 25876cf Port Custom Reduction tests 5635e13 Use common helper for reporting results 372d03e Fix units - Fill 4b8e0e1 Port Atomic MinMax tests 063fe9a Port HexGrad tests 7c9f640 Port ViewAllocate tests 66e53a9 Remove redundant include 9126797 Clean-up Benchmark_Context and hide implementation details 1b2d07a Port ViewResize tests 5235c89 Port ViewFill performance tests bbde3b1 Remove pointless dummy source file in core 3448260 Remove unused impl/CMakeLists.txt 8b19e2d Drop (unused) Impl::destruct_delete utility d8d9c58 Check Kokkos::num_threads and device_id in tests a4af6f7 Add Kokkos::num_threads() and Kokkos::device_id() 2aa2576 Dispatch Kokkos::sort(Kokkos::View) to SYCL oneDPL (#5229) 6f12ca2 Merge pull request #5852 from rgayatri23/OpenMPTarget_intel_pvc_edits e7aeb9b Merge pull request #5816 from dalg24/tpetra_atomics_max_abs fa54c97 Merge pull request #5850 from crtrott/no-deprecated-3-in-makefile 446532e Update core/unit_test/TestNumericTraits.hpp 86a4427 Drop (deprecated) KokkosCore_UnitTest_DefaultDeviceTypeInit_* from the makefile cfb7b2f Merge pull request #5854 from dalg24/house_keeping 14f9425 OpenMPTarget: Replace KOKKOS_ARCH_INTEL with KOKKOS_COMPILER_INTEL to protect declare target on Intel GPUs. 34a21cb Merge pull request #5847 from dalg24/fixup_omp_thread_pool_size 387de48 Move { -> Threads/}Kokkos_Threads.hpp be83e9a Move { -> Serial/}Kokkos_Serial.hpp 7436256 Move { -> Cuda/}Kokkos_Cuda[Space].hpp 1d8dd90 OpenMPTarget: Enable declare target for all Intel GPUs. 8b2bf33 Merge pull request #5849 from dalg24/hpx_asyn_dispatch_warning f2ec98d Fix clang+cuda compiler warning about cudaDeviceSynchronize (#5846) 4c878a0 OpenMPTarget: Adding declare target for constexpr variables. 
568bc2c Don't enable deprecated code 3 in Makefile builds anymore c005e60 Pass *this to in_parallel in OpenMP::impl_thread_pool_size() f68098b Fix CMake warning when HPX is not enabled cba11a1 Merge pull request #5841 from dalg24/desul_atomics_source_files 5bb7e0a Fixup deprecated code 3 code path OpenMP::impl_thread_pool_size 5ea96bc Update HPX backend to use HPX's sender/receiver functionality (#5628) 97ad51b Fix unused parameter warning in SYCL lock array and add comment 879d607 Make OpenMP::concurrency and impl_thread_pool_size non-static (#5836) 46185fe Merge pull request #5840 from dalg24/nvhpc_arch_native 153aa59 Merge pull request #5838 from dalg24/typo_deprecared 41166e1 Merge pull request #5833 from masterleinad/sycl_device_global_static_only 43ccea6 Desul atomics: Drop `DESUL_HAVE_{GPU_LIKE,FORWARD}_PROGRESS` macros 1d19328 Desul atomics: SYCL lock arrays out of sync 37bcd41 Desul atomics: cleanup macro guards in CUDA/HIP lock guard files 23e2d85 Desul atomics: conditionally append the CUDA/HIP/SYCL source files 93487cf Fix flag passed to NVHPC when `Kokkos_ARCH_NATIVE` is `ON` ccbfb00 Set native flags according to CMAKE_SYSTEM_PROCESSOR (#5831) b8603a7 Fixup typo `#ifdef KOKKOS_ENABLE_DEPRECA{R -> T}ED_CODE_3` c10edf3 Skip Tpetra reproducer with NVHPC compiler f9f1808 Merge pull request #5834 from masterleinad/fix_unprefixed_macros_kokkos_host_mdpsan a62aa40 Refactor OpenMPTarget backend (#5726) f3d9efb Fix unprefixed macros on KokkosExp_Host_IterateTile.hpp dac21c7 Add non-standard `rsqrt` math function (#5644) 073ce8b Try using oneAPI 2023.0.0 in SYCL+Cuda CI (#5813) b477f99 Merge pull request #5832 from PhilMiller/fix-crs-define d41a6df HIP: Drop obsolete macro definition 87535d8 ViewLayoutTiled: Be scrupulous about macro naming and undefining f4c8f8d OpenMPTarget: Be scrupulous about macro naming and undefining ae585b7 CUDA: Fix up comment fbceafd CUDA: Convert simple value macro to constexpr 71e0eca CRS: Use Kokkos device function macros rather 
than duplicating code when compiling for GPU targets ba4ebc4 Restrict KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED feature macro detection to static libraries 52586ef Merge pull request #5825 from dalg24/device_ptr_to_lock_array_in_constant_memory 0130a3f Initial OpenACC parallel_reduce implementation for Team policy (#5610) 59067d4 Use raw literal string to avoid having to escape characters in git commit message (#5823) 333157f Merge pull request #5742 from rgayatri23/OpenMP_regression_fix 8103d82 SIMD backend of ARM NEON (#5775) fb7d9f2 SYCL: Pass Xsycl-target-backend* only to the linker (#5705) 04e3437 Further update to CUDA occupancy calculation (#5739) a564953 Desul atomics: let pointer to the device lock arrays (HIP and CUDA) be in constant memory without RDC as well 22380c7 Merge pull request #5819 from dalg24/deprecate_kokkos_active_execution_memory_space_macros 92895ff Merge pull request #5818 from masterleinad/fix_all_t_deprecations e8381d8 Add TODO comment to replace fully-qualified name when possible ecd23e4 Spell out Kokkos::ALL_t to avoid deprecation warnings 789dfa7 Merge pull request #5821 from masterleinad/fix_sycl_ci_device_global 9d7257a Fixup turns out Tpetra "abs max" operation does not preserve the sign 2e1a559 Merge pull request #5820 from crtrott/fix-intel-ice-dev eabd0e4 Disable global device variables in SYCL+Cuda CI cd8eb9c Remove Cuda and HIP lock arrays altogether f78d87a Unwire initializing/finalizing Kokkos lock arrays bd86fe9 Change `#ifdef KOKKOS_ENABLE_DEPRECATED_CODE_{4 -> 3}` 2b5c31a Intel ICE Sacado: turn off support for nested OpenMP with ICPC 6701772 Intel ICE Sacado: use new HostIterateTile API in OpenMP 6688cad Intel ICE Sacado: use new HostIterateTile API in HPX b98e824 Intel ICE Sacado: use new HostIterateTile API in Threads 80c770d Intel ICE Sacado: use new HostIterateTile API in Serial 6935f70 Intel ICE Sacado: rewrite HostIterateTile a6a0237 Deprecate `KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_*` macros 60f80e5 Merge pull 
request #5613 from masterleinad/sycl_extended_atomics 2f07a04 Fix initial value (identity element) for max abs 258bac6 Add unit test capturing Tpetra custom atomics use case 7cccd74 Merge pull request #5707 from masterleinad/sycl_update_2023 d63af25 Merge pull request #5814 from dalg24/scratch_locks 5d93865 Break lock array dependence of Cuda and HIP teams impl 5d87aa9 Merge pull request #5811 from dalg24/rm_desul_atomic_helper 8b80616 Merge pull request #5810 from masterleinad/move_sycl_headers 61d8569 Update Dockerfile used for SYCL+Cuda CI 05d008d Address deprecations in oneAPI 2023.0.0 b5b0504 Update minimal compiler requirements for SYCL 0180ff5 Update architecture flags for SYCL+Cuda b0be8e6 Disable tests failing with SYCL+Cuda after update to oneAPI 2023.0.0 3369267 Merge pull request #5800 from masterleinad/improve_comment_test_team 7a13414 Merge pull request #5767 from masterleinad/fix_scratch_again 84a336a Merge pull request #5807 from dalg24/all_t adb3141 Drop desul_* helper functions in tasking 94d9c9e Merge pull request #5804 from dalg24/purge_legacy_atomics ddefe61 Issue warnings when using Kokkos::Impl::ALL_t 236e892 Fixup GH Actions compiler warnings (#5780) 6d90db3 Move all SYCL headers into SYCL directory 05f6a9a Per review dropped superfluous const-qualifiers 4519e4c Drop anonymous namespace around definitions of ALL, WithoutInitializing, and AllowPadding e91f7e8 Guard using-declaration in Impl:: namespace with #ifdef KOKKOS_ENABLE_DEPRECATED_CODE_4 5304a40 Stay off Kokkos::Impl::ALL_t 7869915 Move Kokkos::{Impl:: -> }::ALL_t definition and add using-declaration in Impl:: namespace for backward compatibility 1668cf4 Merge pull request #5802 from ibaned/avx512-mask-fix aeab5bd Merge pull request #5805 from dalg24/fixup_rocm54_force_global_launch_launch d745c31 Fixup deleted wrong branch in HIP locks 796e964 Drop `KOKKOS_ENABLE_IMPL_DESUL_ATOMICS` macro define altogether 7f5ea60 Update diff_files (might be worth revisiting logic) 52953c8 Remove a 
whole bunch of Kokkos leagacy atomics headers 44140f7 Get rid of #ifdef KOKKOS_ENABLE_IMPL_DESUL_ATOMICS in unit tests c3fe1d6 Purge macro guards for desul atomics being enabled or not c54547e Fixup ROCm 5.4 ImplForceGlobalLaunch{Launch -> }_t typo in unit tests 153b4c1 remove const_cast with some code duplication f253bc4 Print KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED in print_configuration 4c03c8d KOKKOS_SYCL_DEVICE_GLOBAL_SUPPORTED->KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED 05cb3f5 Purge logic around desul atomics being enabled at configuration time cb67caf Warn at configuration time if attempting to disable desul atomics and force using it (#5801) 5212d90 Fix a bug in AVX512 simd_mask::operator[] aa0f81e Replace HIP_LOCK_ARRAYS macros by functions (#5770) a75c613 Merge pull request #5796 from Rombur/force_global_launch 53cc297 Improve comments in TestTeam.hpp e4b3c82 SYCL: Add support for arbitrary size atomics 5b3b6e7 Rename ImplForceGlobal to ImplForceGlobalLaunch de37fc2 Merge pull request #5784 from masterleinad/drop_KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL d50bdd0 Merge pull request #5797 from cz4rs/container-options cf6d43d Merge pull request #5786 from dalg24/cleanup_rm_eliminate_warning_for_lock_array 478f087 Fix typo b99fb31 Use GTEST_SKIP to skip test c9929fc Merge pull request #5795 from dalg24/reduction_identity_char 0f8b7ca Skip test and add comment explaining why 2902035 Fix tests when using ROCm 5.3 d7aa278 Remove obsolete container configuration 13c4de2 Merge pull request #5793 from dalg24/fixup_jenkins_gnu_generated_makefile cf67ab4 Force GlobalMemory launch for some Bessel tests when using ROCm 5.4 f9d9505 Add parameter to force using GlobaLMemory launch mechanism using HIP 2e6c238 Drop KOKKOS_IMPL_WORKAROUND_INTEL_LLVM_DEFAULT_FLOATING_POINT_MODEL e8c08e2 Fix sycl.scratch_align test de26b23 Add missing ReductionIdentity<char> specialization 2bde8fc Merge pull request #5792 from masterleinad/improve_assert_macros 24ef794 
Fixup warning in Jenkins CI build with GNU generated makefile d2a73f9 Merge pull request #5791 from dalg24/dead_omp_test_source_file 73b4ca8 Prefer ASSERT_EQ over ASSERT_TRUE with == aa7865e Remove unused OpenMPTarget test source file 7475b89 Remove dead OpenMP test source file c304818 Merge pull request #5755 from Rombur/hip-fix-global-launch 9f09e2b Drop unused Kokkos::Impl::eliminate_warning_for_lock_array CUDA/HIP functions 7f08b95 Desul atomics cleanup remove unused Impl::eliminate_warning_for_lock_array() 6e73a35 Merge pull request #5785 from masterleinad/replace_sprintf 0e2fda8 Merge pull request #5642 from cz4rs/enable-flang 20b609a sprintf -> snprintf b5bd709 Merge pull request #5779 from cz4rs/upgrade-github-actions 4bd3e85 Upgrade GitHub actions 7652228 Use `flang-new` for Fedora builds 48e0874 Merge pull request #5777 from junghans/patch-5 619ed2d Fix build on Fedora rawhise 910d43e OpenMP: Adding an ifdef around chunksize for static schedule for GCC compiler. 728e3d3 Merge pull request #5762 from masterleinad/fix_scratch_space_for_sycl 0db3bd8 Fix a typo 4829fb2 Add a mutex to protect scratchFunctor 8f4f31d Merge pull request #5764 from dalg24/desul_atomics_config ba0ad25 Merge pull request #5765 from ldh4/hpx_team_reduce_sfinae 7a3bfe0 Fix macro typo used in the OpenACC backend parallel_reduce(MDRange). 
(#5766) 97287f6 Remove unnecessary header 9f24f55 Merge pull request #5763 from masterleinad/fix_openmp_with_deprecated_code_3 20abee9 Let increment be of type uintptr_t fixing warning 1758196 Generate <desul/atomics/Config.hpp> file from the generated Makefiles 51aa904 Desul atomics configure library based what the user enabled 45acff3 Fix reviewers' comments a9c997c Fix ScratchSpace pointer comparison for SYCL aad8792 Merge pull request #5757 from dalg24/desul_atomics_drop_cuda_arch_macro_guards 02941a0 Merge pull request #5760 from dalg24/desul_atomics_gnu_and_msvc 7f883bc Merge pull request #5756 from dalg24/desul_atomics_sycl_macro e5e8742 Added missing enable_ifs to hpx team parallel_reduce 33d7fce Fix compiling with OpenMP and Kokkos_ENABLE_DEPRECATED_CODE_3 1f68ab4 Desul atomics cleanup enable GCC or MSVC atomics cd0b631 Encapsulate staging inside scratch_functor c6d7662 Merge pull request #5759 from dalg24/cmake_package_version_compatibility 49b00de CMake: change package COMPATIBILITY mode {SameMajorVersion -> AnyNewerVersion} 0986a3a Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in PTX assembly code 0e3848f Desul atomics: drop unnecessary macro guard that checks for__CUDA_ARCH__ in compare exchange 46aae0f Desul atomics fixup detect use of SYCL 989d996 Merge pull request #5751 from masterleinad/update_kokkos_version_develop 296de12 Return host functor instead of device one 487deee Apply clang-format cde661d Update Kokkos version on develop b475233 Merge pull request #5722 from dalg24/openacc_parallel_reduce_mdrange cf4358e Add more comments d0d6404 Merge pull request #5747 from dalg24/fixup_omp_makefile 00ab763 Fixup forgot to add new OpenMP source file in Makefile a84c7a5 Merge pull request #5741 from ndellingwood/update-testallsandia 57504c4 Merge pull request #5698 from masterleinad/static_assert_reducer 761ffda Fix HIP Global Launch with HSA_XNACK=1 dafb577 Merge pull request #5738 from Rombur/refactor_openmp 459e881 Merge 
pull request #5740 from seyonglee/openacc_cmake_make_bugfix cf04bb5 [ci skip] update test_all_sandia 74a7988 Minor bug fixes on CMake and Make configurations for the OpenACC backend. fb47be7 Merge pull request #5730 from tkordenbrock/tkordenbrock/fix-DynamicView-deep_copy-dp-sp fbfa01e Move OpenMP UniqueToken to its own file 2f7e94a Move OpenMP functions out of Kokkos_OpenMP_Instance.hpp f92270b Move part of Kokkos_OpenMP_Instance.cpp into Kokkos_OpenMP.cpp 48e8692 Move Kokkos_OpenMP.hpp to OpenMP/Kokkos_OpenMP.hpp 5d136cc Static asserts for reducers d2e574c Apply clang-format 77d57d2 Merge pull request #5731 from dalg24/cleanup_cuda_blocksize_deduction 3697d45 Merge pull request #5735 from crtrott/remove-kokkos-cxx-standard-from-buildmd-develop e0ebaa5 Merge pull request #5733 from ndellingwood/fix-intel19-werror edfb1e3 Fix -Werror with intel/19 6aa7bf6 Remove KOKKOS_CXX_STANDARD mentioning from BUILD.md 67dff62 fix broken DynamicView test case #4 1f4468b fix src/dst Properties in deep_copy(DynamicView,View) d4bd012 Revert "Drop pre CUDA 11 macro guards in occupancy calculation" 1fd8589 Drop now unsused `get_shmem_per_sm_prefer_l1` function d34c751 Drop pre CUDA 11 macro guards in occupancy calculation 4954ce2 Merge pull request #5689 from cz4rs/performance-results-visualization a23580e Temporarily disable unsupported reduction tests in core/unit_test/incremental/Test14_MDRangeReduce.hpp for the OpenACC backend. 
7e651ca Group similar options together ef7fd60 Configure `ccache` for benchmark builds 1134a1f Simplify Kokkos configuration 64d9b44 Use maximum available level of build parallelism 9fd7187 Use correct GitHub access token d179453 Use correct branch for destination repo 9fbd78a Configure `ccache` correctly 9018621 Initial implementation of MDRange parallel_reduce c6fae3f Move definitions of `OpenACCIterate{Left,Right}` and `OpenACCMDRange{Begin,End,Tile}` 604dc86 Remove commented out code 327aac5 Add comment for PerformanceTest_* executables 3a1769b Build on pull request 176ae8b Use double quotes instead of <angled> include 67a92d3 Do not build tests and examples 92906bf Remove security options 2e09341 Use separate .yml file for benchmarking 07b01ef Use correct header guards

git-subtree-dir: tpls/kokkos
git-subtree-split: 1a3ea28

25a31f881 Merge pull request #1877 from ndellingwood/master b6a2db921 Update master_history.txt 14ad220a9 Merge branch 'release-candidate-4.1.00' for 4.1.00 1592d9ed9 Merge pull request #1874 from ndellingwood/fix-compatibility-kokkos-4.0 9620913d1 Merge pull request #1873 from kokkos/update-changelog-4.1.00 9e9351bd1 CHANGELOG: small updates a3c07dfad CHANGELOG: organizing enhancements section 2579c4e3c CHANGELOG: reorganizing the new features section c1176142b Update changelog for 4.1.00 a0d99bf69 Merge pull request #1868 from lucbv/MKL_INT 7871bd233 Merge pull request #1867 from bartlettroscoe/tril-11966-bad-batched-incl-dir e624a7d3b Update to version 4.1.00 af312b9a0 Merge pull request #1850 from e10harvey/issue1764 340895119 Merge pull request #1865 from ndellingwood/update-testall ec4a4cb09 Merge pull request #1864 from vqd8a/streams-tests-fix-small-numthreads 77745756f Add tests for nstreams=1 98eb68eda Merge branch 'develop' into streams-tests-fix-small-numthreads 4dbb5838e Check concurrency with nstream instead c62d07442 cm_test_all_sandia: updates for blake cec953f37 Merge pull request #1861 from cwpearson/fix/rocm-5.2.0-hang-quick 22b5f4ef1 Merge pull request #1862 from e10harvey/workaround_gnu_bug_81429 03998f350 Merge branch 'develop' into streams-tests-fix-small-numthreads b2581bb2d Apply clang format ba75b4b58 Remove redundant file 6a71179ab Restore orig. 
KokkosSparse_BsrMatrix.hpp 71f04ce8a Workaround checking OMP_NUM_THREADS with number of streams f75ec31ce sparse/src: Add ifdef for doxgen < v1.9.7 ce8bb989f Benchmark cleanup for par_ilut and spmv (#1853) 6d79eaf5d sparse/src: Work around gnu compiler bug 478a56b53 use host pointer mode in rocBLAS scal 232b5bdac Merge pull request #1814 from e10harvey/issue1804 8b3c95135 Merge pull request #1856 from e10harvey/enable_sphinx_werror 8fae08018 Merge pull request #1783 from e10harvey/batched_gemm_eti 7865e88ac Merge pull request #1857 from e10harvey/issue1673 8b62c3851 Merge pull request #1855 from ndellingwood/issue-1749 eb92728a6 batched/unit_test: Optionally skip simd dcomplex4 558dbe4a9 docs: Update trmm. Add trtri. 24d259b0d docs: Fix blas rst files dec2bcb8d Remove TestDeviceType c5b2305aa docs: Enable sphinx -werror 07dc82a8d docs: Fix sphinx warnings d88ad3523 sparse: Various doxygen fixes 9d723f6fe batched/dense: Add gesv DynRankView runtime checks a907ca594 Merge pull request #1854 from ndellingwood/patch-match-trilinos-11921 87a384657 Address PR feedback fea22d883 Revert ".github/workflows: Print out arch in osx CI" 341a4779f Revert ".github/workflows: Print out arch in osx CI" 91c0b606a Revert ".github/workflows: Print out arch in osx CI" 0f54c3da9 Merge pull request #1852 from e10harvey/docs_parilut_handle_fix 48d67ff62 CMakeLists.txt: Add all_libs alias a8884845a CMakeLists.txt: Add alias to match what is exported from Trilinos 127c28198 Remove non-existant subdir kokkos-kernels/common/common (#11921, #11863) d7c9a0771 docs/developer: Add Experimental namespace fa2bdef62 Merge pull request #1843 from e10harvey/docs_compiler_profiling b43d47557 Merge pull request #1844 from bartlettroscoe/remove-nonexistant-incl-dir b3328390e KokkosKernels: Remove non-existent common/src/[impl,tpls] include dirs (trilinos/Trilinos#11545) ef98cb76a Merge pull request #1848 from e10harvey/fix_typos 4b3bab673 Merge pull request #1849 from 
ndellingwood/update-cmake-option-naming 5b369abef Update cmake option naming in docs/comments 723ab23aa blas/tpls: Fix gemm include guard typo c5302a1ca docs: Add profiling for compile times ac60cd4e2 Merge pull request #1841 from cwpearson/fix/spot-check-tpls-rocm 9292be86d batched/dense/impl: Fix headers 407e31a99 Merge pull request #1835 from dalg24/cuda_uvm 3a1ea766b batched/dense: cleanup gemm handle 5ece26d89 batched/dense: cleanup and move ETI into spec file 90c8a5ed1 batched/eti: Use Trans from KokkosBlas 48d647966 cmake: Fix batched eti args 557002b53 batched/CMakeLists.txt: ETI valid args only d55ba7bf3 batched/dense/unit_test: Add TEST SKIPPED prints 1c256b1a3 batched: fix eti avail and wrapper 033a75e27 Merge pull request #1820 from vqd8a/sptrsv-solve-streams 940217b31 Merge pull request #1840 from ndellingwood/update-caraway-queues c0349db3a .github/workflows: Print out arch in osx CI 611641996 .github/workflows: Print out arch in osx CI 9d4de5dbe add rocblas and rocsparse to --spot-check-tpls 9ad25c9b9 batched: note that tpl struct is unused f663066d6 batched: Remove empty decl ETI files 721f388f9 batched: Populate avail eti files d55fb1054 .github/workflows: Print out arch in osx CI f64d6361a batched/dense: Add HostLevel Gemm unification layer 62b863de5 batched/dense/impl: Remove forward decls dca6ee561 batched/dense/src: Add KokkosBatched_HostLevel_Gemm.hpp 40d76ebc6 perf_test/blas/blas3: Add compile-time checks for BatchLayout 57bfb3f0b batched/dense/unit_test: Run tests if ETI_ONLY is disabled 7ad0ede54 Start moving into HostLevel headers 813d02967 minor cleanup 60ddbb25a Fix constexpr branch 7b6073bb9 batched/eti: ETI host-level interfaces 237597a00 cm_test_all_sandia: update to add caraway queues for MI210, MI250 3917bd320 Merge pull request #1821 from lucbv/spmv_benchmark 82d93a25c Support rocSparse in rocm 5.2.0 (#1833) 5070d87b5 Merge pull request #1824 from e10harvey/issue1823 5ea1c3c32 Update perf_test/sparse/KokkosSparse_spmv_benchmark.cpp 
2b3a070c1 applying clang-format 1a69ed2ae SpMV benchmark: adding logic for spmv algorithm 29c24f2bd SpMV: applying clang-format to benchmark e3b6eb19e SpMV: adding logic in benchmark to chose algorithm to test. 09dc9ff27 SpMV: applying clang-format to benchmark source file f75527cd6 SpMV: adding benchmark for spmv 7df961ef9 Merge pull request #1836 from dalg24/cleanup_kokkos_enable_pthread 49b0c491d Merge pull request #1834 from dalg24/remove_dead_code 08f4a4613 Merge pull request #1828 from ndellingwood/fix-cusparse-version-check c74db8cc0 Merge pull request #1826 from brian-kelley/FixRhelNightly 058f099e1 Merge pull request #1827 from lucbv/Kokkos_ALL_t 6f26e1527 Drop outdated workarounds for backward compatibility with now unsupported Kokkos versions e329be8dc Do not bother querying the value of Kokkos_ENABLE_CUDA_UVM 3273a031b Do not adjust KokkosKernels_INST_MEMSPACE_CUDA[UVM]SPACE default value ebd1406fb Remove dead code guarded by `#ifdef KOKKOSKERNELS_INST_MEMSPACE_CUDAHOSTPINNEDSPACE` abe8558b1 Remove remaining decl.hpp files b1e22208f Remove includes of decl.hpp files ad541587d sparse/eti: Remove unused decl.hpp.in files 2f0ce87ca Merge pull request #1830 from ndellingwood/weaver-update 4a8667228 scripts/cm_test_all_sandia: Update cuda11 modules 09a4820b3 cm_test_all_sandia: updates for weaver d0f4a9ca0 Merge branch 'develop' into sptrsv-solve-streams ea3321c2f Apply clang format bf498cd4a Remove unnecessary code b3ef19c74 Applying clang-format 28e813086 Sparse: fixing a few issues related to coo2csr and par_ilut benchmark f30291cd1 spmv cusparse version check modified for cuda/11.1 1424f8aef Kokkos 4 compatibility: modifying the preprocessor logic 990d7db76 Fix errors and warnings in sems-rhel nighly 2bb633d46 .github/workflows: Summarize github-DOCS errors and warnings 69d0a8b5b Add BsrMatrix SpMV in rocSparse TPL, rewrite BsrMatrix SpMV unit tests (#1769) 63eab04f5 Merge pull request #1819 from ndellingwood/fix-rocblas-build-2 86784956b Merge pull 
request #1816 from cwpearson/ci/KokkosKernels_PullRequest_VEGA908_Tpls_ROCM520 7c9e7b433 Merge pull request #1822 from lucbv/ger_doc 3794a36be Merge pull request #1818 from jgfouca/jgfouca/par_ilut_perf_test_refactor2 af4688919 Ger: adding documentation stubs in apidocs 5b1c1f4fa Remove unused variable 19333668c Merge branch 'develop' into sptrsv-solve-streams 89d67ff14 Apply clang format 924cdee42 Add unit test for sptrsv via streams 787c711bb Merge pull request #1686 from e10harvey/coo2crs 725b46b89 apply clang-format b8a22cc6c blas: fixups for ger exec space instances 146ce522f blas: various rocblas execspace fixes 4f1abd794 apply clang-format 954750d0c rocblas tpl spec: add missing comma separating vars in some macros 42ef78393 Merge pull request #1756 from eeprude/ger2 6e80b37f9 formatting b60e681da Reorganize par_ilut performance test bf06fef9a Merge pull request #1810 from ndellingwood/fix-rocblas-build 28a0421c2 Merge pull request #1812 from lucbv/blas2_3_on_stream 98c6509eb Merge pull request #1817 from bartlettroscoe/tril-11545-kokkos-no-subpks-develop ab0f774cd Workaround for #1777 - cusparse spgemm test hang (#1811) 6c514ff1c Merge pull request #1813 from ndellingwood/update-changelog-4.0.01 4e6c85c39 Docs: adding stubs for trsm and trmm and updating gemv and gemm cd242ba2f New performance test for par_ilut, ginkgo::par_ilut, and spill (#1799) 0ba9eaa3a Manually remove redundant Kokkos dep (#11545) 099d05784 Run script remove_kokkos_subpackages_from_trilinos_packages_r.sh (#11545) 9d95d49d1 only enable KokkosBlas gesv test for CUDA+MAGMA and HOST+BLAS ff664866d cm_test_all_sandia: load openblas/0.3.20/rocm/5.2.0 for TPL spot check on caraway 28254863f Apply clang format 166716a87 No need to fence after each level 415deb091 Update changelog 1ae83cf16 Update changelog 4fc4831fb Update changelog 2f78417b7 Some changes in sptrsv_solve_streams for cuSPARSE < 11.3 5d027ccec Add sptrsv_solve_streams for cuSPARSE < 11.3 db991036e BLAS2/3: applying clang-format 
c00c8a6e3 BLAS2/3: fixing some TPLs issues with execution space code path a725974a3 Minor fixes for sptrsv cuSPARSE 1331baf11 Merge pull request #1808 from ndellingwood/master e65f61147 sparse/unit_test: Use host mirror of RandCsMatrix map 005530354 Minor compilation error. Thanks to Luc for the proper suggestion. 19903279d Formatting 7720c8199 Added explanations 0e26dd1a1 Tests passing now at blake 99cbf779d Possible corrections for test on blake 31f2b0555 Fix name mismatch with rocblas tpl spec layer 5a5a2946c sparse: Encapsulate CooMatrix. Cleanup coo2crs TODO. c208dacae Update master_history.txt 8809e41ca Update to version 4.0.1 9cee1a3d7 sparse/unit_test: Check last entry of col_map. Improve readability. 946f29a63 Merge branch 'develop' into sptrsv-solve-streams 29034f31a Minor changes to match L solve and U solve implementations 311157f62 Merge pull request #1795 from lucbv/norms_on_stream 2087e7009 BLAS3: starting to add stream support for TPL code path of trmm/trsm 4231677db Formatting 89eab5240 Changes made for compilation in blake 961b6362a Changes for testing in blake 6e06af03c Backup 215a00692 Formatting ab59a34cf Another typo ac307232e Typo 5cf9c3ea9 Formatting 0453f0d02 Forgot some spots that need a template parameter for the execution space f27e4d034 Formatting 792bd5fa8 Correcting compilation errors on blake a368dd3cb Formatting 6742ef3bf Solving compilation issues on the automatic tests 52a2a2de2 Corrections for some automatic tests that are failing 3a91bb0e5 Proper formatting 9f49fb972 Addressing new feedbacks from Luc. d94c0139e Minor corrections 629337c26 Needed to format two extra files in kokkos-dev-2 in order for the automatic 'check' step to pass 7ce9d9f83 The clang formatting from kokkos-dev-2 puts a space into these 3 files, which needed (the space) to be removed in my Mac in order for the compilation to work. Tests pass in my Mac. 
e41861865 All files formatted with clang 8.0 414210378 Addressed all feedbacks from Luc and Kim b21194af4 Handling compilation warnings and errors at weaver 99a3b9dac All changes again, because previous branch got changes beyond those related to ger 13c5d8633 BLAS2/3: adding proper execution space interfaces to gemv and gemm 31e00593f Merge branch 'release-candidate-4.0.01' for 4.0.01 a46ebd5e9 Merge pull request #1719 from lucbv/gmres_type_fixes d3b8bc823 BLAS1: adding final fences for code path that return host results cb9fc79da Merge pull request #1768 from e10harvey/more_sparse_docs f83016589 BLAS1: applying clang-format e36c50e4b BLAS1: nrm2w adding support for execution space overload 20463f2a4 BLAS1: nrm1/nrm2 update CUBLAS calls a0d52184d BLAS1: nrm2(_squared) updated to have executions_space overload f0088ab94 BLAS1: nrminf fix in the TPL layer for execution space overload be556c08a BLAS1 nrminf: adding execution space overload a760a1d60 BLAS nrm1: fixing issues with TPLs 4538fc446 Blas1: updating nrm1 interface to accept execution space instance ccf8f1557 Merge pull request #1805 from lucbv/blas1_on_stream_docs 03d678724 BLAS1: clang-format for documentation... 
: ( 6606dde03 BLAS1: documentation adding default space info and non-block statement daf1edce6 BLAS1: updating documentation for changes in PR #1803 3ce7f2985 Merge pull request #1803 from lucbv/blas1_on_stream 6d673920c Merge branch 'develop' into sptrsv-solve-streams 5f89a772f sparse: Fix intel build error bb0e2fef3 BLAS1: fix documentation for fill and mult and apply clang-format bf09ba19b BLAS1: fix CUBLAS TPL layer for axpby and scal fa03d4884 Update blas1.rst fb6318907 Merge pull request #15 from brian-kelley/GS_Docs ffefb5386 BLAS1: applying clang format 9d45383d2 BLAS1: fix some Host BLAS TPL issue with execution space overload b3d73f1d0 Add doxygen for user-facing Gauss-Seidel functions 2949394c0 BLAS1: apply clang-format 93986fd68 sparse: coo2crs add RandomAccess to BmapViewType 2d3c2c4f4 Update sparse/src/KokkosSparse_par_ilut.hpp 4ad4962c5 Update docs/developer/apidocs/sparse.rst 8a35f819a Update docs/developer/contrib.rst 4ce5d2a4e sparse: coo2crs and crs2coo updates 394409fb4 docs: build_doc 4c6d55b11 docs: Update contrib 6e150ac9d sparse: CooMatrix 6016771b3 sparse: CooMatrix 82e13ca28 Update changelog 0dcbd6a17 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781) 710a2396b Jgfouca/remove par ilut limitations (#1755) 8233f7330 ParIlut: create and destroy spgemm handle for each usage (#1736) 957298552 GMRES: fixing some type issues related to memory space instantiation 49339eb3f Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test 8077e640b Update changelog 4df81e5d0 Fix #1758 (#1762) 221495705 Merge pull request #1763 from lucbv/roc_tpls_upgrade 98c72b5f4 Merge pull request #1759 from tmranse/tmranse/mdfInterface cfd5928e2 Update changelog 8928788a4 Update version to 4.0.01 229608457 Patch Trilinos #11663 48ca11b50 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689) 99654d8cf Merge pull request #1737 from e10harvey/reduce_test_coverage 8e90d005f Remove unused variable (#1734) db917b2f4 Merge pull request #1727 from 
lucbv/cuda_11_4_fixes 6cfc547ce Merge pull request #1704 from e10harvey/doc_typos 1f266de0d Merge pull request #1698 from cwpearson/fix/kk-1692 4b731c4fb Merge pull request #1801 from e10harvey/include_omp_settings 01547c447 Blas1: supporting execution space on BLAS1 kernels 1d33c6f9b scripts: Include OMP settings f78e4eb74 sparse: specify memory space for coo2crs ea9db31d1 Merge pull request #1800 from brian-kelley/Fix1798 40eac2958 Fix #1798 790c9f506 Blas1: adding execution space instance interface for abs f69755715 Merge pull request #1797 from kokkos/cwpearson/docs-apt-update 81477dc0d Update docs.yml ec611fe92 Blas1: adding execution space overload of axpy and axpby 038def615 sparse: Add coo2crs, crs2coo and CooMatrix a2a741da2 Merge pull request #1649 from e10harvey/get_ci_back_up 6dc008e11 Merge pull request #1796 from e10harvey/fix-docs-check 0b871d129 Remove deprecated code 2b63c1a61 scripts: Fix github-DOCS 26dac2932 scripts: Final changes for clang 10 a176b931b Fix #1786: check that work array is contiguous in SVD (#1793) 03f48fae6 BLAS: fixes and testing for LayoutStride (#1794) e3a42e418 Fix compile errors f3ec3b464 Merge branch 'develop' into sptrsv-solve-streams bcaa37fc8 Merge pull request #1751 from NexGenAnalytics/benchmark-blas3-tests 507c29f68 par_ilut: make Ut_values view atomic in compute_l_u_factors (#1781) 1a6f22b1c Report layouts used c025caacd Port blas3 gemm test 5015a2cdf Merge pull request #1733 from NexGenAnalytics/5-google-benchmark-blas2-tests e2d1a1d69 Merge pull request #1790 from kliegeois/fixUnusedVar ec392dc43 Merge pull request #1789 from NexGenAnalytics/benchmark-openmp-context b654dd63b Merge pull request #1784 from masterleinad/fix_sycl_printf 7c798ae97 cuSPARSE trisolve with streams 0fd4f2878 Fix unused variable warnings b1185f3a9 Include OpenMP environment variables in benchmark context 97187c3af Allow passing additional arguments 20ad98ac6 Add execution space to policies 15d616983 Reduce duplication 5d237f8b6 Support all 
command line parameters 35ee9ee7e Fix formatting 332485486 Add registration wrapper 34a228689 Parse blas2 custom command line parameters f38b56ab1 Let benchmark decide number of iterations 03728a8b8 Use CMake helper for ODE_RK benchmark 1d70e7aeb Parse common parameters 10dc298b5 Move warm-up out of benchmarking loop 24923b79e Use separate executable 6c21c4df2 Revert changes to blas1 benchmark 278d18fac Use stored time value b3da12558 Use correct header 7336d9c2f Add a benchmark for LayoutRight 6d027010a Let benchmark calculate FLOP/s 0678b55b1 Include scalar type in the output e87d532c2 Let benchmark decide the number of repetitions 3b8c2da3d Remove redundant output bfc68039d #5: Create blas2 gemv benchmark test 8154037ff Merge pull request #1779 from NexGenAnalytics/8-refactor-cmake-mkl 9f12713ad Add --enable-docs option to cm_generate_makefile (#1785) 0a95fff2d Merge pull request #1776 from tmranse/mdfComplex 31ef8f6bf Intial stream interface 837bf841d Merge branch 'develop' into sptrsv-solve-streams a645960c9 Merge pull request #1773 from brian-kelley/SortAndMergeEarlyExit 005822bcf Merge pull request #1728 from vqd8a/spiluk_numeric-streams 0564b18d3 Merge branch 'develop' into spiluk_numeric-streams 4ca54ed15 Use KOKKOS_IMPL_DO_NOT_USE_PRINTF in Test_Common_UpperBound.hpp e2ca0694a Merge branch 'develop' into sptrsv-solve-streams 6dc2a6a53 Re-enable and clean up triangle counting perf test (#1752) 378ffb32e Merge pull request #1770 from kliegeois/device_blas2 dc6f763f3 Remove the printf inside the team kernels. 
0ae0d31e1 Formatting & remove unused typedefs 17b71d2b3 Add compile-time checks for SortCrs functions 893132ccd Allowed template arg deduction for sort_, sort_and_merge d49004f77 Remvoe deprecated KokkosKernels::Impl:: sort functions f666fba99 Sort and merge improvements 47322fbe5 Merge pull request #1778 from lucbv/fix_gesv_uninitialized ec7ce2133 Gesv: using a value-initialization after all 397a3c660 Gesv: adding small comment for clarity 2114d03b6 Merge pull request #1754 from lucbv/ode_explicit 2bd997ae3 #8 added SYCL path for MKL in FindTPLMKL.cmake file 788018fd4 Batched Gesv: initializing variable to make compiler happy 6b4b8bb17 ODE: fix small typo and rebase error 22cd43ce1 ODE: adding support for adaptive time stepping 9ff29b38d ODE: adding new component for time integration 51ac81620 use crs_matrix view traits for magnitude view 1c2105bb1 remove deprecated Rank call 8ef7d05e8 Move TeamSpmv and TeamVectorSpmv to KokkosSparse 70db534be add support for complex data types in MDF 8f3574e33 spgemm handle: check that A,B,C graphs never change (#1742) a975fa3e0 #8 Updated FindTPLMKL.cmake to support SYCL option from kokkos aa96a83ad Jgfouca/remove par ilut limitations (#1755) 7d6485eaa Formatting 43bf36595 Make Werror build happy f8b2a5e5a Update docs/developer/apidocs/sparse.rst 4dd7e613c Add par_ilu numeric docs 53599f47d Fix #1758 (#1762) 6c003deb3 Fix the doc of KokkosBlas2_team_spmv.hpp bebcf360d Using Kokkos::ArithTraits instead of Kokkos::Details::ArithTraits 24cb9017b Add calls to KokkosBlas Gemv and Spmv for team batched kernels when m==1 5edb51a45 #8 update FindTPLMKL.cmake to use find_package(MKL) c9d22ca1b #8: made functionnal current version (v1) for MKL 5ece7b3dd Merge branch 'develop' into spiluk_numeric-streams e35ed210b Merge pull request #1763 from lucbv/roc_tpls_upgrade 30bd681ff Merge pull request #1759 from tmranse/tmranse/mdfInterface 75c14cd0b Add par_ilut symbolic docs a2b18d73e Merge pull request #1765 from e10harvey/host_level_docs 
1b123b177 Merge pull request #1767 from e10harvey/update_actions_checkout a9189f56a clang-format... 3065eb31c ROCSPARSE: fix unused variable in unit-test 01c49a8d2 docs: Add stubs for some sparse APIs f2c217d57 .github: Update to actions/checkout@v3 3d28a4730 Merge pull request #1711 from cwpearson/feature/search aaadaa0dd docs: Include BatchedGemm a0a928194 Merge branch 'develop' into spiluk_numeric-streams 1491bd433 Add exec instance support to sort/sort_and_merge utils (#1744) 8e77c01cc TPLs: replicating changes made in Trilinos for ROCBLAS/ROCSPARSE 45a8d3baf address reviewer comments and run clang-format b079a4e2d Merge pull request #1672 from brian-kelley/FixSpaddPerftest 25dbdcb9b #7 Removed V2 and V1. f49d41ead #7: V3: simplest way to get rocsparse and rocblas 5c8d760a3 #7: V2 Added hybrid version for rocblas and rocsparse 8efb0356c #7: (v1): old way for rocsparse and rocblas 27ec2cdb8 Spgemm perf test enhancements (#1664) a94163cbc Patch Trilinos #11663 (#1757) 0e615295f Merge pull request #1753 from kliegeois/device_blas_refact a2c1610a8 accept r-value A matrix f11a70ab6 Merge branch 'develop' into get_ci_back_up 6bcfac5bd Adds team- and thread-based lower-bound and upper-bound search and predicates. 
9f2399310 Merge branch 'develop' into spiluk_numeric-streams b483cfce3 Merge pull request #1732 from cwpearson/fix/kk-1731 c77395716 Add calls to KokkosBlas Dot and Axpy for team batched kernels when m==1 11d442b51 Deprecate Kokkos::Details::ArithTraits (#1748) a3c919474 Merge pull request #1750 from NexGenAnalytics/1718-print-google-benchmark-version 5595b4a92 Leverage std library in BsrMatrix constructor 943cfc6bb add access to inv permutations to mdf handle 38789c2cc add ability to generate compile_commands.json for clangd 252fbf8a2 Clarify comments for context helper functions d2f9e0113 Mark functions as inline where appropriate 0912b67ac Include google benchmark lib version in benchmark output 1554ee7a8 Extract benchmark CMake code into a separate file 0e507ae38 openblas is now in standard modulepath aec946c28 Merge pull request #1737 from e10harvey/reduce_test_coverage 873e2a8b1 Merge pull request #1693 from NexGenAnalytics/5-print-get-CUSPARSE-CUBLAS-versions 2a5309b39 Use concurrency() rather than impl_thread_pool_size() bf9ed2aee ParIlut: create and destroy spgemm handle for each usage (#1736) fd7f6e515 cm_test_all_sandia: Add llvm/10.0.1 55f24857e perf test utils: fix device ID parsing (#1739) a7e7bcb74 Merge pull request #1722 from NexGenAnalytics/5-add-git-info 664bfc4d3 Fix kk_generate_diagonally_dominant_sparse_matrix hang (#1689) 60881471b Remove unused variable (#1734) 2cfc5082b spadd perf test: use common infrastructure 2dff92063 Avoid errors about not finalizing Kokkos 1e0fb0249 Fix/enhance backend issues on spadd perftest ee059d078 Improve readability 323cefa5d Do not print CUBLAS_VER_BUILD b6f4c80e9 Rename functions 9cc9328c7 #5: added TplsVersion file and print methods 54d70dc83 Remove sample benchmark 72de68a8d Revert "Enable benchmarks in CI" a21ce0982 Enable benchmarks in CI e8b2d6cd0 Use constexpr variables for git info 2f9352acc Switch to header-only implementation c32f3ad06 Include git information in benchmark context 3b466361c Generate 
git information during build bc9265b0d Fix typo ff097ec63 Merge pull request #1636 from NexGenAnalytics/5-google-bench-dot-test be9310d97 Reduce BatchedGemm test coverage 221f7abc0 Work around instance resource limits 4e6c1d76e Merge branch 'develop' into spiluk_numeric-streams 560f37286 Fix unused-parameter nstreams error cb11f0cff Use clang modules 950f633b7 pull in mkl 5c8067c93 More cleanup. fa5bdf509 More cleanup 4b4e7b82f Cleanup. Need clang toolchain f2184cf60 Use openblas tpl 3ac5a6fe1 Use stdlibc++ from gnu 8.2.1 678783275 Get a C++17 stdlibc++ in the path b8ebb9564 scripts/cm_test_all_sandia: - Add boiler plate for gnu/10.2.1 and intel/19.0.5.281. afd686eb5 Merge pull request #1723 from kokkos/docs/cwpearson-html-only 26332eda6 Merge pull request #1727 from lucbv/cuda_11_4_fixes 9b0dfbd0f CUDA 11.4: fixing some failing build while trying to reproduce issue #1725 26bf33311 Merge pull request #1726 from e10harvey/ci_format_docs ff31df01e .github: Automation reminder 5c2702283 Make Sphinix optional a9877dc6f Install doxygen-latex for HTML docs 3ec0cb7fc #5: Rebased on develop and added kernels print_configuration call 8be303261 #5: Added better name for benchmark tests 56ef2095f #5: Added team dot benchmark test 4fc790848 #5: Fixed clang-format e9c968cdd #5: Added dot_mv benchmark test 7be07e5a4 #5: Fixed clang-format errors 7dfe9efde #5: generalized execution space and removed unused include 0361d1d32 #5: Added benchmark dot perf test 482cc00f6 clang format d83c123ea Add nstreams to symbolic call 08e3824f3 Apply clang format to Test_Sparse_spiluk.hpp 004c1c041 Fix undefined reference errors and clean up printf statements d17877163 Apply clang format 1f74d4399 Add nstreams to avail_byte calculation f055b6977 Merge branch 'develop' into spiluk_numeric-streams f658cc4dd Add spiluk_numeric_streams interface 048155245 Merge pull request #1720 from dalg24/drop_pre_kokkos_36_workaround f41ff478c Merge pull request #1719 from lucbv/gmres_type_fixes d9df4fd6b Drop 
obsolete workaround checking whether KOKKOS_IF_ON_{HOST,DEVICE} macros are defined e5c8da8fc Merge pull request #1710 from cwpearson/feature/iota ba311291c Adding fix for LUPrec 3831a680a Merge pull request #1707 from lucbv/kk_config_version b209a157c Merge pull request #1691 from cwpearson/fix/cmake-force 2f069c4fc Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667) 4414f46c1 GMRES: fixing some type issues related to memory space instantiation fa3dd4e13 Merge pull request #1717 from ndellingwood/update-changelog-4.0 b202dcbfd Merge pull request #1714 from cwpearson/ci/format-diff abcf8d4d1 Merge pull request #1716 from ndellingwood/issue-1715 4f39a18ec Merge pull request #1698 from cwpearson/fix/kk-1692 6ce7ea4ec Merge pull request #1695 from kokkos/update-changelog-to-4.0.0 4abf2a3a8 rocsparse spmv tpl: Fix rocsparse_spmv call for rocm < 5.4.0 8ed861214 Adds KokkosKernels::Impl::Iota, a view-like where iota(i) = i + offset 50758c1b2 Merge pull request #1712 from cwpearson/tests/spmv-controls 813626471 Merge pull request #1701 from cwpearson/fix/kk-issue-1700 3a2064350 Merge pull request #1704 from e10harvey/doc_typos fcf349d33 print the patch that clang-format-8 wants to apply 6ead86002 add explicit tests of opt-in algorithms a3ab61082 CUSPARSE_MM_ALG_DEFAULT deprecated by 11.1 8697db1e4 Merge pull request #1709 from lucbv/comp_4_0_0 d63de38b5 Merge pull request #1707 from lucbv/kk_config_version 76968d3f7 Merge pull request #1691 from cwpearson/fix/cmake-force 7f3acf133 Compatibility upgrade: adding compatibility branch in code 8469d478f Kokkos Kernels version: need to use upper case variables f40aabfea Merge pull request #1706 from lucbv/fix_team_mult db0071a43 team mult: applying clang-format 562aaffd9 team mult: fix type issue in max_error calculation d692d3585 Merge pull request #1703 from cwpearson/fix/kk-1702 f1dd58cf7 Merge pull request #1694 from lucbv/test_eti_only_off 0b88c05ed test mixed scalars: adding more comments and sending msg to cerr 
31190a68c blas/blas1: Add mult docs f46b24258 blas/blas1: Fix a couple documentation typos. e4b324c8c test mixed scalars: incorporate Evan's comments 016384fff View::Rank -> View::rank feb9f9ae6 use rocsparse_spmv_ex for rocm >= 5.4.0 e9ec43800 Introduce KOKKOSKERNELS_ALL_COMPONENTS_ENABLED variable 5153da336 Merge pull request #1697 from cwpearson/fix/kk-1696 8aa7fa23e cast Kokkos::Impl::integral_constant to int 602c526d7 Tested mixed scalars: removing temporary output 557e62a67 Test mixed scalars: more fixes related to mixed scalar tests 6d73c141e Merge pull request #1687 from lucbv/version_integration_fix 45ffc0849 Versions: fixing the CMake logic to export Kokkos Kernels version 5f5b9e0c5 Merge pull request #1685 from e10harvey/test_eti_only 37efc3bff Merge pull request #1665 from NexGenAnalytics/5-print-configuration 8206953f5 scripts: add --disable-test-eti-only d2386da91 Merge pull request #1615 from lucbv/gemm_mixed_scalars 1fccf4a27 Mixed Scalars: fixing typo 31a756661 Mixed Scalars: fixing some type conversion in unit-tests 92b82ef88 Mixed Scalars: modifying one more test according to review comment 1507de8dc Mixed Scalars: modifying according to PR comments. 
e9f463439 Mix Scalars: fixing the tolerance in axpby d76e8e18a BLAS: mixed gemm 4a29cafbe Merge pull request #1683 from vqd8a/spiluk-nondeterministic-numeric 2140e99b0 #5 Fixed typo 7f579fb5c #5 rebased on develop and updated print_version method for kernels d12158be6 #5: Fixed mistake in filename and updated Kernels version key 3ddf1dea0 #5: Fixed clang format and removed form this PR benchmark modification 8c1a89e0e #5: Added inline to avoit multiple define problem e3c311bd7 #5: updated key verification 95b9ddcb5 #5 Updated print_configuration content format 32d58f6c3 #5: fixed previous commit mistake 634b2cad7 #5: added print_configuration file and its test b60e9913f #5: moved print_configuration to header only file and added its test cc11c6d7a #5: Added basis for print_configuration method 9455f6505 BLAS: fix build with KokkosKernels_TEST_ETI_ONLY=OFF 747bb9303 Merge pull request #1661 from jgfouca/jgfouca/par_ilut_test 9ff35198d Add utility KokkosSparse::removeCrsMatrixZeros(A, tol) (#1681) c7765bc1d Merge pull request #1680 from lucbv/export_version_info a66a5d6d6 Fix uninitialized error a67bc42ce Apply clang format d7ca7e7a4 Merge branch 'develop' into spiluk-nondeterministic-numeric e2b8df3fd Make hlevel_ptr a separate allocation 6d02704ad Remove one unnecessary barrier 0b5bc7a61 Fix race condition when read and write L_values at the same k 76d9ed4ab formatting 7f78fceb1 Support alpha and beta in LUPrec::apply 304bcdaea Merge pull request #1676 from lucbv/perf_test_wrapper fd8bf8ae4 Update perf_test/sparse/KokkosSparse_mdf.cpp a936394f1 Merge branch 'develop' into spiluk-nondeterministic-numeric b0965b7d4 Spgemm non-reuse: unification layer and TPLs (#1678) d3ffe8214 Perf Tests: adding utilities and instantiation wrapper c9e631b61 Version: applying clang-format c284ef4ac Version: adding unit-test to verify that version info is available c2486ab14 Merge pull request #1679 from dalg24/view_rank 6a6a51045 Fix warnings 1d19eeabb CMake: export version and 
subversion to config file a05f21e3b Prefer View::{R->r}ank 91222dba2 formatting 6a4bf14ce Address GH feedback 9586dd948 Use sptrsv instead of blas::trsm aac450bd3 Merge pull request #1624 from lucbv/MDF_alg_upgrade 9095beb5c MDF: improving performance and adding performance test 61ba79b8a Merge pull request #1677 from masterleinad/update_sycl 167ad420e Update SYCL docker file to include oneDPL 566570a87 Temporary workaround for Kokkos #5860 (#1675) d1ee1a43e format fix e50849b37 Fix for openmp-only 771f0f2cf Fix warnings 6615b77c0 Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test 08b71b3ff Fix @file tags in a few headers d83b0649c Turn off main par_ilut+gmres test if kokkos::serial is not enabled a89349ddb formatting dd930a662 Fixes: trsm expects host views b9bcc5f49 Add new assert/require macros. Other minor fixes 834a85ece Use the options ENABLE_PERFTEST, ENABLE_EXAMPLES (#1667) 6d6ed244e Merge pull request #1670 from masterleinad/update_sycl a3ee83b55 Merge pull request #1666 from brian-kelley/FixOmpImplThreads e72bc3859 SYCL CI: Specify the full path to the compiler 16c97ddb6 Call concurrency(), not impl_thread_pool_size() dec1753fc Testing working in serial and openmp (IF I force determinism on parIlut) da90033b7 Merge pull request #1654 from dalg24/clock_tic 56cdbd2c6 Merge pull request #1653 from dalg24/drop_pre_kokkos_36_workaround 55eb42008 Merge pull request #1652 from dalg24/are_integral f0229f902 Merge pull request #1651 from masterleinad/fix_sycl_printf 5b30c5a05 Merge pull request #1662 from ndellingwood/update-version-4.x 40474b7d9 Merge pull request #1660 from masterleinad/update_sycl 03180cdf1 Merge pull request #1659 from kliegeois/fix_documentation_typo b4d8ca8bf Update nightly SYCL setup 4df9db90a Hands off Kokkos::Impl::are_integral f33376a54 Add Impl::are_integral_v helper variable template 4ee798d83 Drop pre Kokkos 3.6 workaround a4bea4798 Replace printf in device code for SYCL 12e1b814b Do not use 
Kokkos::Impl::clock_tic, prefer std::chrono to get a random seed 839453184 Merge pull request #1647 from e10harvey/issue1571 93ecefbc9 Fix LUPrec license 3074b4b01 CMakeLists.txt: update version to 4.0.99 45630287b Merge remote-tracking branch 'origin/develop' into jgfouca/par_ilut_test b4f3dd0eb Fix documentation regressions 51c3c5a0c Fix whitespace cc38f32ef Add deprecated code disable to docs build. 10d155ad4 Merge branch 'develop' into issue1571 78833d6ca Merge pull request #1658 from lucbv/kokkos_deprecate_ALL_t 86edac3b1 Minor fixes ead0712ef Merge branch 'develop' into issue1571 d2f273c02 osx-ci: adding option to disable deprecated_code_4 in Kokkos b846db97e Apply suggestions from code review fed582cb5 Fix an error in Krylov Handle documentation 6c5744fd6 Applying clang-format 215c6beb0 Benchmarks: for some reason the current version fails to build 11be16b61 Fixing deprecated usage of Kokkos::Impl::ALL_t in favore of Kokkos::ALL_t e04475d55 Things building 1ea3a7b90 .github/workflows: - Added docs.yml - Save cycles with -DKokkos_ENABLE_TESTS=OFF 25b4fb815 Add new par_ilut test f87b7d566 Clean up numeric and symbolic 547a6608a Clean up spiluk numeric 432c9541c Fix for VOLTA 81f77d0fb Prefer team size 32 0b4b667f1 Use atomic_add again 3adaa70ba Not use atomic_add 916100baf Initial fix git-subtree-dir: tpls/kokkos-kernels git-subtree-split: 25a31f8812330cec6e8ac5d8ea99bb9a2045cbab
Description
This issue tracks the Trilinos effort to pre-build and install Kokkos (using TriBITS) and then build downstream packages against the pre-built/installed Kokkos by setting TPL_ENABLE_Kokkos=ON. This uses the new support in TriBITS for this in: which is part of:
Initially, we will use the TriBITS build system for Kokkos because it provides the subpackages that downstream Trilinos packages expect, along with everything else TriBITS needs (the installed Kokkos will automatically be a TriBITS-compliant external package). To make this easier, we can also initially comment out the special logic that pulls in compiler options from Kokkos and skip CUDA builds.
Then we can get this working for the native Kokkos CMake build system.
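As a rough sketch of the initial TriBITS-based workflow described above, the two stages (install Kokkos, then configure downstream packages against it) might look like the following. The directory names, the choice of KokkosKernels as the downstream package, the use of CMAKE_PREFIX_PATH to locate the install, and the `run` helper are illustrative assumptions, not settings taken from this issue; only TPL_ENABLE_Kokkos=ON comes from the description. The script prints the commands by default and executes them only when DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch only: paths and package selection below are hypothetical.
set -euo pipefail

TRILINOS_SRC="${TRILINOS_SRC:-$PWD/Trilinos}"           # assumed source checkout
BUILD_ROOT="${BUILD_ROOT:-$PWD/build}"                  # assumed build root
KOKKOS_INSTALL="${KOKKOS_INSTALL:-$PWD/install/kokkos}" # assumed install prefix

# Print each command; execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [[ "${DRY_RUN:-1}" == "0" ]]; then "$@"; fi
}

# Stage 1: build and install only Kokkos through the Trilinos/TriBITS build.
kokkos_args=(
  -S "$TRILINOS_SRC" -B "$BUILD_ROOT/kokkos"
  -D Trilinos_ENABLE_Kokkos=ON
  -D CMAKE_INSTALL_PREFIX="$KOKKOS_INSTALL"
)

# Stage 2: configure a downstream package against the installed Kokkos,
# treating Kokkos as an external package (TPL) instead of building it.
downstream_args=(
  -S "$TRILINOS_SRC" -B "$BUILD_ROOT/downstream"
  -D TPL_ENABLE_Kokkos=ON
  -D CMAKE_PREFIX_PATH="$KOKKOS_INSTALL"
  -D Trilinos_ENABLE_KokkosKernels=ON
)

run cmake "${kokkos_args[@]}"
run cmake --build "$BUILD_ROOT/kokkos" --target install
run cmake "${downstream_args[@]}"
run cmake --build "$BUILD_ROOT/downstream"
```

Keeping the two configures in separate build directories mirrors the intent of the issue: the downstream build should get everything it needs about Kokkos from the installed package files rather than from the Kokkos source tree.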
Tasks