StreamHPC 2023-11-21 (DeviceMemcpy::Batched) #314
Merged: stanleytsang-amd merged 5 commits into ROCm:develop from StreamHPC:develop_stream_2023_11_21 on Nov 21, 2023
- ci: use build instead rocm-build and nvcc-build tags (see merge request amd/libraries/hipCUB!168)
  This lets the build job run on any runner configured for building, instead of only on the ROCm-specialized builder. Because the target architectures are specified ahead of time, no GPU is needed during the build, so any builder can perform it.
- Add interface for batched memcpy from rocPRIM and CUB (closes ROCm#181; see merge request amd/libraries/hipCUB!167)
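The batched memcpy interface copies many independent buffers in one device-wide call: for each index `i` below `num_buffers`, `buffer_sizes[i]` bytes are copied from `input_buffer_it[i]` to `output_buffer_it[i]`. The host-only sketch below models just those semantics so they can be checked without a GPU. `batched_memcpy_reference` is a hypothetical name, not part of hipCUB, and the sketch omits the `d_temp_storage`/`temp_storage_bytes` pair (sized by a first call with a null `d_temp_storage`) and the stream argument that the real device call takes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side model of what DeviceMemcpy::Batched computes.
// The real hipCUB/CUB call runs on the device and also takes
// d_temp_storage, temp_storage_bytes and a stream; those are omitted here.
template <typename InputBufferIt, typename OutputBufferIt, typename BufferSizeIt>
void batched_memcpy_reference(InputBufferIt input_buffer_it,
                              OutputBufferIt output_buffer_it,
                              BufferSizeIt buffer_sizes,
                              std::uint32_t num_buffers)
{
    for (std::uint32_t i = 0; i < num_buffers; ++i)
    {
        const std::size_t bytes = buffer_sizes[i];
        if (bytes == 0)
        {
            // std::memcpy must not be handed null pointers, even for zero bytes,
            // so empty buffers are skipped.
            continue;
        }
        assert(input_buffer_it[i] != nullptr && output_buffer_it[i] != nullptr);
        // Buffer i is independent of every other buffer: copy `bytes` bytes
        // from the i-th source to the i-th destination.
        std::memcpy(output_buffer_it[i], input_buffer_it[i], bytes);
    }
}
```

Taking iterators rather than fixed array types mirrors the real interface, whose source, destination and size arguments are template iterator parameters.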
stanleytsang-amd approved these changes on Nov 21, 2023
stanleytsang-amd added a commit that referenced this pull request on Dec 6, 2023:
* Develop stream 2023-10-27 (#309)
* Accumulator types changed for reduce and test_hipcub_device_reduce fixed for new thread operators
* Add thread operators test
* Bump CUB and Thrust versions to 2.1.0
* change how we use the rocprim::host_warp_size
* update changelog
* move host_warp_size_wrapper out of the HIPCUB_HOST_WARP_THREADS macro
* update changelog to be clearer
* add changes related to __int128_t support
* finish int128 support
  add tests for block and device_radix_sort
  add assert_bit_eq for (u)int128 vectors
* Test large indices for DeviceReduce
* Fix clang format
* Include FetchContent in new ROCmCMakeBuildToolsDependency cmake file
* Use _ENABLE_EXTENDED_ALIGNED_STORAGE for windows build in rmake.py
* Update CHANGELOG to ROCm 6.1
---------
Co-authored-by: Bence Parajdi <bence@streamhpc.com>

* StreamHPC 2023-11-21 (DeviceMemcpy::Batched) (#314)
* ci: use build instead rocm-build and nvcc-build tags
  This allows the build job to be performed by any runner configured for building, instead of the ROCm-specialized builder. As the target architectures are specified ahead of time, the GPU is not needed during the build process, and may be performed by any builder.
* feat: Add interface for batched memcpy from rocPRIM and CUB
* style(device_memcpy): improve formatting
---------
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/.sphinx (#316)
  Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
  - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
  - [Commits](pyca/cryptography@41.0.4...41.0.6)
  ---
  updated-dependencies:
  - dependency-name: cryptography
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Beatriz Navidad Vilches <61422851+Beanavil@users.noreply.github.com>
Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara <nara@streamhpc.com>
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
stanleytsang-amd added a commit that referenced this pull request on Jan 17, 2024:
* Develop stream 2023-10-27 (#309)
* Accumulator types changed for reduce and test_hipcub_device_reduce fixed for new thread operators
* Add thread operators test
* Bump CUB and Thrust versions to 2.1.0
* change how we use the rocprim::host_warp_size
* update changelog
* move host_warp_size_wrapper out of the HIPCUB_HOST_WARP_THREADS macro
* update changelog to be clearer
* add changes related to __int128_t support
* finish int128 support
  add tests for block and device_radix_sort
  add assert_bit_eq for (u)int128 vectors
* Test large indices for DeviceReduce
* Fix clang format
* Include FetchContent in new ROCmCMakeBuildToolsDependency cmake file
* Use _ENABLE_EXTENDED_ALIGNED_STORAGE for windows build in rmake.py
* Update CHANGELOG to ROCm 6.1
---------
Co-authored-by: Bence Parajdi <bence@streamhpc.com>

* StreamHPC 2023-11-21 (DeviceMemcpy::Batched) (#314)
* ci: use build instead rocm-build and nvcc-build tags
  This allows the build job to be performed by any runner configured for building, instead of the ROCm-specialized builder. As the target architectures are specified ahead of time, the GPU is not needed during the build process, and may be performed by any builder.
* feat: Add interface for batched memcpy from rocPRIM and CUB
* style(device_memcpy): improve formatting
---------
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/.sphinx (#316)
  Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
  - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
  - [Commits](pyca/cryptography@41.0.4...41.0.6)
  ---
  updated-dependencies:
  - dependency-name: cryptography
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update rocm-docs-core to 0.30.3 (#319)
* Update rocm-docs-core to 0.30.3
* Update link to hipCUB docs in README
* Remove doc artifacts

* Bump gitpython from 3.1.37 to 3.1.41 in /docs/.sphinx (#320)
  Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.37 to 3.1.41.
  - [Release notes](https://github.com/gitpython-developers/GitPython/releases)
  - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
  - [Commits](gitpython-developers/GitPython@3.1.37...3.1.41)
  ---
  updated-dependencies:
  - dependency-name: gitpython
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* 6.0 final mergeback to develop (#321)
* Separate gfx942 specific code (#289)
  Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
* Split rocm-cmake dependency out before hip include (#293)
* Split rocm-cmake dependency out before hip include
* Update comments
* Fix cpp-check reported issues
  Fixed a number of issues that static analysis picked up:
  - Made some functions const since they don't modify member state
  - Made some parameters const, since they're never modified
  - Fixes for several benchmark/test functions
  - Removed unused variable declarations
  - Added missing input data transfer from host to device
  - Added some member variables to constructor initializer list
  - Added override keyword in several places
  - Fixed up item placeholders in some printf statements
* Fix cpp-check reported issues
* Removed host to data transfer from memcpy benchmark.
  Since this benchmark only tests memcpy performance between device buffers, we don't really need to copy data into these from the host.
* update googlebenchmark version (#302)
* Avoid a segmentation fault when clearing cached blocks (#297) (#310)
  Co-authored-by: Tom Benson <benson31@llnl.gov>
* Include FetchContent before usage (#308)
* 6.0 cherry pick for changelog and version update (#313)
* Update documentation and version for 6.0
* Fix version
---------
Co-authored-by: Eiden Yoshida <47196116+eidenyoshida@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Lauren.Wrubleski@amd.com>
Co-authored-by: Wayne Franz <wayfranz@amd.com>
Co-authored-by: Tom Benson <benson31@llnl.gov>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/.sphinx (#322)
  Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
  - [Release notes](https://github.com/pallets/jinja/releases)
  - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
  - [Commits](pallets/jinja@3.1.2...3.1.3)
  ---
  updated-dependencies:
  - dependency-name: jinja2
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding CODEOWNERS file (#324)

* Bump rocm-docs-core[api_reference] in /docs/.sphinx (#326)
  Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
  - [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
  - [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v0.30.3...v0.31.0)
  ---
  updated-dependencies:
  - dependency-name: rocm-docs-core[api_reference]
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Standardize documentation for ReadtheDocs (#325)
* Update links in README.md
  - Update the links to other ROCm repositories that are now in the ROCm org.
  - Replace link to "rocm.github.io" with "rocm.docs.amd.com".
* Update package version
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Beatriz Navidad Vilches <61422851+Beanavil@users.noreply.github.com>
Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara <nara@streamhpc.com>
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Eiden Yoshida <47196116+eidenyoshida@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Lauren.Wrubleski@amd.com>
Co-authored-by: Wayne Franz <wayfranz@amd.com>
Co-authored-by: Tom Benson <benson31@llnl.gov>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Implements the batch memcpy interface added in ROCm/rocPRIM#485.

Notable commits:
- DeviceMemcpy::Batched. Closes #261 (Missing cub::DeviceMemcpy::Batched).
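A typical use of a batched copy is concatenating many variable-sized buffers into one contiguous allocation. The plain C++ sketch below shows one way to prepare the three per-buffer arrays (source pointers, destination pointers, sizes in bytes) that the interface consumes: the destination offsets are an exclusive scan of the sizes. `ConcatBatch`, `make_concat_batch` and `run_batch_on_host` are hypothetical helpers. With hipCUB, the pointer and size arrays would instead live in device memory and be passed to `hipcub::DeviceMemcpy::Batched`; the host loop here only stands in for that call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <vector>

// Hypothetical description of one batch: buffer i is copied from srcs[i]
// to dsts[i] and is sizes[i] bytes long.
struct ConcatBatch
{
    std::vector<const void*> srcs;
    std::vector<void*> dsts;
    std::vector<std::size_t> sizes;
};

// Builds a batch that concatenates `sources` back to back into `output`.
ConcatBatch make_concat_batch(const std::vector<const void*>& sources,
                              const std::vector<std::size_t>& sizes,
                              unsigned char* output)
{
    assert(sources.size() == sizes.size());
    // Exclusive scan of the sizes gives each buffer's byte offset in `output`.
    std::vector<std::size_t> offsets(sizes.size());
    std::exclusive_scan(sizes.begin(), sizes.end(), offsets.begin(), std::size_t{0});

    ConcatBatch batch{sources, {}, sizes};
    batch.dsts.reserve(offsets.size());
    for (std::size_t offset : offsets)
    {
        batch.dsts.push_back(output + offset);
    }
    return batch;
}

// Host stand-in for the device-wide batched copy (requires C++17).
void run_batch_on_host(const ConcatBatch& batch)
{
    for (std::size_t i = 0; i < batch.sizes.size(); ++i)
    {
        if (batch.sizes[i] != 0) // skip empty buffers; memcpy needs valid pointers
        {
            std::memcpy(batch.dsts[i], batch.srcs[i], batch.sizes[i]);
        }
    }
}
```

Computing offsets with a scan keeps each buffer's destination independent of the copy itself, so every buffer in the batch can be copied in parallel.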