StreamHPC 2023-11-21 (DeviceMemcpy::Batched) #314

Merged 5 commits on Nov 21, 2023

Naraenda
Member

Implements the batch memcpy interface added in ROCm/rocPRIM#485
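For readers unfamiliar with the operation, a batched memcpy takes a list of source pointers, a list of destination pointers, and a list of byte counts, and copies buffer `i` from source to destination for every `i`. The snippet below is a minimal host-only sketch of those semantics, useful as a reference when checking results from the device version. The function name `batched_memcpy_reference` is hypothetical and is not part of hipCUB, rocPRIM, or CUB; the actual device interface additionally takes temporary storage and a stream, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side reference (NOT the hipCUB/rocPRIM implementation).
// Copies sizes[i] bytes from srcs[i] to dsts[i] for every i in [0, num_buffers).
// Buffers are assumed not to overlap, as std::memcpy requires.
inline void batched_memcpy_reference(const void* const* srcs,
                                     void* const*       dsts,
                                     const std::size_t* sizes,
                                     std::uint32_t      num_buffers)
{
    for(std::uint32_t i = 0; i < num_buffers; ++i)
    {
        // A zero-sized buffer may carry null pointers; anything else must be valid.
        assert(sizes[i] == 0 || (srcs[i] != nullptr && dsts[i] != nullptr));
        if(sizes[i] != 0)
        {
            std::memcpy(dsts[i], srcs[i], sizes[i]);
        }
    }
}
```

In a test, one would typically copy the device results back to the host and compare them byte for byte against the output of a reference like this one.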

Notable commits:

Snektron and others added 5 commits November 21, 2023 12:22
This allows the build job to be performed by any runner configured
for building, instead of the ROCm-specialized builder. As the
target architectures are specified ahead of time, the GPU is not
needed during the build process, and may be performed by any builder.
ci: use build instead rocm-build and nvcc-build tags

See merge request amd/libraries/hipCUB!168
Add interface for batched memcpy from rocPRIM and CUB

Closes ROCm#181

See merge request amd/libraries/hipCUB!167
@stanleytsang-amd stanleytsang-amd merged commit f459480 into ROCm:develop Nov 21, 2023
10 checks passed
stanleytsang-amd added a commit that referenced this pull request Dec 6, 2023
* Develop stream 2023-10-27 (#309)

* Accumulator types changed for reduce and test_hipcub_device_reduce fixed for new thread operators

* Add thread operators test

* Bump CUB and Thrust versions to 2.1.0

* change how we use the rocprim::host_warp_size

* update changelog

* move host_warp_size_wrapper out of the HIPCUB_HOST_WARP_THREADS macro

* update changelog to be clearer

* add changes related to __int128_t support

* finish int128 support
add tests for block and device_radix_sort
add assert_bit_eq for (u)int128 vectors

* Test large indices for DeviceReduce

* Fix clang format

* Include FetchContent in new ROCmCMakeBuildToolsDependency cmake file

* Use _ENABLE_EXTENDED_ALIGNED_STORAGE for windows build in rmake.py

* Update CHANGELOG to ROCm 6.1

---------

Co-authored-by: Bence Parajdi <bence@streamhpc.com>

* StreamHPC 2023-11-21 (DeviceMemcpy::Batched) (#314)

* ci: use build instead rocm-build and nvcc-build tags

This allows the build job to be performed by any runner configured
for building, instead of the ROCm-specialized builder. As the
target architectures are specified ahead of time, the GPU is not
needed during the build process, and may be performed by any builder.

* feat: Add interface for batched memcpy from rocPRIM and CUB

* style(device_memcpy): improve formatting

---------

Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/.sphinx (#316)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Beatriz Navidad Vilches <61422851+Beanavil@users.noreply.github.com>
Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara <nara@streamhpc.com>
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
stanleytsang-amd added a commit that referenced this pull request Jan 17, 2024
* Develop stream 2023-10-27 (#309)

* Accumulator types changed for reduce and test_hipcub_device_reduce fixed for new thread operators

* Add thread operators test

* Bump CUB and Thrust versions to 2.1.0

* change how we use the rocprim::host_warp_size

* update changelog

* move host_warp_size_wrapper out of the HIPCUB_HOST_WARP_THREADS macro

* update changelog to be clearer

* add changes related to __int128_t support

* finish int128 support
add tests for block and device_radix_sort
add assert_bit_eq for (u)int128 vectors

* Test large indices for DeviceReduce

* Fix clang format

* Include FetchContent in new ROCmCMakeBuildToolsDependency cmake file

* Use _ENABLE_EXTENDED_ALIGNED_STORAGE for windows build in rmake.py

* Update CHANGELOG to ROCm 6.1

---------

Co-authored-by: Bence Parajdi <bence@streamhpc.com>

* StreamHPC 2023-11-21 (DeviceMemcpy::Batched) (#314)

* ci: use build instead rocm-build and nvcc-build tags

This allows the build job to be performed by any runner configured
for building, instead of the ROCm-specialized builder. As the
target architectures are specified ahead of time, the GPU is not
needed during the build process, and may be performed by any builder.

* feat: Add interface for batched memcpy from rocPRIM and CUB

* style(device_memcpy): improve formatting

---------

Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>

* Bump cryptography from 41.0.4 to 41.0.6 in /docs/.sphinx (#316)

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.4 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.4...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update rocm-docs-core to 0.30.3 (#319)

* Update rocm-docs-core to 0.30.3

* Update link to hipCUB docs in README

* Remove doc artifacts

* Bump gitpython from 3.1.37 to 3.1.41 in /docs/.sphinx (#320)

Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.37 to 3.1.41.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases)
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES)
- [Commits](gitpython-developers/GitPython@3.1.37...3.1.41)

---
updated-dependencies:
- dependency-name: gitpython
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* 6.0 final mergeback to develop (#321)

* Separate gfx942 specific code (#289)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>

* Split rocm-cmake dependency out before hip include (#293)

* Split rocm-cmake dependency out before hip include

* Update comments

* Fix cpp-check reported issues

Fixed a number of issues that static analysis picked up:
  - Made some functions const since they don't modify member state
  - Made some parameters const, since they're never modified
  - Fixes for several benchmark/test functions
    - Removed unused variable declarations
    - Added missing input data transfer from host to device
    - Added some member variables to constructor initializer list
    - Added override keyword in several places
    - Fixed up item placeholders in some printf statements

* Fix cpp-check reported issues

* Removed host to data transfer from memcpy benchmark.
Since this benchmark only tests memcpy performance between device buffers,
we don't really need to copy data into these from the host.

* update googlebenchmark version (#302)

* Avoid a segmentation fault when clearing cached blocks (#297) (#310)

Co-authored-by: Tom Benson <benson31@llnl.gov>

* Include FetchContent before usage (#308)

* 6.0 cherry pick for changelog and version update (#313)

* Update documentation and version for 6.0

* Fix version

---------

Co-authored-by: Eiden Yoshida <47196116+eidenyoshida@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Lauren.Wrubleski@amd.com>
Co-authored-by: Wayne Franz <wayfranz@amd.com>
Co-authored-by: Tom Benson <benson31@llnl.gov>

* Bump jinja2 from 3.1.2 to 3.1.3 in /docs/.sphinx (#322)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Adding CODEOWNERS file (#324)

* Bump rocm-docs-core[api_reference] in /docs/.sphinx (#326)

Bumps [rocm-docs-core[api_reference]](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.30.3 to 0.31.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Standardize documentation for ReadtheDocs (#325)

* Update links in README.md

- Update the links to other ROCm repositories that are now in the ROCm org.
- Replace link to "rocm.github.io" with "rocm.docs.amd.com".

* Update package version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Beatriz Navidad Vilches <61422851+Beanavil@users.noreply.github.com>
Co-authored-by: Bence Parajdi <bence@streamhpc.com>
Co-authored-by: Nara <nara@streamhpc.com>
Co-authored-by: Robin Voetter <robin@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
Co-authored-by: Eiden Yoshida <47196116+eidenyoshida@users.noreply.github.com>
Co-authored-by: Lauren Wrubleski <Lauren.Wrubleski@amd.com>
Co-authored-by: Wayne Franz <wayfranz@amd.com>
Co-authored-by: Tom Benson <benson31@llnl.gov>
Co-authored-by: David Galiffi <dgaliffi@amd.com>
Successfully merging this pull request may close these issues.

Missing cub::DeviceMemcpy::Batched
3 participants