[Backport branch/2.3.x] Add cuda::ptx::st_async #1093

Merged
jrhemstad merged 4 commits into branch/2.3.x from backport-1078-to-branch/2.3.x
Nov 14, 2023

github-actions[bot]

Description

Backport of #1078 to branch/2.3.x.

@github-actions github-actions bot requested review from a team as code owners November 13, 2023 20:12
@github-actions github-actions bot mentioned this pull request Nov 13, 2023

copy-pr-bot bot commented Nov 13, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Add st.async
(cherry picked from commit 30ce5ab)

Fix usage example
(cherry picked from commit dc1d934)

Use typed instead of void pointers

Because the value can be either 32 or 64 bits wide, typed pointers catch
many errors at compile time.

For instance:

    uint64_t * remote_buffer;
    uint64_t * remote_bar;
    cuda::ptx::st_async(remote_buffer, 1, remote_bar);

would previously use the .b32 path, because the literal `1` is an `int`
and determined the type resolution.

Now, this results in a compiler error.

The resolution is to either (a) change the value type or (b) change the
buffer type:

a)
    uint64_t * remote_buffer;
    cuda::ptx::st_async(remote_buffer, uint64_t(1), remote_bar);

b)
    int32_t * remote_buffer;
    cuda::ptx::st_async(remote_buffer, 1, remote_bar);

(cherry picked from commit 76044b7)
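
The compile-time check described in this commit message can be illustrated on the host with a minimal sketch. The names `st_async_sketch`, `accepts`, and `store_one_64` are hypothetical and are not part of `cuda::ptx`; the sketch assumes the typed signature deduces one type from both the destination pointer and the value, which is one way to get the described error, and it models the store as a plain assignment rather than the real asynchronous instruction. It requires C++17 for `std::void_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical host-side stand-in for a typed st_async-style signature.
// T is deduced from BOTH the destination pointer and the value, so a
// mismatch between the two makes template argument deduction fail.
template <typename T>
void st_async_sketch(T* addr, const T& value, std::uint64_t* remote_bar) {
    static_assert(sizeof(T) == 4 || sizeof(T) == 8,
                  "value must be 32 or 64 bits wide");
    assert(addr != nullptr && remote_bar != nullptr);
    *addr = value;  // plain store; the real operation is asynchronous
}

// Compile-time detector: is st_async_sketch(Addr, Value, uint64_t*) well-formed?
template <typename Addr, typename Value, typename = void>
struct accepts : std::false_type {};

template <typename Addr, typename Value>
struct accepts<Addr, Value,
               std::void_t<decltype(st_async_sketch(
                   std::declval<Addr>(), std::declval<Value>(),
                   std::declval<std::uint64_t*>()))>> : std::true_type {};

// The failing case from the commit message: 64-bit buffer, `int` value.
static_assert(!accepts<std::uint64_t*, int>::value,
              "mismatched buffer and value types are rejected");
// Resolution (a): make the value 64-bit.
static_assert(accepts<std::uint64_t*, std::uint64_t>::value,
              "matching 64-bit types are accepted");
// Resolution (b): make the buffer 32-bit (assumes int32_t is int).
static_assert(accepts<std::int32_t*, int>::value,
              "matching 32-bit types are accepted");

// Runtime use of resolution (a).
inline std::uint64_t store_one_64() {
    std::uint64_t buffer = 0;
    std::uint64_t bar = 0;
    st_async_sketch(&buffer, std::uint64_t(1), &bar);
    return buffer;
}
```

With a `void*` destination, nothing ties the buffer width to the value width, so the value alone would select the 32-bit or 64-bit path. Deducing a single `T` from both arguments turns that silent mismatch into a diagnostic.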
Add note on alignment

The type may be misleading on this one, so I added a note on the alignment
of the destination address.

(cherry picked from commit 8230836)
@jrhemstad

/ok to test

@jrhemstad jrhemstad merged commit eec0c04 into branch/2.3.x Nov 14, 2023
516 checks passed
gevtushenko pushed a commit to gevtushenko/cccl that referenced this pull request Dec 4, 2023
* Add st.async

(cherry picked from commit 30ce5ab)

* Fix usage example

(cherry picked from commit dc1d934)

* Use typed instead of void pointers

(cherry picked from commit 76044b7)

* Add note on alignment

(cherry picked from commit 8230836)

---------

Co-authored-by: Allard Hendriksen <ahendriksen@nvidia.com>
@miscco miscco deleted the backport-1078-to-branch/2.3.x branch March 7, 2024 19:33