
TEST/CUDA: Test CUDA Memcpy with various contexts #10531

Open · wants to merge 8 commits into master from cuda_ctx_memcpy
Conversation

@tvegas1 tvegas1 commented Mar 4, 2025

What?

Exercise CUDA copies with various combinations of parameters (device/context/stream/memcpy flavor).
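The combination matrix exercised by such a test can be pictured as nested loops. This is only a sketch; `num_ctx`, `num_streams`, `run_memcpy_variants`, and the `context`/`stream` arrays are hypothetical names, not the PR's actual code:

```c
/* Sketch: for every destination context, every source context, and
 * every stream, issue each memcpy flavor and verify the result. */
for (j = 0; j < num_ctx; j++) {              /* context owning dst buffer */
    for (k = 0; k < num_ctx; k++) {          /* context owning src buffer */
        for (l = 0; l < num_streams; l++) {  /* stream to enqueue on */
            run_memcpy_variants(context[j].mem, context[k].mem,
                                stream[l]);  /* sync/async/managed copies */
        }
    }
}
```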

@tvegas1 tvegas1 requested review from rakhmets and brminich March 4, 2025 18:41
@tvegas1 tvegas1 force-pushed the cuda_ctx_memcpy branch 2 times, most recently from 837f312 to 700702c Compare March 5, 2025 07:13
printf("Using %d CUDA device(s)\n", device_count);

for (i = 0; i < MAX_THREADS; i++) {
    thread[i].tid = i;
Contributor:

alignment

Contributor Author:

fixed



/* Context and associated resources */
struct context {
Contributor:

maybe use typedef for both structs

Contributor Author:

fixed

    void *mem_managed;
};

struct context context[MAX_CTX * MAX_THREADS * MAX_DEV];
Contributor:

maybe better to have an independent context per thread?

Contributor Author:

they share the content


ptr_a = context[j].mem;
ptr_b = context[k].mem;
ptr_a_managed = context[j].mem_managed;
Contributor:

alignment

Contributor Author:

which one?

#include <unistd.h>


#define MAX_THREADS 2
Contributor:

should we pass these from cmd line?

Contributor Author:

As it is, the scale is already huge, so I will add command-line parameters if we need them?

for (k = 0; k < MAX_CTX * MAX_THREADS * count; k++) {
    /* each stream (every thread every device) */
    for (l = 0; l < MAX_CTX * MAX_THREADS * count; l++) {
        CHECK_D(cuCtxSetCurrent(context[i].ctx));
Contributor:

There is no need to set a CUDA context here, even when using the CUDA Driver API.
A context is only required for stream creation with the Driver API.

Contributor Author:

The intent was to confirm that we do not care which context is current, even if it differs from the stream's context.

Contributor:

The context is required only for stream creation. Memory copies can be executed even without setting any context.

Contributor Author:

removed setting the context
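The property discussed above can be sketched as follows. This is a minimal illustration, not the PR's code: `ctx_a`/`ctx_b` are hypothetical contexts and `CHECK_D` is assumed to be the test's error-checking macro. The stream is created under one context, yet the copy is enqueued while a different context is current; the Driver API resolves the stream's owning context:

```c
/* Stream is created under ctx_a ... */
CUstream stream;
CUdeviceptr src, dst;

CHECK_D(cuCtxSetCurrent(ctx_a));
CHECK_D(cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING));
CHECK_D(cuMemAlloc(&src, size));
CHECK_D(cuMemAlloc(&dst, size));

/* ... but the copy is issued while ctx_b is current: the copy still
 * executes on the stream's context, so no cuCtxSetCurrent is needed
 * before the memcpy itself. */
CHECK_D(cuCtxSetCurrent(ctx_b));
CHECK_D(cuMemcpyAsync(dst, src, size, stream));
CHECK_D(cuStreamSynchronize(stream));
```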

Comment on lines +118 to +121
test_cuda_ctx_memcpy_DEPBASE = $(DEPDIR)/test_cuda_ctx_memcpy
test_cuda_ctx_memcpy_COMPILE = \
        $(NVCC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
        $(test_cuda_ctx_memcpy_CPPFLAGS)
Contributor:

You can make your file a .c file and remove this.

Contributor Author:

fixed


for (j = 1; j < MAX_CTX; j++) {
    CHECK_D(cuCtxCreate(&context[index + j].ctx, 0, i));
    CHECK_D(cuCtxSetCurrent(context[index + j].ctx));
Contributor:

The line can be removed since cuCtxCreate pushes the newly created context.

Contributor Author:

fixed
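The point being fixed can be sketched as follows (a minimal illustration, assuming a `CHECK_D` error-checking macro and an already-obtained `device` handle): `cuCtxCreate` pushes the newly created context onto the calling thread's context stack, so it is already current and the extra `cuCtxSetCurrent` is redundant.

```c
CUcontext ctx;

/* cuCtxCreate both creates the context and pushes it onto the calling
 * thread's context stack, making it current. */
CHECK_D(cuCtxCreate(&ctx, 0, device));

/* Sanity check: the freshly created context is already current. */
CUcontext cur;
CHECK_D(cuCtxGetCurrent(&cur));
assert(cur == ctx);
```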

CUcontext ctx;

for (i = 0; i < count; i++) {
    CHECK(cudaSetDevice(i));
Contributor:

This call retains the device's primary context and sets it as the current context.
Maybe replace it with cuDeviceGet(&device, i), and pass device to cuCtxCreate.

Contributor Author:

fixed
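The suggested replacement can be sketched as follows (a pure Driver API illustration, assuming the test's `CHECK_D` macro and a device ordinal `i`): instead of `cudaSetDevice`, which retains the device's primary context as a side effect, obtain a `CUdevice` handle and create an explicit context on it.

```c
CUdevice  device;
CUcontext ctx;

/* Get a handle for device ordinal i without touching the primary
 * context, then create an explicit context on that device. */
CHECK_D(cuDeviceGet(&device, i));
CHECK_D(cuCtxCreate(&ctx, 0, device));
```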

Labels: none yet · Projects: none yet · 3 participants