-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEST/CUDA: Test CUDA Memcpy with various contexts #10531
base: master
Are you sure you want to change the base?
Conversation
837f312
to
700702c
Compare
test/apps/test_cuda_ctx_memcpy.cu
Outdated
printf("Using %d CUDA device(s)\n", device_count); | ||
|
||
for (i = 0; i < MAX_THREADS; i++) { | ||
thread[i].tid = i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
test/apps/test_cuda_ctx_memcpy.cu
Outdated
|
||
|
||
/* Context and associated resources */ | ||
struct context { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use typedef for both structs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
test/apps/test_cuda_ctx_memcpy.cu
Outdated
void *mem_managed; | ||
}; | ||
|
||
struct context context[MAX_CTX * MAX_THREADS * MAX_DEV]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe better to have an independent context per each thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they share the content
test/apps/test_cuda_ctx_memcpy.cu
Outdated
|
||
ptr_a = context[j].mem; | ||
ptr_b = context[k].mem; | ||
ptr_a_managed = context[j].mem_managed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which one?
test/apps/test_cuda_ctx_memcpy.cu
Outdated
#include <unistd.h> | ||
|
||
|
||
#define MAX_THREADS 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we pass these from cmd line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as it is scale is already huge so i will add if we need other parameters?
test/apps/test_cuda_ctx_memcpy.cu
Outdated
for (k = 0; k < MAX_CTX * MAX_THREADS * count; k++) { | ||
/* each stream (every thread every device) */ | ||
for (l = 0; l < MAX_CTX * MAX_THREADS * count; l++) { | ||
CHECK_D(cuCtxSetCurrent(context[i].ctx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to set CUDA context here. Even in case of CUDA Driver API.
The context is required for stream creation using CUDA Driver API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intent was to confirm that we do not care about current ctx being set, even if it is different from stream's ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context is required only for stream creation. Memory copies can be executed even without setting any context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed setting the context
test_cuda_ctx_memcpy_DEPBASE = $(DEPDIR)/test_cuda_ctx_memcpy | ||
test_cuda_ctx_memcpy_COMPILE = \ | ||
$(NVCC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \ | ||
$(test_cuda_ctx_memcpy_CPPFLAGS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make your file .c
and remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
test/apps/test_cuda_ctx_memcpy.cu
Outdated
|
||
for (j = 1; j < MAX_CTX; j++) { | ||
CHECK_D(cuCtxCreate(&context[index + j].ctx, 0, i)); | ||
CHECK_D(cuCtxSetCurrent(context[index + j].ctx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line can be removed since cuCtxCreate
pushes the newly created context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
test/apps/test_cuda_ctx_memcpy.cu
Outdated
CUcontext ctx; | ||
|
||
for (i = 0; i < count; i++) { | ||
CHECK(cudaSetDevice(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This call retains the primary device context, and sets it as current context.
Maybe replace by cuDeviceGet(&device, i)
. And pass device
to cuCtxCreate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
What?
Exercise cuda copies with various combinations of parameters (device/ctx/stream/memcpy).