64-bit indexing Adam #1765

Merged 3 commits on Jan 5, 2024

eqy (Contributor) commented Dec 26, 2023

#1654

I think the tests pass without the changes to multi_tensor_apply.cuh, but leaving it as-is makes me a bit nervous...

TODOs: graph-capturable Adam, and all other optimizers if people really need 64-bit indexing there...

CC @crcrpar

@@ -85,9 +85,9 @@ void multi_tensor_apply(
 tl.addresses[d][loc_tensor_info] = tensor_lists[d][t].data_ptr();
 loc_tensor_info++;
 
-int chunks_this_tensor = (tensor_lists[0][t].numel() + chunk_size - 1)/chunk_size;
+auto chunks_this_tensor = (tensor_lists[0][t].numel() + chunk_size - 1)/chunk_size;
Collaborator review comment:

Would `chunks_this_tensor` be deduced as `int64_t`, given that `chunk_size` is `int64_t`?
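To make the type question concrete, here is a minimal C++ sketch of the ceil-division from the diff. It assumes the element count arrives as `int64_t` (which is what `at::Tensor::numel()` returns); the helper names `chunks_for` and `chunk_start_offset` are hypothetical and are not part of apex. Under that assumption, `auto` deduces `int64_t` whether `chunk_size` is `int` or `int64_t`, because the usual arithmetic conversions widen the narrower operand.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper mirroring the ceil-division in the diff.
// numel is int64_t here, matching the return type of at::Tensor::numel().
inline int64_t chunks_for(int64_t numel, int64_t chunk_size) {
  assert(chunk_size > 0);
  auto chunks = (numel + chunk_size - 1) / chunk_size;
  // With int64_t operands, `auto` deduces int64_t.
  static_assert(std::is_same<decltype(chunks), int64_t>::value,
                "auto deduces int64_t for int64_t operands");
  return chunks;
}

// Mixing int64_t with a plain int also yields int64_t.
static_assert(std::is_same<decltype(int64_t{0} + 0), int64_t>::value,
              "int64_t + int promotes to int64_t");

// Hypothetical helper: element offset at which a chunk starts.
// Computed in 64 bits so it stays exact for very large tensors.
inline int64_t chunk_start_offset(int64_t chunk_idx, int64_t chunk_size) {
  return chunk_idx * chunk_size;
}

// True when a 64-bit value cannot be represented as a 32-bit signed int.
inline bool exceeds_int32(int64_t value) {
  return value > std::numeric_limits<int32_t>::max();
}
```

In this sketch the chunk count itself stays small even for a tensor of 2^32 elements (65536 chunks of 65536 elements), while the starting offset of the last chunk no longer fits in 32 bits. That is consistent with the remark in the description that the tests may pass without the `multi_tensor_apply.cuh` change, although this is an illustration and not an analysis of the actual kernels.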

@crcrpar crcrpar merged commit 87c4deb into NVIDIA:master Jan 5, 2024
github-merge-queue bot pushed a commit to microsoft/DeepSpeed that referenced this pull request Apr 22, 2024
## The Issue

Applying `FusedAdam` on large tensors will cause an error `CUDA error:
an illegal memory access was encountered`.

#3429

NVIDIA/apex#1654

## PR Content

Following the solution in the apex repository
(NVIDIA/apex#1765), this PR changes the indexing type to
`int64` when necessary.
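One way to read "if necessary" is a size check that selects the index type at launch time. The following C++ sketch illustrates that idea only; `requires_64bit_indexing`, `index_bits`, and `dispatch_index_bits` are hypothetical names, and the threshold (any tensor with more than `INT32_MAX` elements) is a conservative assumption rather than the exact condition used in apex or DeepSpeed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical check: does any tensor have too many elements for
// 32-bit signed element offsets? (Conservative threshold, assumed.)
inline bool requires_64bit_indexing(const std::vector<int64_t>& numels) {
  for (int64_t n : numels) {
    assert(n >= 0);  // element counts are non-negative
    if (n > std::numeric_limits<int32_t>::max()) {
      return true;
    }
  }
  return false;
}

// Stand-in for a kernel templated on its index type; it only reports
// the width of the index type it was instantiated with.
template <typename index_t>
int index_bits() {
  return static_cast<int>(sizeof(index_t) * 8);
}

// Pick the 64-bit instantiation only when a tensor is large enough to
// need it, and keep the 32-bit path for everything else.
inline int dispatch_index_bits(const std::vector<int64_t>& numels) {
  return requires_64bit_indexing(numels) ? index_bits<int64_t>()
                                         : index_bits<int32_t>();
}
```

Keeping the 32-bit instantiation as the default path is a design choice that leaves the common case unchanged; only tensor lists containing a very large tensor take the 64-bit path.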

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024