Upstream sync #6

Merged: gurpreet-dhami merged 147 commits into rocm_dev from upstream_sync on Oct 18, 2024

Conversation

gurpreet-dhami
Collaborator

No description provided.

ko3n1g and others added 30 commits September 11, 2024 12:20
ci: Bump reference sha

See merge request ADLR/megatron-lm!2092
ci: Disable broken test

See merge request ADLR/megatron-lm!2093
…nto 'main'

Multimodal sequence length optimizations

See merge request ADLR/megatron-lm!1985
tests: Disable flaky test

See merge request ADLR/megatron-lm!2094
tests: Repeat MRs 5 times

See merge request ADLR/megatron-lm!2004
…t_process_group, it causes hangs

Co-authored-by: Szymon Migacz <1934379+szmigacz@users.noreply.github.com>
Don't pass device_id to torch.distributed.init_process_group, it causes hangs

See merge request ADLR/megatron-lm!2091
ci: Add release tests for 0.9

See merge request ADLR/megatron-lm!2059
fix: allow merge request CI for non-protected branches to fail

See merge request ADLR/megatron-lm!2106
chore: Fix autoformatter for release branches

See merge request ADLR/megatron-lm!2107
Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Fixing broken links

See merge request ADLR/megatron-lm!2104
Add video handling into multimodal mcore

See merge request ADLR/megatron-lm!2072
Enable optional kwargs with CUDA graph

See merge request ADLR/megatron-lm!1715
Resolve "Fix TE version in TELinear"

Closes NVIDIA#318

See merge request ADLR/megatron-lm!2077
Update path to MMMU to use new repos structure

See merge request ADLR/megatron-lm!2112
…STIC_ALGO

Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com>
Removing env variable NVTE_ALLOW_NONDETERMINISTIC_ALGO

See merge request ADLR/megatron-lm!1880
RayWang96 and others added 23 commits October 7, 2024 14:22
Fix upcycling issues.

See merge request ADLR/megatron-lm!2089
tests: Fix ENV export

See merge request ADLR/megatron-lm!2189
tests: Fix ENV export

See merge request ADLR/megatron-lm!2194
…ChainedOptimizer Support for distributed checkpointing
GroupedMLP DistOpt Resharding and add UTs to ChainedOptimizer Support for distributed checkpointing

See merge request ADLR/megatron-lm!1790
ci: Always upload artifacts

See merge request ADLR/megatron-lm!2197
Data parallel inference

See merge request ADLR/megatron-lm!2141
Remove CUDA requirement from cpu test.

See merge request ADLR/megatron-lm!2199
Support padding between subsequences of Packed Sequence

See merge request ADLR/megatron-lm!2096
Revert "Merge branch 'vitalyk/testfix' into 'main'"

See merge request ADLR/megatron-lm!2206
Standard interface for getting offsets from tokenizers

See merge request ADLR/megatron-lm!1909
tests: Use flaky instead of skip marker

See merge request ADLR/megatron-lm!2208
@lcskrishna
Collaborator

@gurpreet-dhami Can we re-run the unit tests once more, just to make sure we are not regressing on anything?

@gurpreet-dhami
Collaborator Author

I was able to run Llama 2 70B on this branch.

@gurpreet-dhami
Collaborator Author

gurpreet-dhami commented Oct 18, 2024

Verbally approved by Chaitanya (@lcskrishna).
We need this update to add RoPE support.

@gurpreet-dhami gurpreet-dhami merged commit 42b34ba into rocm_dev Oct 18, 2024
4 checks passed
@gurpreet-dhami gurpreet-dhami deleted the upstream_sync branch December 6, 2024 19:49