Upstream sync #6

Merged on Oct 18, 2024 (147 commits)
2130890
ADLR/megatron-lm!2092 - ci: Bump reference sha
ko3n1g Sep 11, 2024
6664dc6
Merge branch 'ko3n1g/ci/bump-sha-3' into 'main'
ko3n1g Sep 11, 2024
32949f2
ADLR/megatron-lm!2093 - ci: Disable broken test
ko3n1g Sep 11, 2024
df1418a
Merge branch 'ko3n1g/ci/disable-broken-test' into 'main'
ko3n1g Sep 11, 2024
f8b7c3f
ADLR/megatron-lm!1985 - Multimodal sequence length optimizations
trintamaki Sep 11, 2024
6151709
Merge branch 'trintamaki/multi-image-multi-tile-dataloader-seq-len' i…
jaredcasper Sep 11, 2024
3005d02
ADLR/megatron-lm!2094 - tests: Disable flaky test
ko3n1g Sep 11, 2024
9ec2337
Merge branch 'ko3n1g/tests/flaky-test-2' into 'main'
ko3n1g Sep 11, 2024
e5fb1fa
ADLR/megatron-lm!2004 - tests: Repeat MRs 5 times
ko3n1g Sep 12, 2024
028b777
Merge branch 'ko3n1g/ci/repeat-mrs' into 'main'
ko3n1g Sep 12, 2024
dcc6634
ADLR/megatron-lm!2091 - Don't pass device_id to torch.distributed.ini…
szmigacz Sep 12, 2024
76f9f48
Merge branch 'no_dist_device_id' into 'main'
ko3n1g Sep 12, 2024
bf7b978
ADLR/megatron-lm!2059 - ci: Add release tests for 0.9
ko3n1g Sep 14, 2024
21924d8
Merge branch 'ko3n1g/ci/release-tests' into 'main'
ko3n1g Sep 14, 2024
e6f1d81
ADLR/megatron-lm!2106 - fix: allow merge request CI for non-protected…
terrykong Sep 17, 2024
6562666
Merge branch 'terryk/ci-can-fail-on-unprotected-targets' into 'main'
ko3n1g Sep 17, 2024
0902af0
ADLR/megatron-lm!2107 - chore: Fix autoformatter for release branches
ko3n1g Sep 17, 2024
72008a0
Merge branch 'ko3n1g/chore/formatting-on-release-branch' into 'main'
ko3n1g Sep 17, 2024
2a8d8af
ADLR/megatron-lm!2104 - Fixing broken links
shanmugamr1992 Sep 17, 2024
3f10ff6
Merge branch 'docFix' into 'main'
shanmugamr1992 Sep 17, 2024
71d8ce7
ADLR/megatron-lm!2072 - Add video handling into multimodal mcore
Sep 17, 2024
0bda578
Merge branch 'add-video-handling' into 'main'
jon-barker Sep 17, 2024
ab7f706
ADLR/megatron-lm!1715 - Enable optional kwargs with CUDA graph
vasunvidia Sep 18, 2024
77b4bfe
Merge branch 'lora_cg' into 'main'
ko3n1g Sep 18, 2024
0cffc6b
ADLR/megatron-lm!2077 - Resolve "Fix TE version in TELinear"
Victarry Sep 18, 2024
461b06c
Merge branch '318-fix-te-version-in-telinear' into 'main'
ericharper Sep 18, 2024
6b78cb1
ADLR/megatron-lm!2112 - Update path to MMMU to use new repos structure
Sep 18, 2024
d350231
Merge branch 'fix_mmmu_mmodal' into 'main'
jon-barker Sep 18, 2024
cedd415
ADLR/megatron-lm!1880 - Removing env variable NVTE_ALLOW_NONDETERMINI…
shanmugamr1992 Sep 18, 2024
6b35ca8
Merge branch 'bertflash' into 'main'
shanmugamr1992 Sep 18, 2024
63be779
ADLR/megatron-lm!2033 - Online eval
trintamaki Sep 19, 2024
835af44
Merge branch 'trintamaki/online-eval' into 'main'
ericharper Sep 19, 2024
2c9bcac
ADLR/megatron-lm!1973 - MMMU multi-image support
trintamaki Sep 19, 2024
905de33
Merge branch 'trintamaki/multi-image-mmmu' into 'main'
jon-barker Sep 19, 2024
5c0697c
ADLR/megatron-lm!2113 - build: Use multi-stage for parallel builds
ko3n1g Sep 20, 2024
c394f78
Merge branch 'ko3n1g/build/pip' into 'main'
ko3n1g Sep 20, 2024
cf596b9
ADLR/megatron-lm!2126 - Only print warning when relevant
deepakn94 Sep 21, 2024
640e62f
Merge branch 'dnarayanan/warning_fix' into 'main'
jaredcasper Sep 21, 2024
3eeb932
ADLR/megatron-lm!2124 - tests: Fix location of megatron
ko3n1g Sep 21, 2024
205f946
Merge branch 'ko3n1g/tests/fix-location-of-megatron' into 'main'
ko3n1g Sep 21, 2024
d210eb0
ADLR/megatron-lm!2127 - ci: Bump sha
ko3n1g Sep 21, 2024
811a26a
Merge branch 'ko3n1g/chore/bump-sha' into 'main'
ko3n1g Sep 21, 2024
405135a
ADLR/megatron-lm!2128 - ci: Improve cherry pick workflow
ko3n1g Sep 22, 2024
fba615f
Merge branch 'ko3n1g/ci/improve-cherry-pick-workflow' into 'main'
ko3n1g Sep 22, 2024
95be3cb
ADLR/megatron-lm!2034 - ci: Introduce JET Python SDK
ko3n1g Sep 22, 2024
e79808c
Merge branch 'ko3n1g/ci/convergence-tests-with-jet' into 'main'
ko3n1g Sep 22, 2024
e10a9f4
ADLR/megatron-lm!2130 - ci: Improve cherry pick MR description
ko3n1g Sep 22, 2024
8e69382
Merge branch 'ko3n1g/ci/improve-cherry-pick-workflow' into 'main'
ko3n1g Sep 22, 2024
e35818d
ADLR/megatron-lm!2119 - Huvu/t5 te10 fix nemoci pr482
huvunvidia Sep 23, 2024
dbd2d18
Merge branch 'huvu/t5_TE10_fix_nemoci_PR482' into 'main'
ericharper Sep 23, 2024
8c666c2
ADLR/megatron-lm!2134 - ci: Set author and milestone for cherry-picks
ko3n1g Sep 23, 2024
6d8dc80
Merge branch 'ko3n1g/ci/cherry-pick-authro' into 'main'
ko3n1g Sep 23, 2024
c45f951
ADLR/megatron-lm!2135 - ci: Send alerts on unit-tests-extended
ko3n1g Sep 23, 2024
08e80b0
Merge branch 'ko3n1g/ci/notify-ut' into 'main'
ko3n1g Sep 23, 2024
643e60a
ADLR/megatron-lm!2133 - tests: Minor improvements to JET
ko3n1g Sep 23, 2024
8ec4617
Merge branch 'ko3n1g/ci/fixes-to-jet' into 'main'
ko3n1g Sep 23, 2024
5ade91a
ADLR/megatron-lm!2136 - tests: Fix GPT test
ko3n1g Sep 23, 2024
1f2d556
Merge branch 'ko3n1g/tests/fix-gpt-release-samples' into 'main'
ko3n1g Sep 23, 2024
e464e94
ADLR/megatron-lm!2139 - ci: Fix cherry-pick strings
ko3n1g Sep 23, 2024
0fd4617
Merge branch 'ko3n1g/ci/cherry-pick-strip-chars' into 'main'
ko3n1g Sep 23, 2024
ede39b8
ADLR/megatron-lm!2110 - Use torch dataloader in multimodal evaluation
trintamaki Sep 23, 2024
2065c35
Merge branch 'trintamaki/multimodal-eval-dataset' into 'main'
jon-barker Sep 23, 2024
697ea61
ADLR/megatron-lm!2137 - ci: Enable dev container for new features
ko3n1g Sep 23, 2024
075c727
Merge branch 'ko3n1g/ci/dev-container' into 'main'
ko3n1g Sep 23, 2024
5e23e72
ADLR/megatron-lm!2005 - Fix performance regression brought by torch.b…
xxuwenc Sep 24, 2024
884b087
Merge branch 'revert_bincount' into 'main'
ko3n1g Sep 24, 2024
ad38459
ADLR/megatron-lm!2073 - Multimodal batched bug fix
trintamaki Sep 24, 2024
162b82d
Merge branch 'trintamaki/multimodal_batch_bugfix' into 'main'
jon-barker Sep 24, 2024
32eac88
ADLR/megatron-lm!1581 - Add MLA support into MCore
BoxiangW Sep 24, 2024
dcf9e77
Merge branch 'boxiangw/mla' into 'main'
jaredcasper Sep 24, 2024
d207755
ADLR/megatron-lm!1995 - Add freeze options to pretrain_vlm
trintamaki Sep 25, 2024
891b8f9
Merge branch 'trintamaki/pretrain_vlm_freeze_option' into 'main'
jon-barker Sep 25, 2024
31c23f5
ADLR/megatron-lm!2145 - Improve logging when decreasing batch size
deepakn94 Sep 25, 2024
78bef1c
Merge branch 'dnarayanan/improve_logging' into 'main'
ericharper Sep 25, 2024
5aceacb
ADLR/megatron-lm!2148 - Add model.eval() to run_text_generation_serve…
mathemakitten Sep 25, 2024
4158084
Merge branch 'hn-set-model-eval-mode' into 'main'
jaredcasper Sep 25, 2024
368f561
ADLR/megatron-lm!2111 - Mcore llama3.1 support
jon-barker Sep 26, 2024
c1c19d1
Merge branch 'jbarker/llama3.1' into 'main'
ericharper Sep 26, 2024
1265399
ADLR/megatron-lm!2151 - ci: Run experimental UTs on dev image
ko3n1g Sep 26, 2024
c025cec
Merge branch 'ko3n1g/ci/uts-on-dev' into 'main'
ko3n1g Sep 26, 2024
f0d7120
ADLR/megatron-lm!1953 - Mcore export to export models to TRTLLM (GPU …
shanmugamr1992 Sep 26, 2024
45bf4c1
Merge branch 'final_export' into 'main'
shanmugamr1992 Sep 26, 2024
f5171f2
ADLR/megatron-lm!2154 - ci: Prune docker cache of `mcore-docker-node-…
ko3n1g Sep 26, 2024
e38d92a
Merge branch 'ko3n1g/ci/prune-container-cache-mcore-docker-node-jet' …
ko3n1g Sep 26, 2024
c31452c
ADLR/megatron-lm!2155 - Resolve release test failure caused by Groupe…
xxuwenc Sep 26, 2024
d55d61a
Merge branch 'xuwenc/release_perf_bugfix' into 'main'
ko3n1g Sep 26, 2024
3beefb5
ADLR/megatron-lm!2156 - tests: Set better name for Wandb logging
ko3n1g Sep 26, 2024
5553fc1
Merge branch 'ko3n1g/tests/better-logging-to-wandb' into 'main'
ko3n1g Sep 26, 2024
0976661
ADLR/megatron-lm!1950 - Remove pkg_resources package
ksivaman Sep 27, 2024
1585be2
Merge branch 'fix_version_checks' into 'main'
ko3n1g Sep 27, 2024
2bad957
ADLR/megatron-lm!2142 - ci: Onboard CW
ko3n1g Sep 27, 2024
12c2696
Merge branch 'ko3n1g/ci/onboard-cw' into 'main'
ko3n1g Sep 27, 2024
3428cd9
ADLR/megatron-lm!2158 - Small changes to export
shanmugamr1992 Sep 28, 2024
b3375a0
Merge branch 'new_export' into 'main'
ericharper Sep 28, 2024
5b7374a
ADLR/megatron-lm!2152 - Fix rope backward compatibility
BoxiangW Sep 30, 2024
6ad11b0
Merge branch 'boxiangw/mla_backwards_comp' into 'main'
jaredcasper Sep 30, 2024
ca6d170
ADLR/megatron-lm!2140 - [Bug fix] Don't trace graphs during inference
jiemingz Oct 1, 2024
dddecd1
Merge branch 'auto_cudagraph_val_fix' into 'main'
ericharper Oct 1, 2024
5ab659b
ADLR/megatron-lm!2109 - Adding more MR tests for T5 (e.g., transforme…
huvunvidia Oct 1, 2024
3efa8c2
Merge branch 'huvu/t5_dist_checkpoint_mrtests' into 'main'
ko3n1g Oct 1, 2024
f07581b
ADLR/megatron-lm!2164 - ci: Download artifacts
ko3n1g Oct 1, 2024
85cd99b
Merge branch 'ko3n1g/ci/artifacts' into 'main'
ko3n1g Oct 1, 2024
858694f
ADLR/megatron-lm!2165 - ci: Bump version
ko3n1g Oct 2, 2024
065260b
Merge branch 'ko3n1g/ci/backwards-tag' into 'main'
jaredcasper Oct 2, 2024
f76b465
ADLR/megatron-lm!2153 - Add the interface to set TP communication boo…
erhoo82 Oct 3, 2024
25f7da2
Merge branch 'tp_bootstrap_backend' into 'main'
ericharper Oct 3, 2024
50042ff
ADLR/megatron-lm!2095 - Add support for SigLIP vision encoder to mult…
Oct 3, 2024
4d5f94d
Merge branch 'convert_siglip_model' into 'main'
jaredcasper Oct 3, 2024
2aaf85d
ADLR/megatron-lm!2175 - adding cu_seqlens_padded support in MCore
Oct 4, 2024
c02b335
Merge branch 'add_cu_seqlens_padded_support' into 'main'
ericharper Oct 4, 2024
ee9dba2
ADLR/megatron-lm!2181 - Fixing attention mask dimenions to support TE…
shanmugamr1992 Oct 4, 2024
fde8bb1
Merge branch 'fixattnmask' into 'main'
ericharper Oct 4, 2024
843a22e
ADLR/megatron-lm!2180 - rotary_scaling fix for llama3.1 and 3.2
yueshen2016 Oct 4, 2024
b98ec86
Merge branch 'yueshen/rotary_scaling_fix_llama3_1' into 'main'
ericharper Oct 4, 2024
827d5b6
ADLR/megatron-lm!2185 - chore: Improve generator for launch scripts
ko3n1g Oct 4, 2024
31fe61a
Merge branch 'ko3n1g/ci/fix-launch-script-generator' into 'main'
ko3n1g Oct 4, 2024
e2a1c52
ADLR/megatron-lm!2160 - Adding Inference pipeline for T5
huvunvidia Oct 5, 2024
0acda93
Merge branch 'huvu/t5_generate' into 'main'
ericharper Oct 5, 2024
2f9ac3c
ADLR/megatron-lm!2182 - ci: Group runs by model
ko3n1g Oct 5, 2024
edb51fc
Merge branch 'ko3n1g/ci/group-runs' into 'main'
ko3n1g Oct 5, 2024
cf0d855
ADLR/megatron-lm!1862 - Cpu init te
wdykas Oct 5, 2024
0e6bef1
Merge branch 'cpu-init-te' into 'main'
ko3n1g Oct 5, 2024
6939737
ADLR/megatron-lm!2186 - ci: Run script after export
ko3n1g Oct 5, 2024
73e7b58
Merge branch 'ko3n1g/ci/run-script-after-export' into 'main'
ko3n1g Oct 5, 2024
6ca379e
ADLR/megatron-lm!2089 - Fix upcycling issues.
RayWang96 Oct 7, 2024
ff5cee9
Merge branch 'runtime-upcycling' into 'main'
ericharper Oct 7, 2024
a559ec1
ADLR/megatron-lm!2189 - tests: Fix ENV export
ko3n1g Oct 7, 2024
3f90b98
Merge branch 'ko3n1g/ci/fix-env-export' into 'main'
ko3n1g Oct 7, 2024
e108535
ADLR/megatron-lm!2194 - tests: Fix ENV export
ko3n1g Oct 9, 2024
3f43927
Merge branch 'ko3n1g/ci/fix-env-export' into 'main'
ko3n1g Oct 9, 2024
fbdc916
ADLR/megatron-lm!1790 - GroupedMLP DistOpt Resharding and add UTs to …
hxbai Oct 9, 2024
b1218b9
Merge branch 'hongxiaob/moe_dist_ckpt' into 'main'
ko3n1g Oct 9, 2024
5776d06
ADLR/megatron-lm!2197 - ci: Always upload artifacts
ko3n1g Oct 9, 2024
bf74129
Merge branch 'ko3n1g/ci/always-artifacts' into 'main'
ko3n1g Oct 9, 2024
0e3eaa5
ADLR/megatron-lm!2141 - Data parallel inference
trintamaki Oct 9, 2024
fcdbf90
Merge branch 'trintamaki/data-parallel-inference' into 'main'
jon-barker Oct 9, 2024
37a2116
ADLR/megatron-lm!2199 - Remove CUDA requirement from cpu test.
Oct 9, 2024
228dc20
Merge branch 'vitalyk/testfix' into 'main'
ko3n1g Oct 9, 2024
f462160
ADLR/megatron-lm!2096 - Support padding between subsequences of Packe…
parthmannan Oct 10, 2024
7e90ec0
Merge branch 'packed_seq_padded_support' into 'main'
ericharper Oct 10, 2024
566d9cd
ADLR/megatron-lm!2206 - Revert "Merge branch 'vitalyk/testfix' into '…
ko3n1g Oct 10, 2024
b60f5d0
Merge branch 'revert-228dc204' into 'main'
ko3n1g Oct 10, 2024
13c39ac
ADLR/megatron-lm!1909 - Standard interface for getting offsets from t…
sancha Oct 11, 2024
47bb8d1
Merge branch 'sasatheesh/tokenizer_offsets' into 'main'
ericharper Oct 11, 2024
8c018ca
ADLR/megatron-lm!2208 - tests: Use flaky instead of skip marker
ko3n1g Oct 11, 2024
772faca
Merge branch 'ko3n1g/ci/flaky-marker' into 'main'
ko3n1g Oct 11, 2024
e8c077c
Merge remote-tracking branch 'upstream/main' into upstream_sync
gurpreet-dhami Oct 11, 2024
50 changes: 25 additions & 25 deletions .gitlab-ci.yml
@@ -13,22 +13,28 @@ workflow:
FUNCTIONAL_TEST: "no"
- if: $CI_MERGE_REQUEST_LABELS =~ /Run tests/ && $CI_MERGE_REQUEST_TARGET_BRANCH_SHA != ""
variables:
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: mr
UNIT_TEST_REPEAT: 5
UNIT_TEST_TIMEOUT: 50
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: mr
FUNCTIONAL_TEST_CLUSTER_A100: ""
FUNCTIONAL_TEST_CLUSTER_H100: ""
- if: $CI_MERGE_REQUEST_LABELS =~ /Run nightly/ && $CI_MERGE_REQUEST_TARGET_BRANCH_SHA != ""
variables:
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: nightly
UNIT_TEST_REPEAT: 5
UNIT_TEST_TIMEOUT: 50
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: nightly
FUNCTIONAL_TEST_CLUSTER_A100: ""
FUNCTIONAL_TEST_CLUSTER_H100: ""
- if: $CI_MERGE_REQUEST_LABELS =~ /Run weekly/ && $CI_MERGE_REQUEST_TARGET_BRANCH_SHA != ""
variables:
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: weekly
UNIT_TEST_REPEAT: 5
UNIT_TEST_TIMEOUT: 50
FUNCTIONAL_TEST: "yes"
FUNCTIONAL_TEST_SCOPE: weekly
FUNCTIONAL_TEST_CLUSTER_A100: ""
FUNCTIONAL_TEST_CLUSTER_H100: ""
- if: $CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_TARGET_BRANCH_SHA != ""
variables:
FUNCTIONAL_TEST: "no"
@@ -58,29 +64,23 @@ variables:
- "mr"
- "nightly"
- "weekly"
- "pre-release"
- "release"
description: "Testsuite to run (only for FUNCTIONAL_TEST=yes)"
FUNCTIONAL_TEST_CLUSTER:
FUNCTIONAL_TEST_CLUSTER_A100:
value: "dgxa100_dracooci"
options:
- "dgxa100_dracooci"
- "dgxa100_dracooci-ord"
- "dgxh100_eos"
description: '"dgxa100_dracooci" for OCI-IAD, "dgxh100_eos" for EOS'
CONVERGENCE_TEST:
value: "no"
description: 'Cluster for A100 workloads'
FUNCTIONAL_TEST_CLUSTER_H100:
value: "dgxh100_eos"
options:
- "yes"
- "no"
description: To run a convergence test
CONVERGENCE_TEST_SCOPE:
value: "release"
options:
- "release"
- "pre-release"
description: "Test suite to run (only for CONVERGENCE_TEST=yes)"
CONVERGENCE_TEST_RUN_NAME:
value: "pre-release-$$CI_PIPELINE_ID"
description: "Run directory of convergence test"
- "dgxh100_coreweave"
- "dgxh100_eos"
description: 'Cluster for H100 workloads'
FUNCTIONAL_TEST_NAME:
description: "Name of functional test run (only for pre-release and release)"
PUBLISH:
value: "no"
options:
@@ -96,6 +96,7 @@ variables:

# CI wide variables
CI_MCORE_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_ci
CI_MCORE_DEV_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_ci_dev
CI_NEMO_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/nemo_ci
LINTING_IMAGE: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_linting
UNIT_TEST_TIMEOUT: 15
@@ -105,5 +106,4 @@ include:
- .gitlab/stages/00.pre.yml
- .gitlab/stages/01.tests.yml
- .gitlab/stages/02.functional-tests.yml
- .gitlab/stages/03.convergence-tests.yml
- .gitlab/stages/04.publish.yml
- .gitlab/stages/03.publish.yml
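
The label-driven `workflow` rules in the first hunk can be read as a first-match-wins lookup from merge request labels to a functional test scope. The sketch below is only an illustration of that reading: `pick_scope` is a hypothetical helper, not part of the repository, and it ignores the additional `$CI_MERGE_REQUEST_TARGET_BRANCH_SHA != ""` condition that each rule carries.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): mimics how the label rules
# shown in the hunk map a comma-separated label string to a test scope.
# Rules are checked top to bottom and the first match wins.
pick_scope() {
    labels=$1
    case "$labels" in
        *"Run tests"*)   echo "mr" ;;
        *"Run nightly"*) echo "nightly" ;;
        *"Run weekly"*)  echo "weekly" ;;
        *)               echo "none" ;;  # stands in for FUNCTIONAL_TEST: "no"
    esac
}

pick_scope "bug,Run nightly"
pick_scope "Run weekly,Run tests"
```

With both `Run weekly` and `Run tests` present, the sketch picks `mr`, because the `Run tests` rule appears first in the file.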
42 changes: 27 additions & 15 deletions .gitlab/stages/00.pre.yml
@@ -1,6 +1,15 @@
include:
- template: Security/Secret-Detection.gitlab-ci.yml

.pre_mr_rules:
rules:
- if: $CI_PIPELINE_SOURCE == 'merge_request_event' && $CI_MERGE_REQUEST_TARGET_BRANCH_PROTECTED != "true"
allow_failure: true
when: always
- if: $CI_PIPELINE_SOURCE == 'merge_request_event'
- when: never
stage: .pre

mirror_to_github:
rules:
- if: '$CI_COMMIT_REF_PROTECTED == "true" && $CI_PIPELINE_SOURCE == "push"'
@@ -35,14 +44,11 @@ create_ci_branches:
GIT_STRATEGY: "clone"
script:
- git remote set-url origin "https://gitlab-ci-token:${PROJECT_ACCESS_TOKEN_MCORE}@${GITLAB_ENDPOINT}/adlr/megatron-lm.git"
- git switch --force-create $branch;
- git switch --force-create $branch
- git push --force -u origin $branch

label_merge_request:
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- when: never
stage: .pre
extends: [.pre_mr_rules]
image: golang:1.22
tags:
- mcore-docker-node-small
@@ -62,17 +68,18 @@ label_merge_request:
curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${CI_MERGE_REQUEST_IID}" --data-urlencode "add_labels=$LABELS" -X PUT

clean_docker_node:
stage: .pre
extends: [.pre_mr_rules]
image: docker:26.1.4-dind
tags:
- ${node}
parallel:
matrix:
- node: 8xL40S
- node: mcore-docker-node-small
- node: mcore-docker-node-jet
script:
- export DOCKER_HOST='unix:///var/run/docker.sock'
- docker system prune -a --filter "until=48h" -f || true
- docker system prune -a --filter "until=36h" -f || true

maybe_cherry_pick_commit:
rules:
@@ -95,8 +102,13 @@ maybe_cherry_pick_commit:
- git config --global user.email "mcore-bot@nvidia.com"
- git config --global user.name "Mcore Bot"
- |
LABELS=$(curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${MR_ID}" | jq '.labels | join(",")' | tr -d '"')

MR=$(curl --header "PRIVATE-TOKEN: ${PROJECT_ACCESS_TOKEN_MCORE}" --url "https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests/${MR_ID}")

LABELS=$(echo -E $MR | jq '.labels | join(",")' | tr -d '"')
AUTHOR_ID=$(echo -E $MR | jq '.author.id' | tr -d '"')
AUTHOR_NAME=$(echo -E $MR | jq '.author.username' | tr -d '"')
TITLE=$(echo -E $MR | jq '.title' | tr -d '"')
MILESTONE_ID=$(echo -E $MR | jq '.milestone.id' | tr -d '"')
TARGET_BRANCHES=$(echo "$LABELS" | grep -o 'core_[^,]*')

if [[ $TARGET_BRANCHES == "" ]]; then
@@ -128,8 +140,11 @@ maybe_cherry_pick_commit:
--url https://${GITLAB_ENDPOINT}/api/v4/projects/${CI_PROJECT_ID}/merge_requests \
-d "source_branch=cherry-pick-$MR_ID-$RELEASE_BRANCH" \
-d "target_branch=$RELEASE_BRANCH" \
-d "title=Cherry-pick $MR_ID into $RELEASE_BRANCH" \
-d "labels=cherry-pick"
-d "title=Cherry pick \`$TITLE ($MR_ID)\` into \`$RELEASE_BRANCH\`" \
-d "labels=cherry-pick" \
-d "reviewer_ids=$AUTHOR_ID" \
-d "milestone_id=$MILESTONE_ID" \
-d "description=[🤖]: Hi @$AUTHOR_NAME 👋,<br><br>we've cherry picked \`$TITLE ($MR_ID)\` into \`$RELEASE_BRANCH\` for you! 🚀<br><br>Please review and approve this cherry pick by your convenience\!"

else
URL=https://${GITLAB_ENDPOINT}/ADLR/megatron-lm/-/merge_requests/$MR_ID
@@ -154,10 +169,7 @@ maybe_cherry_pick_commit:
interruptible: false

check_milestone:
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
- when: never
stage: .pre
extends: [.pre_mr_rules]
image: ${GITLAB_ENDPOINT}:5005/adlr/megatron-lm/mcore_ci:buildcache
tags:
- mcore-docker-node-small
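
For reference, the `maybe_cherry_pick_commit` job derives its release branches from the merge request labels with `grep -o 'core_[^,]*'`. A minimal sketch of that step follows. The label string is made up for illustration, and the loop over `RELEASE_BRANCH` is an assumption about the part of the script that the diff does not show.

```shell
#!/bin/sh
# Hypothetical label string, shaped like the comma-joined output of
# `jq '.labels | join(",")'` after the quotes have been stripped.
LABELS="bug,core_r0.9.0,Run tests,core_r0.8.0"

# Same pattern as the job: each match starts at "core_" and runs up to
# the next comma, one match per output line.
TARGET_BRANCHES=$(echo "$LABELS" | grep -o 'core_[^,]*')

# Assumed shape of the elided loop: one cherry-pick per matched label.
for RELEASE_BRANCH in $TARGET_BRANCHES; do
    echo "would cherry-pick into $RELEASE_BRANCH"
done
```

Note that the pattern is not anchored to the start of a label, so a label that merely contains `core_` somewhere in the middle would also produce a match.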