GPT extrapolatable position embedding (xpos/sandwich/alibi/kerple) and Flash Attention #6666

Merged: 571 commits, Jun 12, 2023

Commits
a8564d3
move to nvidia megatron repo (#6465) (#6475)
github-actions[bot] Apr 24, 2023
7a17f73
Megatron KERPLE positional embeddings (#6478) (#6480)
github-actions[bot] Apr 24, 2023
a67b00f
Fix an invalid link in get_data.py of ljspeech (#6456)
pythinker Apr 24, 2023
1e1fbbe
1. Added external index sample. (#6462) (#6483)
github-actions[bot] Apr 25, 2023
4561e12
Update README to add core installation (#6488) (#6489)
github-actions[bot] Apr 25, 2023
599f522
Fix cache aware hybrid bugs (#6466) (#6484)
github-actions[bot] Apr 25, 2023
ae4a4dd
Fix typos (#6494) (#6495)
github-actions[bot] Apr 26, 2023
df2b870
Add disclaimer about dataset for ASR (#6496)
titu1994 Apr 26, 2023
0c85e21
fix (#6502)
Jorjeous Apr 26, 2023
24c77d0
fix broken links r1.18.0 (#6501) (#6504)
github-actions[bot] Apr 26, 2023
07f6533
[TTS] Create functions for TTS preprocessing without dataloader (#6317)
rlangman Apr 27, 2023
8bffc80
Cache aware streaming nfa (#6209)
Slyne Apr 27, 2023
6b84a8a
[BugFix] Force _get_batch_preds() to keep logits in decoder timestamp…
tango4j Apr 28, 2023
56ce2a6
[TTS] Fix FastPitch energy code (#6511)
rlangman Apr 28, 2023
b460716
fix custom forward_torch_softmax (#6512) (#6517)
github-actions[bot] Apr 28, 2023
319b191
[TTS] fixed broken path. (#6514) (#6518)
github-actions[bot] Apr 28, 2023
2dd91fa
Fix normalization of impulse response in ImpulsePerturbation (#6505)
anteju Apr 28, 2023
d0e2f5a
Add interleaved pp support (#6498)
titu1994 Apr 28, 2023
3cff6ce
Fix typos (#6523)
titu1994 May 1, 2023
c2a4264
New noise_norm perturbation based on Riva work (#6445)
trias702 May 2, 2023
669a8c2
[TTS] Add script for computing feature stats (#6508)
rlangman May 2, 2023
798978d
Add Frame-VAD model and datasets (#6441)
stevehuang52 May 2, 2023
cb53ede
Support dynamic length batches with GPT SFT (#6510)
aklife97 May 2, 2023
1217668
added back the fast emit section to the configs. (#6540) (#6542)
github-actions[bot] May 3, 2023
5090a94
removing unnessary avoid_bfloat16_autocast_context (#6481)
bmwshop May 3, 2023
b2f23bd
FC models in menu (#6473)
bmwshop May 3, 2023
6c77583
[TTS] Add tutorials for FastPitch TTS speaker adaptation with adapter…
hsiehjackson May 3, 2023
ce84b1f
[TTS] Create initial TTS dataset feature processors (#6507)
rlangman May 3, 2023
8bbc140
fix (#6529) (#6546)
github-actions[bot] May 3, 2023
dc0c332
Add FastConformer Hybrid ASR models for EN, ES, IT, DE, PL, HR, UA, B…
github-actions[bot] May 4, 2023
42691c3
Add scores for FastConformer models (#6557) (#6558)
github-actions[bot] May 4, 2023
e7f2210
Fix fp16 (#6543) (#6544)
github-actions[bot] May 4, 2023
69b2c34
Patch transcribe and support offline transcribe for hybrid model (#65…
github-actions[bot] May 4, 2023
24076ca
Fix notebook bad json (#6561)
titu1994 May 4, 2023
b41a511
Change Megatron Enc Dec model to use persistent_workers (#6548) (#6552)
github-actions[bot] May 4, 2023
77369ef
Make KenLM with PC for AggregateTokenizer and merge it (#6081)
karpnv May 4, 2023
fa62794
fix for running on 1 GPU.
khcs May 4, 2023
3817d41
temp rtd fix (#6568) (#6569)
github-actions[bot] May 4, 2023
a57ec70
[TTS] Add script for mapping speaker names to indices (#6509)
rlangman May 5, 2023
5fd9c7f
whitespace (#6574)
karpnv May 5, 2023
04c1b72
Update manifest.py for speedup (#6565) (#6573)
github-actions[bot] May 5, 2023
c13ffb9
More streaming conformer export fixes (#6567) (#6578)
github-actions[bot] May 5, 2023
846fc83
user selected max_seq_len should be less than model's max_seq_len (#6…
github-actions[bot] May 5, 2023
c19aac5
Framework for PEFT via mixins (#6391)
arendu May 5, 2023
fba50b8
cache and reuse inputs (#6422) (#6452)
github-actions[bot] May 7, 2023
d0785d5
Add patches for Virtual Parallel conversion (#6589)
titu1994 May 8, 2023
c7f58d8
Pass `.scale` instead of scaler object to core (#6551)
github-actions[bot] May 8, 2023
58440fb
Documentation for ASR-TTS models (#6594) (#6595)
github-actions[bot] May 8, 2023
aa2b9b8
[TTS] Fix aligner nan loss in fp32 (#6435)
hsiehjackson May 8, 2023
cf60b6c
Update SDP docs (#6485) (#6596)
github-actions[bot] May 8, 2023
3c1147f
Bug/typo fixes (#6599)
Kipok May 9, 2023
08ab1a7
Manual garbage collection with an interval (#6469) (#6482)
github-actions[bot] May 9, 2023
3ed0282
Make tensor split contiguous (#6580) (#6593)
github-actions[bot] May 9, 2023
a9d2910
[ASR] Fix for old models in change_attention_model (#6608)
sam1373 May 10, 2023
077b7f9
Update manifest.py to use os.path for get_full_path (#6598)
stevehuang52 May 10, 2023
9eed6d3
Cherry pick commits in #6601 to main (#6611)
fayejf May 10, 2023
77b9a85
Create dummy iters to satisy len checks (#6600) (#6603)
github-actions[bot] May 10, 2023
9f367f4
add GPT eval mode fix for interleaved to main (#6610)
aklife97 May 10, 2023
8592562
Fix batch size reconf for T5 FT for multi-validation (#6582) (#6588)
github-actions[bot] May 10, 2023
b3f5f39
Not doing CastToFloat by default (#6524) (#6563)
github-actions[bot] May 10, 2023
09f2e37
Turn autocast off when precision is fp32 (#6576)
github-actions[bot] May 10, 2023
2a446cb
update core commit hash in readme (#6622) (#6623)
github-actions[bot] May 10, 2023
2cc0f62
add hat image to docs (#6619) (#6621)
github-actions[bot] May 11, 2023
94e6e25
Allow indices exchange via distributed (#6618) (#6624)
github-actions[bot] May 11, 2023
7f48130
Offline and streaming inference support for hybrid model (#6570)
fayejf May 11, 2023
c44e3b6
Patch decoding for PC models (#6630) (#6631)
github-actions[bot] May 11, 2023
ef49b0a
Fix wer.py where 'errors' variable was not set (#6633) (#6634)
github-actions[bot] May 11, 2023
1b785e2
Restore GPT support for interleaved pipeline parallelism (#6528) (#6613)
timmoon10 May 11, 2023
44e890e
Add FA
hsiehjackson May 12, 2023
a5fcbee
Fix XPOS
hsiehjackson May 12, 2023
aedcc7c
Add warning
hsiehjackson May 12, 2023
7fbf571
Fix bugs
hsiehjackson May 13, 2023
ddb067e
Fix attention
hsiehjackson May 13, 2023
81a8c21
Fix comment
hsiehjackson May 15, 2023
36d685b
Fix cast dtype
hsiehjackson May 15, 2023
a1d1e5a
Undo xpos
hsiehjackson May 15, 2023
2eaa60a
bugfix (#6636)
fayejf May 11, 2023
5eb3552
Disable interctc tests (#6638)
Kipok May 11, 2023
4e94268
Add megatron_core to requirements (#6639) (#6640)
github-actions[bot] May 11, 2023
56847f3
Remove from jenkins (#6642)
github-actions[bot] May 11, 2023
986feed
sft model can use this script for eval (#6637)
arendu May 12, 2023
6d2c969
[TTS] Fix TTS audio preprocessing bugs (#6628)
rlangman May 12, 2023
954d43f
Move black parameters to pyproject.toml (#6647)
artbataev May 12, 2023
11c58f3
ASR-TTS Models: Support hybrid RNNT-CTC, improve docs. (#6620)
artbataev May 12, 2023
db7d578
fix conversion and eval (#6648)
arendu May 13, 2023
acb2c56
Confidence ensembles implementation (#6614)
Kipok May 15, 2023
1b28a7b
Patch memory used for NeMo Megatron models (#6615)
titu1994 May 15, 2023
6fb6e47
handle artifacts when path is dir (#6658)
arendu May 16, 2023
4ccba61
remove upgrading setuptools in reinstall.sh (#6659)
XuesongYang May 16, 2023
82d5d58
merge lora weights into base model (#6597)
arendu May 16, 2023
89b428c
upgrade to 23.04 (#6660)
ericharper May 16, 2023
9683d02
Merge r1.18.0 bugfixes and doc updates to main (#6655)
ericharper May 16, 2023
c648d99
Confidence ensembles: fix issues and add tuning functionality (#6657)
Kipok May 16, 2023
f736f60
[TTS] Implement new TextToSpeech dataset (#6575)
rlangman May 16, 2023
4e7afbb
Dialogue dataset (#6654)
yidong72 May 16, 2023
7e62925
Add support for RNNT/hybrid models to partial transcribe (#6609)
stevehuang52 May 16, 2023
e009385
eval_beamsearch_ngram.py with hybrid ctc (#6656)
karpnv May 17, 2023
c5e229a
fix bucketing bug issue for picking new bucket (#6663)
nithinraok May 17, 2023
9d7d0b1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 17, 2023
b739a5e
Add t5 flash-attention
hsiehjackson May 18, 2023
473ff20
PE refactor (#6673)
hsiehjackson May 18, 2023
4a0699d
Add singleton alibi
hsiehjackson May 18, 2023
9cfea92
Fix FA mask
hsiehjackson May 18, 2023
8c3bfbd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 18, 2023
9d01255
singleton PE
hsiehjackson May 18, 2023
8a6e294
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 18, 2023
8bd1466
Fix attn bias inference
hsiehjackson May 22, 2023
0e02478
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 18, 2023
8ed6a0a
fix eval
ekmb May 19, 2023
a6b856c
[TTS] Add callback for saving audio during FastPitch training (#6665)
rlangman May 18, 2023
213b5a3
update batch size recommendation to min 32 for 43b (#6675)
Zhilin123 May 18, 2023
1b93141
Make Note usage consistent in adapter_mixins.py (#6678)
BrianMcBrayer May 18, 2023
d2938b9
Fix masking bug for TTS Aligner (#6677)
redoctopus May 18, 2023
1564d94
[ASR] Adding ssl config for fast-conformer (#6672)
krishnacpuvvada May 19, 2023
82f863b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 19, 2023
8b55842
Fix xpos offset
hsiehjackson May 23, 2023
fbdd7fe
Fix sequence parallel
hsiehjackson May 24, 2023
8535a6a
Fix parallel
hsiehjackson May 24, 2023
873f2e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 23, 2023
7847a54
Uncomment correct bias size
hsiehjackson May 24, 2023
4aa46d7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 24, 2023
7514bf4
Remove unused module
hsiehjackson May 25, 2023
9a133d0
Fix singleton tril
hsiehjackson May 25, 2023
5ce3819
Fix kerple/sandwitch rename xpos
hsiehjackson May 25, 2023
bbee276
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
de61214
fix sandwich
hsiehjackson May 25, 2023
dcab11e
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
cd3bb6d
Add unitest
hsiehjackson May 30, 2023
4fac042
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 25, 2023
129e55d
Fix bug
hsiehjackson May 30, 2023
3b5ec97
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 30, 2023
b2eb222
Add requirements
hsiehjackson May 30, 2023
c73f983
Remove requirements
hsiehjackson May 30, 2023
06ce313
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] May 30, 2023
8c969fe
Remove requirement flash-attn
hsiehjackson May 30, 2023
f70cc3f
Fix FA causal for inference
hsiehjackson Jun 1, 2023
a0cea83
Add experimental PE
hsiehjackson Jun 1, 2023
c7c6a1b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
6876703
Update all invalid tree references to blobs for NeMo samples (#6679)
BrianMcBrayer May 19, 2023
6c65625
Update README.rst about container (#6686)
fayejf May 19, 2023
456153a
Fix a bug, use _ceil_to_nearest instead as _round_to_nearest is not d…
github-actions[bot] May 20, 2023
69992d6
Enable ONNX export of 5B GPT trained with TE FP8 modules (#6458)
asfiyab-nvidia May 22, 2023
4ee6d8f
[TTS] Add script for text preprocessing (#6541)
rlangman May 22, 2023
c856936
[TTS] Fix adapter duration issue (#6697)
hsiehjackson May 22, 2023
b70dbf7
karpnv/issues6690 (#6705)
karpnv May 23, 2023
1a66d30
Limit codeql scope (#6710)
titu1994 May 23, 2023
ff772f7
eval fix (#6685)
arendu May 23, 2023
2231a57
Fix k2 installation in Docker with CUDA 12 (#6707) (#6709)
github-actions[bot] May 24, 2023
8b3dce5
[TTS] Filter out silent audio files during preprocessing (#6716)
rlangman May 24, 2023
963855b
not pinning version (#6680)
yidong72 May 24, 2023
b0f33f1
Tutorial fixes (#6717) (#6718)
github-actions[bot] May 24, 2023
a4ef711
preprocess squad in sft format (#6727)
arendu May 25, 2023
da5e6f8
Fix Codeql (#6731)
titu1994 May 25, 2023
2c35e0b
[TTS] fix inconsistent type hints for IpaG2p (#6733)
XuesongYang May 26, 2023
2bac13d
VP Fixes for converter + Config management (#6698)
titu1994 May 26, 2023
5831405
Graph RNNT: Grid- and Compose-Transducer. W-Transducer loss (#6168)
artbataev May 26, 2023
2e963da
Fix fastpitch test nightly (#6730)
hsiehjackson May 26, 2023
7f83283
Fix for interctc test random failure (#6644)
Kipok May 26, 2023
599c503
check for first or last stage (#6708) (#6743)
github-actions[bot] May 27, 2023
0725b2d
sharded manifests docs (#6751)
bmwshop May 29, 2023
bdeab5b
[TTS] relax hardcoded prefix for phonemes and tones and infer phoneme…
XuesongYang May 30, 2023
146371b
[TTS] corrected misleading deprecation warnings. (#6702)
XuesongYang May 30, 2023
8f43ae3
Bug fix to restore act ckpt (#6753) (#6755)
github-actions[bot] May 31, 2023
7daad62
Bug fix to reset sequence parallelism (#6756) (#6770)
github-actions[bot] May 31, 2023
49e016e
Fix TTS adapter tutorial (#6741)
hsiehjackson May 31, 2023
34f5452
Fix checkpointed forward and add test for full activation checkpointi…
github-actions[bot] May 31, 2023
c022acb
lora notebook (#6765)
arendu May 31, 2023
e98f425
Fix Links (#6777) (#6778)
github-actions[bot] May 31, 2023
bcb3fd3
Remove alibi tril
hsiehjackson Jun 1, 2023
71bff2f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
2e6eba5
Add flash-attn requirement
hsiehjackson Jun 1, 2023
424a15d
revert sft dataset changes
ekmb Jun 1, 2023
e79a35a
Move flash-attn requirement
hsiehjackson Jun 1, 2023
4c953aa
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
0b18768
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 1, 2023
8863360
Add install
hsiehjackson Jun 1, 2023
2dc0418
peft eval directly from ckpt (#6785)
arendu Jun 1, 2023
1353aca
Add Frame-VAD examples and utils (#6463)
stevehuang52 Jun 1, 2023
b8d19b2
[TTS][zh] refine hardcoded lowercase for ASCII letters. (#6781)
XuesongYang Jun 2, 2023
7ad325d
Revert evaluation
hsiehjackson Jun 2, 2023
b875a78
Revert evaluation
hsiehjackson Jun 2, 2023
1f229c0
Fix
hsiehjackson Jun 2, 2023
26dbc9f
Fix gpu
hsiehjackson Jun 2, 2023
a3cf08e
Spellchecking ASR customization model (#6179)
bene-ges Jun 2, 2023
90ef33a
Fix test
hsiehjackson Jun 2, 2023
380a6f2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
b69cbf7
Fix device
hsiehjackson Jun 2, 2023
de52c2d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
8dc863b
Fix conflict
hsiehjackson Jun 2, 2023
29c3cd4
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 2, 2023
e782202
Revert
hsiehjackson Jun 2, 2023
7f40a05
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 2, 2023
d814f47
clean
hsiehjackson Jun 2, 2023
65118c4
Change device
hsiehjackson Jun 2, 2023
89d4547
Change device
hsiehjackson Jun 2, 2023
9c50e29
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 2, 2023
218ffa3
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 2, 2023
84acce0
Add test FA
hsiehjackson Jun 5, 2023
874f992
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 5, 2023
35ac850
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 5, 2023
98783ce
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 5, 2023
08dbd86
Add CI
hsiehjackson Jun 5, 2023
bdfe61e
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 5, 2023
6df2df8
Fix yaml order
hsiehjackson Jun 5, 2023
1f460d9
Test random attention mask
hsiehjackson Jun 5, 2023
01f4391
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 5, 2023
23634bf
Add install FA for tests
hsiehjackson Jun 6, 2023
4cfb2da
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 6, 2023
528c416
cherry pick 6788 (#6816)
ekmb Jun 6, 2023
a751928
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 6, 2023
ee692d4
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 6, 2023
5178f6b
Support 2D mask
hsiehjackson Jun 6, 2023
45876ad
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
1c15644
add missing comp_att_mask arg
ekmb Jun 6, 2023
74da509
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
ekmb Jun 6, 2023
5da1bc3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
fb895da
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 6, 2023
81d2fb0
Fix code ql
hsiehjackson Jun 6, 2023
b578ff5
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 6, 2023
82120c3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 6, 2023
a9bb73e
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 6, 2023
662733b
Megatron MPT-7B Support (#6804)
trias702 Jun 7, 2023
6b18be2
Fix test triton
hsiehjackson Jun 7, 2023
bdd91d6
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 7, 2023
92e7dba
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 7, 2023
bb89e61
Update FA in CI
hsiehjackson Jun 7, 2023
672f262
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 7, 2023
2a526ad
Fix Jenkin error
hsiehjackson Jun 7, 2023
0ac5374
Resume with FA
hsiehjackson Jun 7, 2023
7acf5cf
Follow comments
hsiehjackson Jun 7, 2023
cdff779
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 7, 2023
dcab29d
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 7, 2023
aba44ae
Fix README
hsiehjackson Jun 7, 2023
7c0a530
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 7, 2023
194b4bb
Fix README
hsiehjackson Jun 7, 2023
fe173c1
Remove torch.cuda
hsiehjackson Jun 7, 2023
7c38447
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 7, 2023
1104174
Remove unused import
hsiehjackson Jun 7, 2023
a3010bd
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 7, 2023
81b002e
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 8, 2023
a883aa2
kerple init
hsiehjackson Jun 8, 2023
6a895f0
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 8, 2023
0504814
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 8, 2023
889dec6
Add TE comment
hsiehjackson Jun 8, 2023
fd2899a
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 8, 2023
7255e31
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 8, 2023
c972553
Merge branch 'main' into gpt-alibi-FA
hsiehjackson Jun 9, 2023
b8b5611
Fix error when inference.compute_attention_mask=False
hsiehjackson Jun 9, 2023
83ef08d
Merge branch 'gpt-alibi-FA' of https://github.com/NVIDIA/NeMo into gp…
hsiehjackson Jun 9, 2023
498ec3d
Merge branch 'main' into gpt-alibi-FA
michalivne Jun 11, 2023
Files changed
5 changes: 5 additions & 0 deletions Dockerfile
@@ -72,6 +72,11 @@ WORKDIR /tmp/nemo
COPY requirements .
RUN for f in $(ls requirements*.txt); do pip3 install --disable-pip-version-check --no-cache-dir -r $f; done

# install flash attention dependencies
RUN pip install flash-attn
# pinned triton version for flash-attention https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py#L3
RUN pip install triton==2.0.0.dev20221202

# install k2, skip if installation fails
COPY scripts /tmp/nemo/scripts/
RUN INSTALL_MSG=$(/bin/bash /tmp/nemo/scripts/speech_recognition/k2/setup.sh); INSTALL_CODE=$?; \
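A quick sanity check that the pinned pre-release Triton build is the one that actually lands in the image; this is a minimal sketch, and the version string is simply copied from the Dockerfile lines above.

.. code-block:: python

    # Confirm the triton pin required by flash-attn's Triton kernels.
    import importlib.metadata

    installed = importlib.metadata.version("triton")
    assert installed == "2.0.0.dev20221202", f"unexpected triton version: {installed}"
    print("triton pin OK:", installed)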
346 changes: 346 additions & 0 deletions Jenkinsfile

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions README.rst
@@ -280,6 +280,16 @@ It is highly recommended to use the NVIDIA PyTorch or NeMo container if having i

Transformer Engine requires PyTorch to be built with CUDA 11.8.


Flash Attention
~~~~~~~~~~~~~~~~~~~~
Transformer Engine already supports Flash Attention for GPT models. If you want to use Flash Attention for non-causal models or with an attention bias (introduced by position embeddings such as ALiBi), please install `flash-attn <https://github.com/HazyResearch/flash-attention>`_.

.. code-block:: bash

    pip install flash-attn
    pip install triton==2.0.0.dev20221202

NeMo Text Processing
~~~~~~~~~~~~~~~~~~~~
NeMo Text Processing, specifically (Inverse) Text Normalization, is now a separate repository `https://github.com/NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_.
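The Flash Attention note above mentions attention bias introduced by position embeddings such as ALiBi. For context, here is a minimal sketch of the standard ALiBi bias from the original paper, assuming a power-of-two head count; it is not NeMo's implementation, only an illustration of the bias that gets added to the attention scores.

.. code-block:: python

    import torch

    def alibi_slopes(num_heads: int) -> torch.Tensor:
        # Per-head slopes 2^(-8/n), 2^(-16/n), ..., 2^(-8); assumes num_heads is a power of two.
        start = 2.0 ** (-8.0 / num_heads)
        return torch.tensor([start ** (i + 1) for i in range(num_heads)])

    def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
        # [num_heads, seq_len, seq_len] additive bias proportional to the distance
        # between query position i and key position j (non-positive below the diagonal).
        pos = torch.arange(seq_len)
        rel = (pos[None, :] - pos[:, None]).float()  # rel[i, j] = j - i
        return alibi_slopes(num_heads)[:, None, None] * rel

    # Added to Q.K^T / sqrt(d_head) before the softmax; a causal mask removes the upper triangle.
    bias = alibi_bias(num_heads=8, seq_len=16)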
5 changes: 4 additions & 1 deletion examples/nlp/language_modeling/conf/megatron_gpt_config.yaml
@@ -77,7 +77,7 @@ model:
transformer_block_type: 'pre_ln' # Options ['pre_ln', 'post_ln', 'normformer']
openai_gelu: False # Use OpenAI's GELU instead of the default GeLU
normalize_attention_scores: True # Whether to scale the output Q * K^T by 1 / sqrt(hidden_size_per_head). This arg is provided as a configuration option mostly for compatibility with models that have been weight-converted from HF. You almost always want to set this to True.
position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'rope']
position_embedding_type: 'learned_absolute' # Position embedding type. Options ['learned_absolute', 'rope', 'alibi', 'kerple', 'xpos', 'sandwich']. xpos and sandwich are experimental.
rotary_percentage: 1.0 # If using position_embedding_type=rope, then the per head dim is multiplied by this.
attention_type: 'multihead' # Attention type. Options ['multihead']
share_embeddings_and_output_weights: True # Share embedding and output layer weights.
@@ -167,6 +167,9 @@
reduce_amax: True # Perform reduction to sync amax tensors across GPUs after every iteration
use_emha: False # Use fused multi-head attention for large sequence-length. Note this is not yet supported. Please set to False.

## Flash Attention
use_flash_attention: False # Use flash attention in the self-attention module; this config has no effect when transformer_engine=True

data:
# Path to data must be specified by the user.
# Supports List, String and Dictionary
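Putting the two new model options together, a hedged sketch of how they could be enabled on top of this config with OmegaConf; the config path comes from the diff header above, and the actual training entry point is not shown.

.. code-block:: python

    from omegaconf import OmegaConf

    cfg = OmegaConf.load("examples/nlp/language_modeling/conf/megatron_gpt_config.yaml")
    cfg.model.position_embedding_type = "alibi"  # or 'kerple'; 'xpos'/'sandwich' are experimental
    cfg.model.use_flash_attention = True         # ignored when transformer_engine=True
    print(OmegaConf.to_yaml(cfg.model))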
@@ -36,4 +36,5 @@ megatron_legacy: False # Whether to use the legacy Megatron model. This affects
normalize_attention_scores: True # Whether to scale the output Q * K^T by 1 / sqrt(hidden_size_per_head). This arg is provided as a configuration option mostly for compatibility with models that have been weight-converted from HF. You almost always want to set this to True.
num_moe_experts: 1 # When >1, FFNs are changed to MoE layers
moe_frequency: 1 # every Nth ffn layer will be made MoE
moe_dropout: 0.0 # Dropout value for MoE layers
use_flash_attention: false # Use flash attention in self-attention module
@@ -129,4 +129,5 @@ inference:
repetition_penalty: 1.2 # The parameter for repetition penalty. 1.0 means no penalty.
min_tokens_to_generate: 0 # The minimum length of the sequence to be generated.
compute_logprob: False # a flag used to compute logprob of all the input text, a very special case of running inference, default False
outfile_path: output.txt
compute_attention_mask: True
@@ -151,6 +151,7 @@ def __init__(
gradient_accumulation_fusion=False,
persist_layer_norm=False,
openai_gelu=False,
megatron_legacy=False,
onnx_safe=False,
sequence_parallel=False,
transformer_engine=False,
@@ -163,6 +164,7 @@
fp8_amax_compute_algo='most_recent',
reduce_amax=True,
use_emha=False,
use_flash_attention=False,
):
super(GPTModel, self).__init__(share_token_embeddings=share_embeddings_and_output_weights)

@@ -232,6 +234,7 @@ def __init__(
persist_layer_norm=persist_layer_norm,
openai_gelu=openai_gelu,
onnx_safe=onnx_safe,
megatron_legacy=megatron_legacy,
sequence_parallel=sequence_parallel,
transformer_engine=transformer_engine,
fp8=fp8,
@@ -243,6 +246,7 @@ def __init__(
fp8_amax_compute_algo=fp8_amax_compute_algo,
reduce_amax=reduce_amax,
use_emha=use_emha,
use_flash_attention=use_flash_attention,
)

if self.share_embeddings_and_output_weights:
@@ -25,6 +25,7 @@
from pytorch_lightning.trainer.trainer import Trainer

from nemo.collections.nlp.models.nlp_model import NLPModel
from nemo.collections.nlp.modules.common.megatron.attention import HAVE_FLASH_ATTENTION
from nemo.collections.nlp.modules.common.megatron.clip_grads import (
clip_grad_norm_distributed_optimizer,
clip_grad_norm_fp32,
@@ -84,6 +85,12 @@ def __init__(self, cfg: DictConfig, trainer: Trainer, no_lm_init=True):
if trainer is None:
raise ValueError(f"Trainer cannot be None for Megatron-based models. Please provide a PTL trainer object.")

if cfg.get('use_flash_attention', False) and not HAVE_FLASH_ATTENTION:
raise ImportError(
"flash_attn was not found. Please see the installation instructions: https://github.com/HazyResearch/flash-attention. "
"If you use flash_attn with triton, please install triton==2.0.0.dev20221202."
)

# this prevents base constructor from initializing tokenizer
self.tokenizer = None

@@ -205,9 +212,10 @@ def _build_tokenizer(self):
self.tokenizer = get_nmt_tokenizer(
library=self._cfg.tokenizer.library,
model_name=self._cfg.tokenizer.type,
tokenizer_model=self.register_artifact("tokenizer.model", self._cfg.tokenizer.model),
vocab_file=self.register_artifact("tokenizer.vocab_file", self._cfg.tokenizer.vocab_file),
merges_file=self.register_artifact("tokenizer.merge_file", self._cfg.tokenizer.merge_file),
tokenizer_model=self.register_artifact("tokenizer.model", self._cfg.tokenizer.get('model', None)),
Contributor: When will the tokenizer not have the model or vocab_file or merge_file property?

Contributor: I'm just wondering if this may introduce silent failures.

Collaborator: @trias702 could you please comment on that.

Collaborator: @MaximumEntropy It happens if you use a HF tokenizer. To use a pretrained HF tokenizer like EleutherAI/gpt-neox-20b or bert-base-uncased, you don't need to pass any model, vocab_file, or merge_file in the config; you just need to pass tokenizer.library=huggingface and tokenizer.type=EleutherAI/gpt-neox-20b. But because the old code would call register_artifact for cfg.tokenizer.model, which would be None, it would throw an error. That's why I changed this block, to ensure it works correctly with pure HF tokenizers at model instantiation.

vocab_file=self.register_artifact("tokenizer.vocab_file", self._cfg.tokenizer.get('vocab_file', None)),
merges_file=self.register_artifact("tokenizer.merge_file", self._cfg.tokenizer.get('merge_file', None)),
use_fast=self.cfg.tokenizer.get('use_fast', False),
delimiter=self.cfg.tokenizer.get('delimiter', None),
legacy=legacy,
)
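The HAVE_FLASH_ATTENTION flag imported above typically comes from an optional-import guard in the attention module. A sketch of the usual pattern follows; it is not a copy of NeMo's code, and the exact flash-attn symbols the real module imports are an assumption.

.. code-block:: python

    # Usual optional-dependency guard; the real flag lives in
    # nemo.collections.nlp.modules.common.megatron.attention.
    try:
        import flash_attn  # noqa: F401  (NeMo presumably imports specific kernel entry points)
        HAVE_FLASH_ATTENTION = True
    except (ImportError, ModuleNotFoundError):
        HAVE_FLASH_ATTENTION = False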
@@ -300,7 +300,7 @@ def get_inference_config(self):
def model_provider_func(self, pre_process, post_process):
"""Model depends on pipeline paralellism."""
model = GPTModel(
vocab_size=self.padded_vocab_size,
vocab_size=self.cfg.get('override_vocab_size', self.padded_vocab_size),
Contributor: What is override_vocab_size and why is it needed? I don't see it in the gpt_config.yaml file either.

Collaborator: This is needed for converting models from another platform (Mosaic, HF, etc.) into Megatron, because you need to instantiate the model with the Embedding layer at the exact dimensionality required to import the weights; otherwise you get a mismatch, or you need complex weight surgery to cut out the weights you need and pad with zeros. It was easier to add a special override flag that forces the model to use the vocab_size needed to make downstream conversion work. If override_vocab_size is None, which it will be 99.8% of the time, it uses padded_vocab_size same as before, so it felt like a harmless bypass that makes model conversion much easier for all future conversion work.

Contributor: So, for MPT-7B, the override_vocab_size is not equal to the padded vocab size computed from the tokenizer vocab size plus divisibility-by-128 upscaling?

I have concerns about this for two reasons: 1) this is a "dangling" property with no reference in the yaml file, so the only way to use it is to override from the CLI via something like +model.override_vocab_size=32000; 2) there are divisibility constraints on the vocab size based on the tensor parallel size and mod-128 divisibility for tensor cores, and this sidesteps both of them.

Collaborator: It is divisible, but the actual number is incorrect if you try to derive the Embedding size from the tokenizer vocab size. For MPT-7B, the tokenizer they trained with has a vocab size of 50254, but their Embedding weights are size 50432, which is divisible by 128 and works just fine with TP. However, if you just pass the MPT-7B tokenizer to MegatronGPTModel at instantiation, MegatronGPTModel applies its upscaling-divisibility logic to create the padded vocab_size, which in this case gives 50304. That is wrong, because the actual state_dict weights in MPT-7B are sized for 50432, so you get a mismatch when trying to load the weights. The alternative was complex weight-surgery logic that pads with zeros, but even then you have to pass some target number to pad up to, so it was easier to pass that number directly as the matrix size for the Embedding layer, bypassing the upscaling-divisibility logic.

This override parameter is extremely useful for any work where we want to import external model weights from a 3rd-party Transformer model into Megatron, because we need to tell Megatron exactly what size we want the Embedding layer to be, and currently there is no other way of doing that. I agree that the parameter is "dangling", but it is designed to only ever be used inside a script that converts a 3rd-party model into NeMo, for example: https://github.com/NVIDIA/NeMo/blob/2a526adf9dd5d4ef22c1d6c1d807c6a8bbea14ec/scripts/nlp_language_modeling/convert_mpt_7b_hf_to_nemo.py

Collaborator: @MaximumEntropy for the final call, but @trias702, if it's for external use only, how about renaming it to make that clear or adding a comment?

Collaborator: Since this is really on a case-by-case basis, why don't we create a model type class and set the vocab according to the model type instead of relying on the user to set it? I would rather set "MPT-7B" than some magic number "50432".

hidden_size=self.cfg.hidden_size,
max_position_embeddings=self.cfg.max_position_embeddings,
num_layers=self.cfg.num_layers,
@@ -357,6 +357,8 @@ def model_provider_func(self, pre_process, post_process):
fp8_amax_compute_algo=self.cfg.get('fp8_amax_compute_algo', 'most_recent'),
reduce_amax=self.cfg.get('reduce_amax', True),
use_emha=self.cfg.get('use_emha', False),
use_flash_attention=self.cfg.get('use_flash_attention', False),
megatron_legacy=self.cfg.get('megatron_legacy', False),
)

return model
@@ -765,7 +767,6 @@ def fwd_output_and_loss_func(dataloader_iter, model, checkpoint_activations_all_
if self.get_attention_mask_from_fusion:
required_keys.remove('attention_mask')
batch = {key: val.cuda(non_blocking=True) if key in required_keys else None for key, val in batch.items()}

# Model forward pass
output_tensor = model(
batch['tokens'],
@@ -822,9 +823,10 @@ def fwd_output_only_func(dataloader_iter, model):
inference_max_sequence_len,
) = batch
tokens = tokens.cuda()
attention_mask = attention_mask.cuda()
position_ids = position_ids.cuda()
attention_mask = attention_mask[0:1]
if attention_mask is not None:
attention_mask = attention_mask.cuda()
attention_mask = attention_mask[0:1]
extra_arg['set_inference_key_value_memory'] = set_inference_key_value_memory[0].item()
extra_arg['inference_max_sequence_len'] = inference_max_sequence_len[0].item()
output_tensor = model(tokens, position_ids, attention_mask, **extra_arg)
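The override_vocab_size thread above hinges on Megatron-style vocab padding. Here is the arithmetic as a worked example with the MPT-7B numbers quoted in the discussion; the helper name is made up for illustration, and only the numbers and the divisibility rule come from the thread.

.. code-block:: python

    def padded_vocab_size(orig_vocab_size: int, divisible_by: int = 128, tp_size: int = 1) -> int:
        # Round the tokenizer vocab up to a multiple of (divisible_by * tensor-parallel size).
        multiple = divisible_by * tp_size
        return ((orig_vocab_size + multiple - 1) // multiple) * multiple

    print(padded_vocab_size(50254))  # 50304, what the padding logic yields from the MPT-7B tokenizer
    # MPT-7B's checkpoint embeddings are 50432 wide, hence an override such as
    # +model.override_vocab_size=50432 during conversion instead of the computed value.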
@@ -753,6 +753,7 @@ def predict_step(self, batch: Any, batch_idx: int, dataloader_idx: Optional[int]
"add_BOS": inference_config["add_BOS"],
"all_probs": inference_config["all_probs"],
"compute_logprob": inference_config["compute_logprob"],
"compute_attention_mask": inference_config.get("compute_attention_mask", True),
}

task_ids, processed_inputs = batch