{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":561328948,"defaultBranch":"main","name":"TransformerEngine","ownerLogin":"nzmora-nvidia","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2022-11-03T13:08:54.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/96238833?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1691587929.0","currentOid":""},"activityList":{"items":[{"before":"3b7b7c68fc310067567956d6f63f633e2012bcec","after":"b8ba734e01671e77d7efebc1f773adf12839de9d","ref":"refs/heads/main","pushedAt":"2023-08-30T10:40:46.000Z","pushType":"push","commitsCount":4,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"[Paddle] Add parallel support (#357)\n\n* [Paddle] Add TP, DP, PP, FSDP\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Minor fix\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Fix CI failure\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Remove set_nccl_overlap_warning_if_tp\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Improve variable naming\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Refactor FP8 Buffer\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Stylic changes\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Fix FP32 parallel training\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Fix numel performance issue\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Squashed commit of the following:\r\n\r\ncommit 79e2e5fd774e67dcdda9aae01a9f31a6479c5d70\r\nAuthor: Tian Zheng (Engrg-Hardware 1) \r\nDate: Sun Aug 20 14:39:16 2023 +0000\r\n\r\n Add TP test\r\n\r\n Signed-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\ncommit 1d40ad60540490f97ed82ba877cc6eda8902cbf6\r\nAuthor: Tian Zheng (Engrg-Hardware 1) \r\nDate: Sun Aug 20 14:22:25 2023 +0000\r\n\r\n Fix tp_size when disabled\r\n\r\n Signed-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\ncommit 6632f735a0c8251862355fc74622af59fae3a509\r\nAuthor: Tian Zheng (Engrg-Hardware 1) \r\nDate: Sun Aug 20 05:52:18 2023 +0000\r\n\r\n Add TP for attention and transformer layer\r\n\r\n Signed-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Add shape check\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Add FSDP check for stage 1,2,3\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Review changes\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Fix group_sharding test\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Support NVTE_FUSE_ATTN\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n* Fix CI errors\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\n\r\n---------\r\n\r\nSigned-off-by: Tian Zheng (Engrg-Hardware 1) \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"[Paddle] Add parallel support (NVIDIA#357)"}},{"before":"7804d1167d78c29867b9f32ade5b7520be3bb870","after":"3b7b7c68fc310067567956d6f63f633e2012bcec","ref":"refs/heads/main","pushedAt":"2023-08-20T15:38:12.000Z","pushType":"push","commitsCount":15,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"[PyTorch] Catch misaligned address errors in softmax (#390)\n\nCatch misaligned address errors\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"[PyTorch] Catch misaligned address errors in softmax (NVIDIA#390)"}},{"before":"92db67ea4fc000985ebe99004cd888d05efcd3da","after":"d95856a9e84fe6be9ba3e0fa6360bbbbd0e464c5","ref":"refs/heads/dev_nzmora_improve_softmax_export_test","pushedAt":"2023-08-10T21:32:16.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Improve softmax ONNX export tests\n\n* Add dynamically shaped input mask in test_export_softmax\n* Fix test_softmax_mask_fn - use env. var `NVTE_ONNX_KVCACHE_MAX_SEQ_LEN` to control whether the test uses the default mask generation function or dynamic TRILU mask slicing.\n* Change core_attention ONNX export test: use \"no_mask\" as attn mask type when testing `te.attention.DotProductAttention` w/o masking.\n* Use ORT CUDA backend by default.\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"Improve softmax ONNX export tests"}},{"before":"ef7606818f39edd26ea223b61fbb6d5637deb66f","after":"92db67ea4fc000985ebe99004cd888d05efcd3da","ref":"refs/heads/dev_nzmora_improve_softmax_export_test","pushedAt":"2023-08-10T06:51:55.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Merge branch 'main' into dev_nzmora_improve_softmax_export_test","shortMessageHtmlLink":"Merge branch 'main' into dev_nzmora_improve_softmax_export_test"}},{"before":"e53a46a9de0bc50897acda66f69e9c974a108519","after":"ef7606818f39edd26ea223b61fbb6d5637deb66f","ref":"refs/heads/dev_nzmora_improve_softmax_export_test","pushedAt":"2023-08-09T21:53:00.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Update tests/pytorch/test_onnx_export.py\r\n\r\nChange core_attention ONNX export test: use \"no_mask\" as attn mask type when testing `te.attention.DotProductAttention` w/o masking.\n\nCo-authored-by: Kirthi Shankar Sivamani \nSigned-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>","shortMessageHtmlLink":"Update tests/pytorch/test_onnx_export.py"}},{"before":"66ff2e36ec79e286ec6c6989db8eecc808ca895c","after":"7804d1167d78c29867b9f32ade5b7520be3bb870","ref":"refs/heads/main","pushedAt":"2023-08-09T13:32:53.000Z","pushType":"push","commitsCount":3,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Disable FAv2 for deterministic use (#366)\n\n* Disable FAv2 for deterministic use\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Also disable FusedAttention backend with deterministic\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Fix\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Disable FAv2 for deterministic use (NVIDIA#366)"}},{"before":null,"after":"e53a46a9de0bc50897acda66f69e9c974a108519","ref":"refs/heads/dev_nzmora_improve_softmax_export_test","pushedAt":"2023-08-09T13:32:09.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Improve softmax ONNX export tests\n\n* Use ORT CUDA backend by default.\n* Add dynamically shaped input mask in test_export_softmax\n* Fix test_softmax_mask_fn - use env. var `NVTE_ONNX_KVCACHE_MAX_SEQ_LEN` to control whether the test uses the default mask generation function or dynamic TRILU mask slicing.","shortMessageHtmlLink":"Improve softmax ONNX export tests"}},{"before":"8c3110d18eb9fcdb5514692f9659c7840572eccf","after":"66ff2e36ec79e286ec6c6989db8eecc808ca895c","ref":"refs/heads/main","pushedAt":"2023-08-08T19:06:02.000Z","pushType":"push","commitsCount":30,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"[JAX] flash attention integration (#345)\n\n* Fix flash attention dropout probability with inference\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Add output as the fused attention ctx tensor\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Add rng_state as the fused attention ctx tensors\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Add flash attention supported lengths to the fused attention\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Refactor attention primitive to reuse abstract shaped array\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Detect backend type to allocate appropriate ctx size\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Skip dropout correctness instead of return success\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Use cudaMemsetAsync and enhance the error handling\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Add flash attention kernel elts_per_thread update\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Remove redundant max 512 suffix\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Keep only DType and remove NVTEDType from python\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Fix a float32_attention_logits bugs\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Re-calculate workspace size for self attention\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Enhance bias/dbias shape guard\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Enhance the seed/rng_state checker\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Use jax.core.ShapedArray as jax.abstract_arrays is deprecated\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n* Enhance the unittest docs\r\n\r\nSigned-off-by: Reese Wang \r\n\r\n---------\r\n\r\nSigned-off-by: Reese Wang ","shortMessageHtmlLink":"[JAX] flash attention integration (NVIDIA#345)"}},{"before":"7f25a9f12410f678e123768d6d4bb196e5e72ef5","after":"41811fc05aad801001accc32862e541125237f7c","ref":"refs/heads/dev_nzmora_bug_310","pushedAt":"2023-07-13T18:17:43.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Bug fix\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Bug fix"}},{"before":"f7dd59bf28fb31590ab74ea21783ac748f9dcd4f","after":"7f25a9f12410f678e123768d6d4bb196e5e72ef5","ref":"refs/heads/dev_nzmora_bug_310","pushedAt":"2023-07-13T16:24:32.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Merge branch 'main' into dev_nzmora_bug_310","shortMessageHtmlLink":"Merge branch 'main' into dev_nzmora_bug_310"}},{"before":"08350ceff3b296e201348b9c268d26ad142b1c75","after":"f7dd59bf28fb31590ab74ea21783ac748f9dcd4f","ref":"refs/heads/dev_nzmora_bug_310","pushedAt":"2023-07-13T08:46:09.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Fix FP32 LayerNorm ONNX export\n\nWhen running inference use a fwd method that is registered with torchscript.\n\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"Fix FP32 LayerNorm ONNX export"}},{"before":"a7bc7cf755b1347065640f10ec1ead521a0c4942","after":"8c3110d18eb9fcdb5514692f9659c7840572eccf","ref":"refs/heads/main","pushedAt":"2023-07-13T08:40:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Catch cublas FP8 errors (#317)\n\n* Better dimension assert for FP8\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* line\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Catch cublas FP8 errors (NVIDIA#317)"}},{"before":null,"after":"08350ceff3b296e201348b9c268d26ad142b1c75","ref":"refs/heads/dev_nzmora_bug_310","pushedAt":"2023-07-09T23:25:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Fix FP32 LayerNorm ONNX export\n\nWhen running inference use a fwd method that is registered with torchscript.\n\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"Fix FP32 LayerNorm ONNX export"}},{"before":"a83605dfa9545e7b06b24b450ae1cbe13326c1d6","after":"a7bc7cf755b1347065640f10ec1ead521a0c4942","ref":"refs/heads/main","pushedAt":"2023-07-09T23:15:12.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"[JAX] Support arbitrary dimensinos of fp8 meta. (#309)\n\nSigned-off-by: Ming Huang ","shortMessageHtmlLink":"[JAX] Support arbitrary dimensinos of fp8 meta. (NVIDIA#309)"}},{"before":"804f120322a13cd5f21ea8268860607dcecd055c","after":"a83605dfa9545e7b06b24b450ae1cbe13326c1d6","ref":"refs/heads/main","pushedAt":"2023-07-03T20:15:34.000Z","pushType":"push","commitsCount":10,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Check for cuDNN frontend API when building (#307)\n\nSigned-off-by: Tim Moon \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Check for cuDNN frontend API when building (NVIDIA#307)"}},{"before":"d0a28a3a0932c46530dea2046d02bf32ead3e9a2","after":"a4a8b774b3feeb6025fe8676b7fc42c94a995350","ref":"refs/heads/dev_nzmora_fix_const_of_shape","pushedAt":"2023-06-22T18:50:30.281Z","pushType":"push","commitsCount":2,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Merge branch 'main' into dev_nzmora_fix_const_of_shape","shortMessageHtmlLink":"Merge branch 'main' into dev_nzmora_fix_const_of_shape"}},{"before":"401287095682978aeae0ba344e74da97c6c824a9","after":"d0a28a3a0932c46530dea2046d02bf32ead3e9a2","ref":"refs/heads/dev_nzmora_fix_const_of_shape","pushedAt":"2023-06-22T18:48:48.955Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"fix lint\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"fix lint"}},{"before":"690c9ab0b8e2de3db78f47d335aa1cdea0b22c0d","after":"401287095682978aeae0ba344e74da97c6c824a9","ref":"refs/heads/dev_nzmora_fix_const_of_shape","pushedAt":"2023-06-22T18:43:22.975Z","pushType":"push","commitsCount":3,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Merge branch 'main' into dev_nzmora_fix_const_of_shape","shortMessageHtmlLink":"Merge branch 'main' into dev_nzmora_fix_const_of_shape"}},{"before":null,"after":"690c9ab0b8e2de3db78f47d335aa1cdea0b22c0d","ref":"refs/heads/dev_nzmora_fix_const_of_shape","pushedAt":"2023-06-22T14:16:55.328Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Fix ONNX export of layer_norm\n\nONNX has a spec bug: ConstantOfShape supports all dtypes except for BF16.\nTo WAR we use dtype FP32 and then cast to BF16.\n\nWill also issue a PR to the ONNX sig committee to change the spec in opset 20.\n\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"Fix ONNX export of layer_norm"}},{"before":"df6f347fa1039125f9777400c3e9ce4c461d9eda","after":"804f120322a13cd5f21ea8268860607dcecd055c","ref":"refs/heads/main","pushedAt":"2023-06-22T13:05:03.273Z","pushType":"push","commitsCount":11,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Fix BF16 ONNX export for successful ONNX Runtime Verification (#290)\n\nSigned-off-by: Asfiya Baig ","shortMessageHtmlLink":"Fix BF16 ONNX export for successful ONNX Runtime Verification (NVIDIA…"}},{"before":"bf041cf1a9e42e7df4e69b9a9ba9a10ff409cb3c","after":"83ee7689b272e93db5bce184f0e3495c0f0b2be5","ref":"refs/heads/dev_nzmora_fix_sm_export","pushedAt":"2023-06-16T20:33:21.704Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"lint\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"lint"}},{"before":"9f6c0579448c560f6722127c6b7f89a764024933","after":"bf041cf1a9e42e7df4e69b9a9ba9a10ff409cb3c","ref":"refs/heads/dev_nzmora_fix_sm_export","pushedAt":"2023-06-16T19:36:27.289Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Fix exports\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Fix exports"}},{"before":"58e23420eb3c3e524e39f37ad8fc1f9c929efc01","after":"9f6c0579448c560f6722127c6b7f89a764024933","ref":"refs/heads/dev_nzmora_fix_sm_export","pushedAt":"2023-06-16T19:05:09.596Z","pushType":"push","commitsCount":5,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Resolve conflicts and fix ci\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Resolve conflicts and fix ci"}},{"before":"1e9740ad6de37d621552d8573f949ecb12e4057b","after":"58e23420eb3c3e524e39f37ad8fc1f9c929efc01","ref":"refs/heads/dev_nzmora_fix_sm_export","pushedAt":"2023-06-15T22:54:13.152Z","pushType":"push","commitsCount":1,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"ONNX export Code refactoring\n\nShare function compute_in_fp32 between softmax.py (softmax symbolic functions) and te_onnx_extensions.py (the rest of the symbolic functions).\n\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"ONNX export Code refactoring"}},{"before":null,"after":"1e9740ad6de37d621552d8573f949ecb12e4057b","ref":"refs/heads/dev_nzmora_fix_sm_export","pushedAt":"2023-06-15T15:39:37.133Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Fix softmax ONNX export\n\n* BF16 is validated using \"fake i/o\": ie. instead of using BF16 as input/output, use FP32 input/output and convert to/from BF16 in the forward method.\n\n* Wrap softmax symbolic functions with conversion to/from FP32 to produce the same semantics as TE's softmax (compute is performed at FP32 precision regardless of input/output data type).\n\nSigned-off-by: Neta Zmora ","shortMessageHtmlLink":"Fix softmax ONNX export"}},{"before":"68f60b890c319327b9171af5756cc9a71301d676","after":"df6f347fa1039125f9777400c3e9ce4c461d9eda","ref":"refs/heads/main","pushedAt":"2023-06-13T19:23:35.280Z","pushType":"push","commitsCount":24,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"[JAX] Move jax.experimental.maps.Mesh to jax.sharding.Mesh (#276)\n\nMove jax.experimental.maps.Mesh to jax.sharding.Mesh\r\n\r\nSigned-off-by: Reese Wang ","shortMessageHtmlLink":"[JAX] Move jax.experimental.maps.Mesh to jax.sharding.Mesh (NVIDIA#276)"}},{"before":"8d4761ade3cdd564176022501d2223b4cf5130a5","after":"68f60b890c319327b9171af5756cc9a71301d676","ref":"refs/heads/main","pushedAt":"2023-05-15T23:13:03.151Z","pushType":"push","commitsCount":2,"pusher":{"login":"nzmora-nvidia","name":"Neta Zmora","path":"/nzmora-nvidia","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/96238833?s=80&v=4"},"commit":{"message":"Add env. var. for efficient text-generation in inference (#214)\n\n* Dynamically-generated causal attention mask (for ONNX export)\r\n\r\nTE's default causal mask is square (seq_len, seq_len) and is\r\ndynamically allocated for different sequence sizes. Dynamic\r\nallocation and dictionary lookups are not supported by ONNX.\r\nGPT generative phase uses rectangular masks.\r\n\r\nThis commit forces softmax to use `forward_torch_softmax` and\r\nto dynamically generate an attention mask when exporting to ONNX.\r\nThe mask is generated w/o using conditional control-flow by generating\r\na (k_seq_len, k_seq_len) mask and slicing it to (q_seq_len, k_seq_len)\r\n\r\nAn alternate implementation is to pre-allocate a mask of shape\r\n(max_seq, max_seq) and to slice that. This solution is more performant\r\nat the expense of space, but the problem is the TE doesn't have a concept\r\nof max_seq.\r\n\r\n* Add to test_export_softmax a test for te.softmax.FusedScaleMaskSoftmax.\r\n* Add test_softmax_mask_fn to test that TE's default attention mask and\r\nthe new ONNX-compatible mask produce the same behavior.\r\n* Add test_export_gpt_generation to test that the ONNX model can correctly\r\nhandle inputs with different shapes and that the attention mask it adjusted\r\non-the-fly to different sequence lengths.\r\n\r\nMisc:\r\n* Add a PRNG seeding fixture for more stability in tests.\r\n* Add dynamic shapes for ONNX input/output tests.\r\n* Allow validate_result to compare ORT output to pre-computed TE outputs.\r\n\r\nSigned-off-by: Neta Zmora \r\n\r\n* Add NVTE_ONNX_KVCACHE_MAX_SEQ_LEN for efficient text-generation in inference\r\n\r\n* Introduce an environment variable (NVTE_ONNX_KVCACHE_MAX_SEQ_LEN) to set the maximum sequence length.\r\nIn ONNX inference with KV-Cache optimizations for GPT text generation, the attention mask shape can be square (context-phase) or rectangular (generation-phase).\r\nWhen exporting to ONNX and this variable is set, TE preallocates an upper triangular (k=1) matrix with a size as prescribed by the variable, and dynamically slices the mask for the required shape.\r\nTE models can be exported to ONNX when NVTE_ONNX_KVCACHE_MAX_SEQ_LEN is not configured, but the attention masking is always square and not fit for efficient text generation.\r\n\r\n* Work-around torch.onnx.export bug that incorrectly folds\r\nlayer_norm(data, scale=add(gamma,1)) to layer_norm(data, scale=gamma)\r\nwhen we use LN with zero-centered gamma.\r\n\r\n* ONNX export tests\r\n * Add a fixture (seed_default_rng) to seed the PRNG\r\n * Add a fixture (set_max_seq_len) to set the max sequence length when exporting to ONNX for GPT text generation\r\n\r\nSigned-off-by: Neta Zmora \r\n\r\n* Fix linting errors\r\n\r\nSigned-off-by: Neta Zmora \r\n\r\n* Remove immutable default values from a couple of function signatures\r\n\r\nSigned-off-by: Neta Zmora \r\n\r\n* Add @skip_FP8 to test_export_gpt_generation\r\n\r\nSigned-off-by: Neta Zmora \r\n\r\n* Update transformer_engine/pytorch/softmax.py\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Fix CI error for softmax export\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Lint\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Neta Zmora \r\nSigned-off-by: Kirthi Shankar Sivamani \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Add env. var. for efficient text-generation in inference (NVIDIA#214)"}},{"before":"475c5408a59c0d1ccc7fa108f219680f3e37ff63","after":"dbd61bd466e7a63abbb32c9ae51261f84041632e","ref":"refs/heads/nzmora_dev_refactoring_2","pushedAt":"2023-05-13T19:08:19.564Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Lint\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Lint"}},{"before":"0ee9d66018eb047ed8a5ca41ff378a966c6cb62c","after":"475c5408a59c0d1ccc7fa108f219680f3e37ff63","ref":"refs/heads/nzmora_dev_refactoring_2","pushedAt":"2023-05-13T19:07:34.643Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Fix CI error for softmax export\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Fix CI error for softmax export"}},{"before":"4b00eade73b5dcc09839d3cee3ab9a4528e4b7c1","after":"0ee9d66018eb047ed8a5ca41ff378a966c6cb62c","ref":"refs/heads/nzmora_dev_refactoring_2","pushedAt":"2023-05-12T19:53:50.787Z","pushType":"push","commitsCount":4,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/36168853?s=80&v=4"},"commit":{"message":"Merge branch 'main' into nzmora_dev_refactoring_2","shortMessageHtmlLink":"Merge branch 'main' into nzmora_dev_refactoring_2"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"Y3Vyc29yOnYyOpK7MjAyMy0wOC0zMFQxMDo0MDo0Ni4wMDAwMDBazwAAAAN1ps3e","startCursor":"Y3Vyc29yOnYyOpK7MjAyMy0wOC0zMFQxMDo0MDo0Ni4wMDAwMDBazwAAAAN1ps3e","endCursor":"Y3Vyc29yOnYyOpK7MjAyMy0wNS0xMlQxOTo1Mzo1MC43ODcwOTVazwAAAAMrxX2T"}},"title":"Activity · nzmora-nvidia/TransformerEngine"}