
Checking we use fused kernels to compute scaled masked softmax on prefix lm #209

Merged: 3 commits into main on Nov 26, 2021

Conversation

thomasw21 (Member) commented Nov 26, 2021

There seem to be no issues with the CUDA kernels, as the tests pass locally.

@thomasw21 thomasw21 changed the title [WIP] Checking when we use fused kernels to compute scaled masked softmax Checking we use fused kernels to compute scaled masked softmax on prefix lm Nov 26, 2021
@thomasw21 thomasw21 marked this pull request as ready for review November 26, 2021 16:02
@thomasw21 thomasw21 merged commit b227590 into main Nov 26, 2021
stas00 (Contributor) commented Nov 27, 2021

@thomasw21, this PR appears to have broken the test suite:

>               self.assertIn("Using fused softmax", cs.out)
E               AssertionError: 'Using fused softmax' not found in 'using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 \nusing torch.float16 for parameters ...\n------------------------ arguments ------------------------\n  accumulate_allreduce_grads_in_fp32 .......
=========================== short test summary info ============================
FAILED tests/test_model.py::MyTestCase::test_prefix_lm_wo_reset_attention_mask
FAILED tests/test_training.py::MegDSTestTraining::test_training_prefix_lm_all_0
FAILED tests/test_training.py::MegDSTestTraining::test_training_prefix_lm_all_1
FAILED tests/test_training.py::MegDSTestTraining::test_training_prefix_lm_all_2
FAILED tests/test_training.py::MegDSTestTraining::test_training_prefix_lm_all_3

Full logs:

https://github.com/bigscience-workshop/Megatron-DeepSpeed/runs/4339199658?check_suite_focus=true

In general, please try to run the test suite locally if AWS doesn't give resources to run the CI (which unfortunately sucks :( ).
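
For reference, the failing assertion follows a simple capture-and-check pattern: redirect stdout while the model runs, then assert that the kernel-selection marker was printed. Below is a minimal, self-contained sketch of that pattern; `run_attention_forward` is a hypothetical stand-in for the real Megatron-DeepSpeed forward pass, and the actual suite captures output through its own `cs.out` helper rather than `redirect_stdout`:

```python
import io
import unittest
from contextlib import redirect_stdout

def run_attention_forward():
    # Hypothetical stand-in for the code under test. In the real suite,
    # the test drives a prefix-LM batch and the model prints this marker
    # when the fused scaled-masked-softmax CUDA kernel is selected.
    print("Using fused softmax")

class FusedSoftmaxLoggingTest(unittest.TestCase):
    def test_reports_fused_softmax(self):
        # Capture everything written to stdout during the run, then assert
        # the kernel-selection marker appeared, mirroring the assertion in
        # the failing tests above.
        buf = io.StringIO()
        with redirect_stdout(buf):
            run_attention_forward()
        self.assertIn("Using fused softmax", buf.getvalue())

if __name__ == "__main__":
    unittest.main()
```

To reproduce a single failure locally, narrowing the run with pytest's keyword filter (e.g. `pytest tests/test_model.py -k prefix_lm`) avoids running the whole suite.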

thomasw21 added a commit that referenced this pull request Nov 27, 2021
thomasw21 (Member, Author) commented:

Hmm, reverted in c9afebc, though this passed locally. I'll have a second look at it on Monday.

stas00 (Contributor) commented Nov 27, 2021

Thank you, Thomas! That's helpful to other PRs.

Merging this pull request may close: [PrefixLM] Improve test to test out custom cuda kernel