adding support to return logits and generate for Megatron-LM GPT models #819
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for working on this! I think the `generate` method you are adding should be renamed to something including the name megatron; otherwise we will confuse users who might expect to get the Transformers `generate` method and everything it supports.
Thanks for adding this!
What does this PR do?
This PR adds a `megatron_generate` method for Megatron-LM GPT models. It uses tensor and pipeline parallelism to complete generations for a batch of inputs when using greedy decoding (with or without top_k/top_p sampling), and for individual prompt inputs when using beam-search decoding. Only a subset of the features of Transformers `generate` is supported. This will help in using large models via tensor and pipeline parallelism for generation (key-value caching and fused kernels are already used by default).

Below is the run of the example script megatron_gpt2_generation.py, with the main parts of the output logs given below:
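For readers unfamiliar with the sampling modes mentioned above: top_k and top_p filtering both work by masking out logits before sampling. As a self-contained illustration of that idea (a pure-Python sketch, not the Megatron-LM implementation this PR uses), the filtering step can look like this:

```python
import math

def filter_logits(logits, top_k=0, top_p=0.0):
    """Illustrative top_k / top_p logit filtering (hypothetical helper,
    not part of this PR). Keeps the top_k highest logits and/or the
    smallest set whose softmax mass reaches top_p; all other logits are
    set to -inf so they can never be sampled."""
    logits = list(logits)
    if top_k > 0:
        # Value of the k-th largest logit; everything below it is masked.
        kth = sorted(logits, reverse=True)[top_k - 1]
        logits = [x if x >= kth else float("-inf") for x in logits]
    if top_p > 0.0:
        # Softmax over the (possibly already top-k-filtered) logits.
        finite = [x for x in logits if x != float("-inf")]
        m = max(finite)
        exps = [math.exp(x - m) if x != float("-inf") else 0.0 for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Keep the most probable tokens until their mass reaches top_p.
        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        keep, mass = set(), 0.0
        for i in order:
            keep.add(i)
            mass += probs[i]
            if mass >= top_p:
                break
        logits = [x if i in keep else float("-inf") for i, x in enumerate(logits)]
    return logits
```

Greedy decoding then corresponds to always taking the argmax of the (filtered) logits, while sampling draws from the softmax of whatever survives the mask.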