Megatron encoder-decoder refactor #3542

Merged
merged 148 commits into from
Feb 10, 2022

@michalivne michalivne commented Jan 27, 2022

This PR refactors Megatron T5 into a modular base encoder-decoder class hierarchy, plus a T5 class that customizes the base class. The goal of the refactor is to support multiple encoder-decoder models with minimal code redundancy.

Refactor High Level Description

The class hierarchy is shown below as a nested list (i.e., a child class is indented under its parent).

  • MegatronModule sub-classes:
    • MegatronTransformerEncoderModule/MegatronTransformerDecoderModule - various architectures that map hidden --> hidden
    • MegatronTransformerEncoderDecoderModule - contains the above classes, generates the required joint encoder/decoder mask from independent masks, and handles reparametrization (e.g., VAE)
    • TokensEncoderDecoderModule - modality-specific (e.g., tokens-tokens) model which maps between pre-processed data (e.g., token ids) and the continuous encoder/decoder input (i.e., hidden); handles embedders (i.e., embedding + positional embedding); handles projection from hidden to the data distribution (e.g., from hidden dimensions to the number of tokens)
  • NLPModel sub-classes:
    • MegatronBaseModel - common initialization of Megatron
      • MegatronLMEncoderDecoderModel - modality-specific (e.g., text-text) model which pre-processes raw data (e.g., the tokenizer converts text to token ids); generic modality-specific training/testing/inference/loss methods
        • MegatronT5Model - a specific model (i.e., T5) which prepares datasets (i.e., with the corresponding data augmentation) and adds model-specific features (e.g., sentinels in the tokenizer)
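
The hierarchy above can be sketched as a set of nearly empty Python classes. This is only an illustration of the inheritance and containment relations described in the list: the class names come from the PR description, while the `MegatronModule` and `NLPModel` stubs, the constructor signature, and the attribute names are assumptions made for this sketch and are not the actual NeMo code.

```python
class MegatronModule:
    """Stub standing in for the Megatron module base class (assumption)."""


class NLPModel:
    """Stub standing in for the NeMo NLP model base class (assumption)."""


# --- MegatronModule sub-classes: work on hidden states and token ids ---

class MegatronTransformerEncoderModule(MegatronModule):
    """Encoder architecture: hidden --> hidden."""


class MegatronTransformerDecoderModule(MegatronModule):
    """Decoder architecture: hidden --> hidden."""


class MegatronTransformerEncoderDecoderModule(MegatronModule):
    """Holds an encoder and a decoder and builds their joint mask."""

    def __init__(self, encoder, decoder):
        # Attribute names are hypothetical.
        self.encoder = encoder
        self.decoder = decoder


class TokensEncoderDecoderModule(MegatronModule):
    """Maps token ids to hidden inputs and hidden outputs to token logits."""


# --- NLPModel sub-classes: each level specializes the previous one ---

class MegatronBaseModel(NLPModel):
    """Common Megatron initialization."""


class MegatronLMEncoderDecoderModel(MegatronBaseModel):
    """Generic text-to-text training/testing/inference/loss logic."""


class MegatronT5Model(MegatronLMEncoderDecoderModel):
    """T5-specific datasets and tokenizer features (e.g., sentinels)."""


# Containment: the encoder-decoder module wraps one encoder and one decoder.
enc_dec = MegatronTransformerEncoderDecoderModule(
    MegatronTransformerEncoderModule(), MegatronTransformerDecoderModule()
)

# Inheritance: walk the method resolution order of the T5 model.
t5_lineage = [cls.__name__ for cls in MegatronT5Model.__mro__]
```

In this sketch `t5_lineage` runs from `MegatronT5Model` through `MegatronLMEncoderDecoderModel`, `MegatronBaseModel`, and `NLPModel` to `object`, which mirrors the nesting of the second list.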

Notable Changes

  1. Encoder/decoder masks are 2D, with 1 for an active element and 0 for a masked element (as opposed to the 3D Megatron mask, which uses False for active and True for masked). This makes masks easy to manipulate. Conversion to 3D Megatron masks is done on the fly.
  2. The joint cross-attention mask is not precomputed; it is computed on the fly.
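
The two mask conventions in the list above can be illustrated with a small framework-free sketch. It assumes 2D masks of shape [batch, seq_len] and a 3D mask of shape [batch, query_len, key_len]; the function name `to_megatron_mask_3d`, the `causal` flag, and the use of plain Python lists instead of tensors are choices made for this example, not the PR's implementation.

```python
from typing import List

Mask2D = List[List[int]]          # [batch, seq_len]; 1 = active, 0 = masked
Mask3D = List[List[List[bool]]]   # [batch, query_len, key_len]; True = masked


def to_megatron_mask_3d(
    query_mask: Mask2D, key_mask: Mask2D, causal: bool = False
) -> Mask3D:
    """Convert two 2D masks into one 3D mask with inverted polarity.

    A (query, key) pair stays active only if both positions are active in
    their 2D masks. With causal=True, a query also cannot see later keys.
    """
    if len(query_mask) != len(key_mask):
        raise ValueError("query_mask and key_mask need the same batch size")
    return [
        [
            [(not (q and k)) or (causal and j > i) for j, k in enumerate(k_row)]
            for i, q in enumerate(q_row)
        ]
        for q_row, k_row in zip(query_mask, key_mask)
    ]


# Joint cross-attention mask, built on the fly from independent masks:
# decoder positions are the queries, encoder positions are the keys.
enc_mask = [[1, 1, 0]]  # one sequence; the last encoder token is padding
dec_mask = [[1, 0]]     # one sequence; the last decoder token is padding
cross = to_megatron_mask_3d(dec_mask, enc_mask)
# cross == [[[False, False, True], [True, True, True]]]

# Decoder self-attention mask with causal masking.
causal_self = to_megatron_mask_3d([[1, 1]], [[1, 1]], causal=True)
# causal_self == [[[False, True], [False, False]]]
```

Keeping the 2D form as the canonical representation means that combining or editing masks is an element-wise operation on short vectors, and the larger 3D mask only has to exist at the point where attention is computed.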

michalivne and others added 2 commits January 28, 2022 00:50
1. Added megatron encoder and decoder.
2. Added megatron base encoder decoder class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 27, 2022

This pull request introduces 6 alerts and fixes 1 when merging 059c4d3 into 101977e - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Unused local variable
  • 1 for Syntax error

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 30, 2022

This pull request introduces 6 alerts and fixes 1 when merging a7127a9 into 1b83bec - view on LGTM.com

new alerts:

  • 2 for Unused import
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Unused local variable
  • 1 for Syntax error

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 30, 2022

This pull request introduces 9 alerts and fixes 1 when merging d4cbc7c into 1b83bec - view on LGTM.com

new alerts:

  • 2 for Wrong name for an argument in a call
  • 2 for Unused import
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Unused local variable
  • 1 for Syntax error
  • 1 for Variable defined multiple times

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 31, 2022

This pull request introduces 9 alerts and fixes 1 when merging 802eae7 into 1b83bec - view on LGTM.com

new alerts:

  • 3 for Unused import
  • 2 for Wrong name for an argument in a call
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for Unused local variable
  • 1 for Syntax error

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 31, 2022

This pull request introduces 12 alerts and fixes 1 when merging e4008e0 into 9a1cc36 - view on LGTM.com

new alerts:

  • 3 for Unused import
  • 2 for Wrong name for an argument in a call
  • 2 for Unused local variable
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for First parameter of a method is not named 'self'
  • 1 for Superclass attribute shadows subclass method
  • 1 for Syntax error

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Jan 31, 2022

This pull request introduces 9 alerts and fixes 1 when merging 5cee721 into ddcc2a6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 2 for Unused import
  • 2 for Wrong name for an argument in a class instantiation
  • 1 for First parameter of a method is not named 'self'
  • 1 for Superclass attribute shadows subclass method
  • 1 for Syntax error

fixed alerts:

  • 1 for Unused import

Signed-off-by: Micha Livne <mlivne@nvidia.com>
…ivne/NeMo into megatron-encoder-decoder-refactor
Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 9, 2022

This pull request introduces 2 alerts and fixes 4 when merging 7e66395 into 00df30e - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

Signed-off-by: Micha Livne <mlivne@nvidia.com>
…ivne/NeMo into megatron-encoder-decoder-refactor

lgtm-com bot commented Feb 9, 2022

This pull request introduces 2 alerts and fixes 4 when merging eda6288 into 00df30e - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

ericharper previously approved these changes Feb 9, 2022
@ericharper ericharper left a comment

LGTM. Thanks!

Had some minor comments, but will approve. Please change your copyrights to 2022.

1. Added apex import guards.
2. Updated copyright year.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 9, 2022

This pull request introduces 2 alerts and fixes 4 when merging a2c145b into 00df30e - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

Signed-off-by: Micha Livne <mlivne@nvidia.com>
…ivne/NeMo into megatron-encoder-decoder-refactor
Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 9, 2022

This pull request introduces 2 alerts and fixes 4 when merging d6bc1ec into 00df30e - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 9, 2022

This pull request introduces 2 alerts and fixes 4 when merging be1c9ec into 00df30e - view on LGTM.com

new alerts:

  • 2 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 9, 2022

This pull request introduces 1 alert and fixes 4 when merging 67e56da into 00df30e - view on LGTM.com

new alerts:

  • 1 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'

Signed-off-by: Micha Livne <mlivne@nvidia.com>

lgtm-com bot commented Feb 10, 2022

This pull request introduces 1 alert and fixes 4 when merging 17f00d3 into 00df30e - view on LGTM.com

new alerts:

  • 1 for Unused import

fixed alerts:

  • 3 for Unused import
  • 1 for First parameter of a method is not named 'self'


@ericharper ericharper left a comment

LGTM. Thanks!

@michalivne michalivne merged commit 8e15ba4 into NVIDIA:main Feb 10, 2022
@michalivne michalivne deleted the megatron-encoder-decoder-refactor branch February 10, 2022 16:30
fayejf pushed a commit that referenced this pull request Mar 2, 2022
* 1. Added megatron encoder and decoder.
2. Added megatron base encoder decoder class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added base encoder-decoder megatron model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Working on T5Model refactor.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Finished T5Model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added skeleton for ModelPT encoder decoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Consolidated functions.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Class structure is done. Not tested yet.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Rearranging code, and fixing bugs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated name of classes from Model to Module.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Merged conflicts.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed unused imports.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixing unused imports.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Replaced self.cfg with self._cfg.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typos in function arguments.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added support for custom tokenizer.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* Fix for T5 with SPM

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* whole word masking fix for spm

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix whole word masking for spm

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Moved tokens_encoder_decoder.py to modules.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed comment.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed class names.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Jenkinsfile with new T5 model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added a missing module.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated CI to use a smaller model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added apex import guards.
2. Updated copyright year.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added type hints.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed decoding to use token_logits instead of dec_output.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed indentation bug.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
4 participants