Megatron KERPLE positional embeddings #6478

Merged
merged 25 commits into from
Apr 24, 2023

Commits on Apr 17, 2023

  1. [TTS] FastPitch adapter fine-tune and conditional layer normalization (#6416)
    
    ---------
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    hsiehjackson and pre-commit-ci[bot] authored Apr 17, 2023
    b9a9c40
  2. [TTS] whitelist broken path fix. (#6412)

    * [TTS] whitelist broken path fix.
    
    Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    XuesongYang and pre-commit-ci[bot] authored Apr 17, 2023
    14e9668

Commits on Apr 18, 2023

  1. [TTS] FastPitch speaker encoder (#6417)

    * Add initial codes
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Remove wemb
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix import
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Restore aligner loss
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Add ConditionalInput
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fix error and support pre-trained config
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Follow comments
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Rename config
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Change copyright and random weight test
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Add initial codes
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix import error
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Add initial codes
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix dataset error
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Remove reference speaker embedding
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Remove SV encoder
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Follow comments
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix length type
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix append
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Move error msg
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Add look-up into speaker encoder
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Add valueerror msg
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Move lookup
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Remove unused
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix error
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Rebase and Fix error
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix spk encoder
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Rename n_speakers
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Follow comments
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Fix n_speakers None error
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    ---------
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    hsiehjackson and pre-commit-ci[bot] authored Apr 18, 2023
    536ee62
  2. Sharded manifests for tarred datasets (#6395)

    * testing sharded manifests
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * compatibility
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * proper fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * adding flag to convert_to_tarred_audio_dataset
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * shard_manifests conf param
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * propagating the shard_manifests param
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * propagating the shard_manifests param
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * distributed checks
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * typo
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * typo
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * fixes
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fixes based on PR comments and tests
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fixes to convert_to_tarred_audio_dataset.py
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * reversing manifest shards flag
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * tests
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * excluding manifests from webdataset url expansion
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * expand manifest paths before attempting to cache from datastore
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    * explicit use of UTF-8 for manifest i/o
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    
    ---------
    
    Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    bmwshop and pre-commit-ci[bot] authored Apr 18, 2023
    ceb539f
  3. Update wfst_text_normalization.rst (#6374)

    Add Hungarian (incoming in NeMo-text-processing)
    
    Signed-off-by: Jim O’Regan <jaoregan@tcd.ie>
    jimregan authored Apr 18, 2023
    499a3b2

Commits on Apr 19, 2023

  1. Support Swiglu in TP PP Conversion (#6437) (#6451)

    * Support Swiglu in TP PP Conversion
    
    
    
    * Guard activation
    
    
    
    * Guard activation
    
    
    
    ---------
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
    github-actions[bot] and titu1994 authored Apr 19, 2023
    a365879
  2. Update NeMo_TTS_Primer.ipynb (#6436)

    * Update NeMo_TTS_Primer.ipynb
    
    Changed a mistake in line 782. Instead of frequency band (ie. pitch) we should write frequency bin. Note that frequency bins in FFT are not related to pitch.
    
    Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com>
    
    * Update NeMo_TTS_Primer.ipynb
    
    Corrected the description of spectrogram and mel spectrogram calculations in lines 782 & 783 and added a fourth point to the description and added a reference for more mathematical details at the end of this point.
    
    Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com>
    
    ---------
    
    Signed-off-by: Mostafa Ghorbandoost <mos.ghorbandoost@gmail.com>
    pythinker authored Apr 19, 2023
    be711c9

Commits on Apr 20, 2023

  1. add rampup batch size support for Megatron GPT (#6424)

    * added rampup batch size support
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * added tests for rampup batch size
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * fixed the typos
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * added assertions
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * changed assertion rules
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * deleted unused imports
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * changed tests for rampup batch size
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * updated rampup batch size tests
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fixed styling
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    * rampup batch size tests changes
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    
    ---------
    
    Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    Signed-off-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
    Co-authored-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Eric Harper <complex451@gmail.com>
    4 people authored Apr 20, 2023
    9e72326
  2. Megatron encoder decoder fix for empty validation outputs (#6459) (#6461)

    * 1. Megatron encoder decoder fix for empty validation outputs.
    
    
    
    * 1. Debugging.
    
    ---------
    
    Signed-off-by: Micha Livne <mlivne@nvidia.com>
    Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
    Co-authored-by: Micha Livne <mlivne@nvidia.com>
    3 people authored Apr 20, 2023
    41fcf4d

Commits on Apr 21, 2023

  1. Code-Switching dataset creation - upgrading to aggregate tokenizer manifest format (#6448)
    
    * added functionality to create agg tokenizer compatible manifest for CS, flag to use this mode by default
    
    Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
    
    * updated README with the new agg_tokenizer_manifest flag
    
    Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
    
    * fixed typo in scripts/speech_recognition/code_switching/README.md
    
    Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
    
    * changed agg_tokenizer_manifest to is_lid_manifest
    
    Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
    
    ---------
    
    Signed-off-by: Kunal Dhawan <kunaldhawan97@gmail.com>
    Co-authored-by: Dima Rekesh <bmwshop@gmail.com>
    KunalDhawan and bmwshop authored Apr 21, 2023
    77f0959
  2. 2822ff3
  3. Update script for ngram rnnt and hat beam search decoding (#6370)

    * add rnnt ngram beamsearch script
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * add return encoding embedding option
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * update script
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * add rnnt and hat ngram decoding script
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * add some parameters
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add return_encoder_embeddings parameter to RNNTDecodingConfig
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * replace return_encoder_embeddings parameter
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * generalization of script behavior
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * remove return_encoder_embeddings parameter
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * remove return_encoder_embeddings parameter
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * add manual encoder_embeddings calculation
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix beam_width value to 8
    
    Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
    
    * fix rescoring description
    
    Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
    
    ---------
    
    Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
    Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
    3 people authored Apr 21, 2023
    244ba8d

Commits on Apr 22, 2023

  1. BERT pre-training mp fork to spawn (#6442) (#6454)

    * change bert fork to spawn
    
    
    
    * num_workers=0 fix
    
    
    
    ---------
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
    github-actions[bot] and aklife97 authored Apr 22, 2023
    094cbae
  2. fix replace_bos_with_pad not found (#6443) (#6450)

    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
    github-actions[bot] and aklife97 authored Apr 22, 2023
    daa9744
  3. reduce workers on NMT CI (#6472) (#6474)

    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
    github-actions[bot] and aklife97 authored Apr 22, 2023
    557c4b7

Commits on Apr 23, 2023

  1. 1. Added KERPLE positional embeddings to encoder-decoder.

    Signed-off-by: Micha Livne <mlivne@nvidia.com>
    michalivne committed Apr 23, 2023
    690742b
  2. 8664d09
  3. 1. Added a missing file.

    Signed-off-by: Micha Livne <mlivne@nvidia.com>
    michalivne committed Apr 23, 2023
    ed4c373
  4. e3ca438
  5. 1. Fixing commits.

    Signed-off-by: Micha Livne <mlivne@nvidia.com>
    michalivne committed Apr 23, 2023
    c6fa1a9
  6. Merge branch 'megatron-kerple-positional-embeddings' of github.com:NVIDIA/NeMo into megatron-kerple-positional-embeddings
    michalivne committed Apr 23, 2023
    f482074
  7. 1. Debugging.

    michalivne committed Apr 23, 2023
    f6ed850
  8. 1. Debugging.

    michalivne committed Apr 23, 2023
    27cf8de
  9. 1. Debugging.

    michalivne committed Apr 23, 2023
    0f593b8
  10. 1. Debugging.

    michalivne committed Apr 23, 2023
    9e84e42