NFA updates #6695

Merged
merged 60 commits into from
Jun 9, 2023

Commits on Mar 11, 2023

  1. update V_NEGATIVE_NUM constant to make better use of torch.float32 range

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 11, 2023
    037194b
  2. adjust backpointers dtype if U_max too large

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 11, 2023
    cb69ccc
  3. Remove print statements

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 11, 2023
    0dd8729
  4. Remove need for user to specify model_downsample_factor

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 11, 2023
    cceab7c

Commits on Mar 13, 2023

  1. change model.cfg.sample_rate to model.cfg.preprocessor.sample_rate

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 13, 2023
    f9489d8
  2. add check to make sure that window_stride is in model.cfg.preprocessor

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 13, 2023
    7b37972

Commits on Mar 15, 2023

  1. reduce memory consumption of backpointers by making them relative instead of absolute
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 15, 2023
    0cef35e
  2. update librosa.get_duration() 'filename' param to 'path'

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 15, 2023
    11f3430
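
The relative-backpointer idea in the first commit of this group can be illustrated with a minimal sketch. This is not the NFA implementation: it assumes a simplified alignment in which each frame either stays in the current state or advances by exactly one, and the function name `viterbi_relative_backpointers` and the list-of-lists layout of `log_probs` are hypothetical. Each backpointer stores an offset (0 or 1) instead of an absolute state index, so the stored values stay small however many states there are.

```python
NEG_INF = float("-inf")


def viterbi_relative_backpointers(log_probs):
    """Best monotonic path through log_probs[t][u] (hypothetical helper).

    Assumes the path starts in state 0, ends in the last state, and at each
    frame either stays in the same state or advances by one.
    """
    T = len(log_probs)
    U = len(log_probs[0])

    # Only the previous column of scores is kept, not a full T x U matrix.
    v_prev = [NEG_INF] * U
    v_prev[0] = log_probs[0][0]

    # rel_bp[t][u] is an offset: 0 = came from state u, 1 = came from u - 1.
    rel_bp = [[0] * U for _ in range(T)]

    for t in range(1, T):
        v_cur = [NEG_INF] * U
        for u in range(U):
            stay = v_prev[u]
            advance = v_prev[u - 1] if u > 0 else NEG_INF
            if advance > stay:
                v_cur[u] = advance + log_probs[t][u]
                rel_bp[t][u] = 1
            else:
                v_cur[u] = stay + log_probs[t][u]
                rel_bp[t][u] = 0
        v_prev = v_cur

    # Backtrace: subtract the stored offset to recover the absolute index.
    u = U - 1
    path = [u]
    for t in range(T - 1, 0, -1):
        u -= rel_bp[t][u]
        path.append(u)
    path.reverse()
    return path


example_path = viterbi_relative_backpointers(
    [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
)
```

In a tensor-based version the offsets could be held in a small integer dtype, which is where a memory saving would come from; that part is an assumption about the motivation, not something the commit message spells out.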

Commits on Mar 16, 2023

  1. Do not throw error if 'text' or 'pred_text' are empty and make sure CTM filepaths in the output manifest are null
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 16, 2023
    9d9b7b2
  2. preprocess input text by removing any duplicate spaces and converting any newlines to spaces
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Mar 16, 2023
    d916db4
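
The text preprocessing described in the second commit of this group could look roughly like the sketch below. The function name `normalize_text` is hypothetical and the actual NFA code may differ; the sketch only does the two things the commit message names, converting newlines to spaces and collapsing runs of spaces.

```python
import re


def normalize_text(text: str) -> str:
    """Convert newlines to spaces, then collapse repeated spaces (sketch)."""
    text = text.replace("\n", " ")
    # Replace any run of two or more spaces with a single space.
    return re.sub(r" {2,}", " ", text)


cleaned = normalize_text("hello  world\nfoo")
```

Converting newlines first matters: a blank line in the input becomes two adjacent spaces, which the second step then collapses.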

Commits on Apr 4, 2023

  1. Use Utterance dataclass instead of dictionaries for keeping track of token/word/segment alignments
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 4, 2023
    643a8ee
  2. Merge branch 'main' into nfa_updates

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 4, 2023
    2be92bf
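
As a rough illustration of the move from dictionaries to an Utterance dataclass, the sketch below nests tokens inside words inside segments. Every class and field name other than `Utterance` is an assumption made for the example (the `t_start`/`t_end` names echo the `add_t_start_end_to_utt_obj` function mentioned later in this PR); the real class layout in NFA may be different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Token:
    text: str
    t_start: Optional[float] = None  # seconds; filled in after alignment
    t_end: Optional[float] = None


@dataclass
class Word:
    text: str
    tokens: List[Token] = field(default_factory=list)
    t_start: Optional[float] = None
    t_end: Optional[float] = None


@dataclass
class Segment:
    text: str
    words: List[Word] = field(default_factory=list)
    t_start: Optional[float] = None
    t_end: Optional[float] = None


@dataclass
class Utterance:
    text: str
    segments: List[Segment] = field(default_factory=list)


# Hypothetical usage: one segment, two words, one token per word.
utt = Utterance(
    text="hi there",
    segments=[
        Segment(
            text="hi there",
            words=[
                Word(text="hi", tokens=[Token(text="hi")]),
                Word(text="there", tokens=[Token(text="there")]),
            ],
        )
    ],
)
utt.segments[0].words[0].t_start = 0.0
utt.segments[0].words[0].t_end = 0.4
```

Compared with nested dictionaries, attribute access makes a misspelled field fail immediately and lets defaults such as empty lists be declared in one place.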

Commits on Apr 5, 2023

  1. refactor so can save alignments as ctm and ass format files

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 5, 2023
    0897f33
  2. fix bugs for saving character based ASS files and for using pred_text to do alignment
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 5, 2023
    c74a63f

Commits on Apr 6, 2023

  1. Make token level .ass file use tokens with recovered capitalization

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 6, 2023
    f7c920e
  2. Do not try to generate alignment files if text or pred text is empty, or if number of tokens is too large for T
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 6, 2023
    45e3fb1
  3. rename output manifest file to say '_with_output_file_paths.json'

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 6, 2023
    a57409d

Commits on Apr 7, 2023

  1. add flag to resegment ass subtitle file to fill available text space

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 7, 2023
    dbd5232

Commits on Apr 8, 2023

  1. Fix bug in resegmentation code

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 8, 2023
    10f85e3

Commits on Apr 20, 2023

  1. Fix bug which skipped some utterances if batch_size more than 1

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 20, 2023
    f1561d4

Commits on Apr 21, 2023

  1. reduce memory requirements by doing torch.gather on a slice of the log probs when they are needed
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 21, 2023
    5ebe1e4

Commits on Apr 22, 2023

  1. reduce memory requirements by not saving whole v_matrix

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Apr 22, 2023
    ccda03e

Commits on May 16, 2023

  1. remove any extra spaces in pred_text

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed May 16, 2023
    aad4d04

Commits on May 22, 2023

  1. Merge branch 'main' into nfa_updates

    Signed-off-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
    erastorgueva-nv authored May 22, 2023
    49031d6
  2. 033a9fd
  3. remove unused list pred_text_all_lines

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed May 22, 2023
    b49ddb7

Commits on May 23, 2023

  1. support using hybrid Transducer-CTC models for alignment

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed May 23, 2023
    afab69e

Commits on Jun 6, 2023

  1. 041b18d
  2. fix typo - add brackets to torch.cuda.is_available()

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    623369a
  3. make sure token case restoration will work if superscript or subscript num is in text
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    2debbc0
  4. 2879cad
  5. remove any BOM from input text

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    918abdc
  6. pick out 1st hypotheses if there is a tuple of them

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    9c91d43
  7. Remove print statement

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    846957c
  8. add detail to error message if fail to recover capitalization of tokens

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    0940c70
  9. add flag use_local_attention

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    ed6a4b8
  10. rename additional_ctm_grouping_separator -> additional_segment_grouping_separator
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    412633e
  11. update description of additional_segment_grouping_separator

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    ad7df58
  12. add simple docstring to get_utt_obj function

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    bd5d274
  13. Make docstring for add_t_start_end_to_utt_obj

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    a5f793b
  14. update docstrings for add_t_start_end_to_utt_obj and get_batch_variables

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    47c72d1
  15. update README and comments in align.py

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    e239375
  16. change 'ground truth' -> 'reference text' in documentation

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 6, 2023
    af35b5e
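
The typo fix in this group (adding brackets to `torch.cuda.is_available()`) addresses a common Python pitfall, sketched below under the assumption that the original code referenced the function without calling it. To keep the example free of a PyTorch dependency, `is_available` here is a hypothetical stand-in that behaves like `torch.cuda.is_available` on a machine without a GPU.

```python
def is_available() -> bool:
    """Stand-in for torch.cuda.is_available on a machine without a GPU."""
    return False


# Without brackets the function object itself is tested. A function object
# is always truthy, so this picks "cuda" even though no GPU is present.
device_without_call = "cuda" if is_available else "cpu"

# With brackets the function is actually called and its result is tested.
device_with_call = "cuda" if is_available() else "cpu"
```

The unbracketed form raises no error at the point of the check, so the mistake tends to surface only later, when a tensor or model is moved to a device that does not exist.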

Commits on Jun 7, 2023

  1. add header

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    ecb3ce2
  2. add comments to get_utt_obj function

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    93ac8f6
  3. move constants so they are after imports

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    ce243e0
  4. add file description for make_ass_files

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    9827ce4
  5. get rid of Utterance object's S attribute, and correct tests so they pass now
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    5935ada
  6. 23380c5
  7. remove some unused variables

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    8ac4a2f
  8. remove unused variable model from functions saving output files

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    195f306
  9. remove unused var minimum_timestamp_duration from make_ass_files functions and return utt_obj
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    b92c25f
  10. move minimum_timestamp_duration param to CTMFileConfig

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 7, 2023
    d3a49e5

Commits on Jun 8, 2023

  1. 4f7714c
  2. remove unused enumerate and unused import

    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 8, 2023
    dab537d
  3. switch reading duration from librosa to soundfile to avoid filename/path deprecation message
    
    Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
    erastorgueva-nv committed Jun 8, 2023
    7312497
  4. 76cd1b3
  5. 6b7959b
  6. 0b6c5f7
  7. 93c0d69