Merge r1.7.0 main #3773

Merged (22 commits, Mar 1, 2022)
9 changes: 8 additions & 1 deletion docs/source/asr/configs.rst
@@ -671,9 +671,16 @@ The most important component at the top level is the ``strategy``. It can take o
decoding:
strategy: "greedy_batch"

# preserve decoding alignments
preserve_alignments: false

# Overrides the fused batch size after training.
# Setting it to -1 will process the whole batch at once when combined with the `greedy_batch` decoding strategy
fused_batch_size: -1

# greedy strategy config
greedy:
max_symbols: 30
max_symbols: 10

# beam strategy config
beam:
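The ``max_symbols`` field above caps how many non-blank symbols the greedy (transducer-style) decoder may emit at a single encoder timestep, which prevents unbounded emission loops. A minimal pure-Python sketch of that cap — the ``step_fn`` stub below is hypothetical and is not NeMo's decoder API:

```python
# Illustration only: a toy greedy transducer-style decode loop showing what a
# `max_symbols` cap does. The "model" (step_fn) is a hypothetical stub, not NeMo's API.

def greedy_decode(step_fn, num_timesteps, max_symbols=10, blank=0):
    """Emit symbols per timestep until step_fn returns blank or the cap is hit."""
    hypothesis = []
    for t in range(num_timesteps):
        for _ in range(max_symbols):   # cap non-blank emissions at this timestep
            symbol = step_fn(t, hypothesis)
            if symbol == blank:        # blank advances to the next timestep
                break
            hypothesis.append(symbol)
    return hypothesis

# A stub model that would emit symbol 7 forever at timestep 0 is cut off at 10:
looping_step = lambda t, hyp: 7 if t == 0 else 0
print(len(greedy_decode(looping_step, num_timesteps=2, max_symbols=10)))  # prints 10
```

Without the cap, a model stuck on a non-blank symbol would never advance to the next timestep; with it, decoding always terminates.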
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -113,8 +113,8 @@
bibtex_bibfiles = [
'asr/asr_all.bib',
'nlp/nlp_all.bib',
'nlp/text_normalization/tn_itn_all.bib',
'tools/tools_all.bib',
'nemo_text_processing/textprocessing_all.bib',
'tts_all.bib',
]

13 changes: 4 additions & 9 deletions docs/source/index.rst
@@ -31,13 +31,14 @@ NVIDIA NeMo User Guide
asr/speaker_diarization/intro

.. toctree::
:maxdepth: 2
:maxdepth: 3
:caption: Natural Language Processing
:name: Natural Language Processing

nlp/megatron

nlp/models
nlp/megatron
nlp/api
nlp/text_normalization/intro

.. toctree::
:maxdepth: 2
@@ -55,12 +55,6 @@ NVIDIA NeMo User Guide

common/intro

.. toctree::
:maxdepth: 2
:caption: Text Processing
:name: Text Processing

nemo_text_processing/intro

.. toctree::
:maxdepth: 2
17 changes: 0 additions & 17 deletions docs/source/nemo_text_processing/intro.rst

This file was deleted.

78 changes: 0 additions & 78 deletions docs/source/nemo_text_processing/text_normalization.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/source/nlp/models.rst
@@ -21,4 +21,3 @@ NeMo's NLP collection provides the following task-specific models:
entity_linking
nlp_model
machine_translation
text_normalization
21 changes: 21 additions & 0 deletions docs/source/nlp/text_normalization/intro.rst
@@ -0,0 +1,21 @@
(Inverse) Text Normalization
============================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) tasks via the rule-based `nemo_text_processing` Python package and a neural TN/ITN model.

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 1

wfst/intro


Neural TN/ITN:

.. toctree::
:maxdepth: 1

nn_text_normalization


@@ -1,7 +1,7 @@
.. _text_normalization:
.. _nn_text_normalization:

Text Normalization Models
==========================
Neural Text Normalization Models
================================
Text normalization is the task of converting a written text into its spoken form. For example,
``$123`` should be verbalized as ``one hundred twenty three dollars``, while ``123 King Ave``
should be verbalized as ``one twenty three King Avenue``. At the same time, the inverse problem
@@ -279,7 +279,7 @@ The argument ``data.train_ds.decoder_data_augmentation`` in the config file cont
References
----------

.. bibliography:: nlp_all.bib
.. bibliography:: tn_itn_all.bib
:style: plain
:labelprefix: NLP-TEXTNORM
:keyprefix: nlp-textnorm-
56 changes: 56 additions & 0 deletions docs/source/nlp/text_normalization/tn_itn_all.bib
@@ -0,0 +1,56 @@
@article{ebden2015kestrel,
title={The Kestrel TTS text normalization system},
author={Ebden, Peter and Sproat, Richard},
journal={Natural Language Engineering},
volume={21},
number={3},
pages={333},
year={2015},
publisher={Cambridge University Press}
}

@article{sproat2016rnn,
title={RNN approaches to text normalization: A challenge},
author={Sproat, Richard and Jaitly, Navdeep},
journal={arXiv preprint arXiv:1611.00068},
year={2016}
}

@book{taylor2009text,
title={Text-to-speech synthesis},
author={Taylor, Paul},
year={2009},
publisher={Cambridge university press}
}

@misc{zhang2021nemo,
title={NeMo Inverse Text Normalization: From Development To Production},
author={Yang Zhang and Evelina Bakhturina and Kyle Gorman and Boris Ginsburg},
year={2021},
eprint={2104.05055},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

@inproceedings{sparrowhawk,
title = {TTS for Low Resource Languages: A Bangla Synthesizer},
author = {Alexander Gutkin and Linne Ha and Martin Jansche and Knot Pipatsrisawat and Richard Sproat},
booktitle = {10th Language Resources and Evaluation Conference},
year = {2016},
}

@article{mohri2005weighted,
title={Weighted automata in text and speech processing},
author={Mohri, Mehryar and Pereira, Fernando and Riley, Michael},
journal={arXiv preprint cs/0503077},
year={2005}
}

@incollection{mohri2009weighted,
title={Weighted automata algorithms},
author={Mohri, Mehryar},
booktitle={Handbook of weighted automata},
pages={213--254},
year={2009},
publisher={Springer}
}
22 changes: 22 additions & 0 deletions docs/source/nlp/text_normalization/wfst/intro.rst
@@ -0,0 +1,22 @@
WFST-based (Inverse) Text Normalization
=======================================

NeMo supports Text Normalization (TN) and Inverse Text Normalization (ITN) tasks via the rule-based `nemo_text_processing` Python package and a neural TN/ITN model.

The `nemo_text_processing` package is installed with `nemo_toolkit`; see :doc:`NeMo Introduction <../starthere/intro>` for installation details.
Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.

Tutorials on how to get started with WFST-based NeMo text normalization can be found in `tutorials/text_processing <https://github.com/NVIDIA/NeMo/tree/stable/tutorials/text_processing>`_.
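To make the task concrete, here is a toy sketch of rule-based text normalization (written form to spoken form) for simple dollar amounts. This is deliberately *not* the `nemo_text_processing` API — the real grammars are WFST-based and cover far more semiotic classes:

```python
# Toy illustration of rule-based text normalization (written -> spoken form).
# NOT the nemo_text_processing API; real NeMo grammars are WFST-based.
import re

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def verbalize(n: int) -> str:
    """Spell out an integer in 0..999 as English words."""
    if n == 0:
        return "zero"
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n:
        words.append(ONES[n])
    return " ".join(words)

def normalize_money(text: str) -> str:
    """Rewrite $<digits> as its spoken form; leave everything else alone."""
    return re.sub(r"\$(\d+)", lambda m: verbalize(int(m.group(1))) + " dollars", text)

print(normalize_money("$123"))  # one hundred twenty three dollars
```

A production system replaces the regex and lookup tables with composed weighted finite-state transducers, which is what makes the NeMo grammars reusable for both TN and ITN.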

Rule-based (WFST) TN/ITN:

.. toctree::
:maxdepth: 2

wfst_text_normalization
wfst_inverse_text_normalization
wfst_text_processing_deployment
wfst_api



@@ -1,3 +1,5 @@
.. _wfst_api:

NeMo Text Processing API
========================

@@ -1,12 +1,29 @@
.. _wfst_itn:

Inverse Text Normalization
==========================

Inverse text normalization (ITN) is a part of the Automatic Speech Recognition (ASR) post-processing pipeline.
ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.

For example,
`"in nineteen seventy"` -> `"in 1970"`
and `"it costs one hundred and twenty three dollars"` -> `"it costs $123"`.
Quick Start Guide
-----------------

.. code-block:: python

# import WFST-based ITN module
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

# initialize inverse normalizer
inverse_normalizer = InverseNormalizer(lang="en")

# try normalizer on a few examples
print(inverse_normalizer.normalize("it costs one hundred and twenty three dollars"))
# >>> "it costs $123"

print(inverse_normalizer.normalize("in nineteen seventy"))
# >>> "in 1970"


NeMo ITN :cite:`textprocessing-itn-zhang2021nemo` is based on WFST grammars :cite:`textprocessing-itn-Mohri2009`. We also provide a deployment route to C++ using `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-itn-sparrowhawk` -- an open-source version of Google Kestrel :cite:`textprocessing-itn-ebden2015kestrel`.
See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for details.
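The inverse direction can likewise be sketched in a few lines. The toy below maps spoken cardinals back to digits with dictionary lookups — an illustration of the rewrite only, not of NeMo's WFST grammar composition, and it handles plain cardinals rather than special classes such as years:

```python
# Toy inverse text normalization: spoken cardinals (below one thousand) -> int.
# Illustration only -- NeMo's ITN composes WFST grammars, not dict lookups.
WORDS = {w: i for i, w in enumerate(
    ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
     "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
     "sixteen", "seventeen", "eighteen", "nineteen"])}
WORDS.update({"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
              "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90})

def spoken_to_int(phrase: str) -> int:
    """'one hundred and twenty three' -> 123."""
    n = 0
    for word in phrase.split():
        if word == "and":          # filler word, no numeric value
            continue
        if word == "hundred":      # multiplier applies to what precedes it
            n *= 100
        else:
            n += WORDS[word]
    return n

print(spoken_to_int("one hundred and twenty three"))  # 123
```

Real grammars also have to disambiguate context (e.g. "nineteen seventy" as the year 1970 versus two separate cardinals), which is where weighted transducers earn their keep.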
@@ -17,11 +34,8 @@ See :doc:`Text Processing Deployment <../tools/text_processing_deployment>` for d






Classes
----------------------------------
--------


The base class for every grammar is :class:`GraphFst<nemo_text_processing.text_normalization.en.GraphFst>`.
@@ -75,13 +89,25 @@ Example evaluation run on (cleaned) `Google's text normalization dataset <https:

python run_evaluation.py --input=./en_with_types/output-00001-of-00100 <--language LANGUAGE> [--cat CLASS_CATEGORY] [--filter]

Supported Languages
-------------------

ITN supports the following languages: English, Spanish, German, French, Vietnamese, and Russian.

Installation
------------

`nemo_text_processing` is installed with the `nemo_toolkit`.

See :doc:`NeMo Introduction <../starthere/intro>` for installation details.

Additional requirements can be found in `setup.sh <https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/setup.sh>`_.


References
----------

.. bibliography:: textprocessing_all.bib
.. bibliography:: ../tn_itn_all.bib
:style: plain
:labelprefix: TEXTPROCESSING-ITN
:keyprefix: textprocessing-itn-