add TN/ITN link in speech tools list (NVIDIA#9142)
* add TN/ITN link in speech tools list

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* fix TN docs warnings

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
erastorgueva-nv authored May 9, 2024
1 parent 2d2219c commit c9efcbe
Showing 6 changed files with 22 additions and 23 deletions.
26 changes: 13 additions & 13 deletions docs/source/nlp/text_normalization/nn_text_normalization.rst
@@ -26,7 +26,7 @@ The term *duplex* refers to the fact that our system can be trained to do both T
Quick Start Guide
-----------------

-To run the pretrained models interactively see :ref:`inference_text_normalization`.
+To run the pretrained models interactively see :ref:`inference_text_normalization_nn`.

Available models
^^^^^^^^^^^^^^^^
@@ -79,7 +79,7 @@ The purpose of the preprocessing scripts is to standardize the format in order t
We also changed the punctuation class `PUNCT` to be treated like a plain token (label changed from `<sil>` to `<self>`), since we want to preserve punctuation even after normalization.
For text normalization it is crucial to avoid unrecoverable errors, which are linguistically coherent but not semantics-preserving.
We noticed that due to data scarcity the model struggles to verbalize long numbers correctly, so we changed the ground truth for long numbers to digit-by-digit verbalization.
-We also ignore certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` (`VERBATIM` and `LETTER` in the original dataset). Instead we label urls/email addresses and abbreviations as plain tokens and handle them separately with WFST-based grammars, see :ref:`inference_text_normalization`.
+We also ignore certain semiotic classes from neural verbalization, e.g. `ELECTRONIC` or `WHITELIST` (`VERBATIM` and `LETTER` in the original dataset). Instead we label urls/email addresses and abbreviations as plain tokens and handle them separately with WFST-based grammars, see :ref:`inference_text_normalization_nn`.
This simplifies the task for the model and significantly reduces unrecoverable errors.
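For illustration, a few rows of the preprocessed training data in the Google text normalization format (tab-separated semiotic class, input token, verbalization; the specific rows below are made up) would look like this:

    PLAIN      The          <self>
    DATE       2006         two thousand six
    PUNCT      .            <self>
    CARDINAL   1234567      one two three four five six seven
    PLAIN      nvidia.com   <self>

Here `PUNCT` keeps `<self>` rather than the original `<sil>`, the long cardinal is verbalized digit by digit, and the url is relabeled as a plain token so the WFST grammars can handle it later.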
@@ -199,7 +199,7 @@ To enable training with the tarred dataset, add the following arguments:
data.train_ds.use_tarred_dataset=True \
data.train_ds.tar_metadata_file=\PATH_TO\<TARRED_DATA_OUTPUT_DIR>\metadata.json
-.. _inference_text_normalization:
+.. _inference_text_normalization_nn:

Model Inference
---------------
@@ -230,16 +230,16 @@ To run inference from a file adjust the previous command by
This pipeline consists of

-* WFST-based grammars to verbalize hard classes, such as urls and abbreviations.
-* regex pre-processing of the input, e.g.
-   * adding space around `-` in alpha-numerical words, e.g. `2-car` -> `2 - car`
-   * converting unicode fractions, e.g. ½ to 1/2
-   * normalizing Greek letters and some special characters, e.g. `+` -> `plus`
-* Moses :cite:`nlp-textnorm-koehnetal2007moses`. tokenization/preprocessing of the input
-* inference with neural tagger and decoder
-* Moses postprocessing/detokenization
-* WFST-based grammars to verbalize some `VERBATIM`
-* punctuation correction for TTS (to match the output punctuation to the input form)
+* WFST-based grammars to verbalize hard classes, such as urls and abbreviations.
+* regex pre-processing of the input, e.g.
+   * adding space around `-` in alpha-numerical words, e.g. `2-car` -> `2 - car`
+   * converting unicode fractions, e.g. ½ to 1/2
+   * normalizing Greek letters and some special characters, e.g. `+` -> `plus`
+* Moses :cite:`nlp-textnorm-koehnetal2007moses` tokenization/preprocessing of the input
+* inference with neural tagger and decoder
+* Moses postprocessing/detokenization
+* WFST-based grammars to verbalize some `VERBATIM`
+* punctuation correction for TTS (to match the output punctuation to the input form)
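To make the regex pre-processing step concrete, here is a minimal Python sketch of the three example rules above; the function name and rule tables are hypothetical and simplified, not NeMo's actual implementation:

    import re

    # Illustrative lookup tables; the real pipeline covers many more symbols.
    FRACTIONS = {"½": "1/2", "⅓": "1/3", "¼": "1/4", "¾": "3/4"}
    SPECIAL = {"+": " plus ", "α": " alpha ", "β": " beta "}

    def preprocess(text: str) -> str:
        # add space around `-` in alpha-numerical words, e.g. "2-car" -> "2 - car"
        text = re.sub(r"(?<=\d)-(?=[A-Za-z])", " - ", text)
        # convert unicode fractions, e.g. "½" -> "1/2"
        for frac, ascii_form in FRACTIONS.items():
            text = text.replace(frac, ascii_form)
        # normalize Greek letters and some special characters, e.g. "+" -> "plus"
        for char, spoken in SPECIAL.items():
            text = text.replace(char, spoken)
        # collapse any double spaces introduced by the replacements
        return re.sub(r"\s+", " ", text).strip()

    print(preprocess("a 2-car garage adds +½ point"))  # "a 2 - car garage adds plus 1/2 point"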

Model Architecture
------------------
@@ -20,7 +20,7 @@ An example bash-script that runs inference and evaluation is provided here: `run
Quick Start Guide
-----------------

-To run the pretrained models see :ref:`inference_text_normalization`.
+To run the pretrained models see :ref:`inference_text_normalization_tagging`.

Available models
^^^^^^^^^^^^^^^^
@@ -115,7 +115,7 @@ Example of a training command:
-.. _inference_text_normalization:
+.. _inference_text_normalization_tagging:

Model Inference
---------------
@@ -162,4 +162,4 @@ References
.. bibliography:: tn_itn_all.bib
:style: plain
:labelprefix: NLP-TEXTNORM-TAG
-   :keyprefix: nlp-textnorm-tag
+   :keyprefix: nlp-textnorm-tag-
2 changes: 1 addition & 1 deletion docs/source/nlp/text_normalization/wfst/intro.rst
@@ -5,7 +5,7 @@ NeMo-text-processing supports Text Normalization (TN), audio-based TN and Invers

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`__ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`__ repository. All updates and discussions/issues should go to the new repository.


WFST-based TN/ITN:
@@ -5,8 +5,7 @@ Text (Inverse) Normalization

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.

The `nemo_text_processing` Python package is based on WFST grammars :cite:`textprocessing-norm-mohri2005weighted` and supports:

@@ -188,7 +187,7 @@ Language Support Matrix

See :ref:`Grammar customization <wfst_customization>` for grammar customization details.

-See :ref:`Text Processing Deployment <wfst_text_processing_deployment>` for deployment in C++ details.
+See :doc:`Text Processing Deployment <./wfst_text_processing_deployment>` for deployment in C++ details.

WFST TN/ITN resources can be found :ref:`here <wfst_resources>`.
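As a quick orientation, here is a minimal usage sketch of the package's Python entry points; the exact signatures and outputs below are an assumption based on the documented `Normalizer` and `InverseNormalizer` classes:

    from nemo_text_processing.text_normalization.normalize import Normalizer
    from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

    # TN: written form -> spoken form
    normalizer = Normalizer(input_case="cased", lang="en")
    print(normalizer.normalize("It costs $5.99.", verbose=False))
    # e.g. "It costs five dollars ninety nine cents."

    # ITN: spoken form -> written form
    inverse_normalizer = InverseNormalizer(lang="en")
    print(inverse_normalizer.inverse_normalize("twenty three dollars", verbose=False))
    # e.g. "$23"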

@@ -5,14 +5,13 @@ Deploy to Production with C++ backend

.. warning::

-   *TN/ITN transitioned from [NVIDIA/NeMo](https://github.com/NVIDIA/NeMo) repository to a standalone [NVIDIA/NeMo-text-processing](https://github.com/NVIDIA/NeMo-text-processing) repository. All updates and discussions/issues should go to the new repository.*
-
+   TN/ITN transitioned from `NVIDIA/NeMo <https://github.com/NVIDIA/NeMo>`_ repository to a standalone `NVIDIA/NeMo-text-processing <https://github.com/NVIDIA/NeMo-text-processing>`_ repository. All updates and discussions/issues should go to the new repository.

NeMo-text-processing provides tools to deploy :doc:`TN and ITN <wfst_text_normalization>` for production :cite:`textprocessing-deployment-zhang2021nemo`.
It uses `Sparrowhawk <https://github.com/google/sparrowhawk>`_ :cite:`textprocessing-deployment-sparrowhawk` -- an open-source C++ framework by Google.
The grammars written with NeMo-text-processing can be exported into an `OpenFST <https://www.openfst.org/>`_ Archive File (FAR) and dropped into Sparrowhawk.
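For a sense of what that export involves, a hypothetical `pynini` sketch is below; the rule, FAR filename, and key are all illustrative, and the actual export is handled by NeMo-text-processing's deployment tooling:

    import pynini
    from pynini.export import export

    # A toy rule: rewrite the unicode fraction "½" as its spoken form.
    fraction = pynini.cross("½", "one half").optimize()

    # Write the rule into an OpenFST ARchive (FAR) that a Sparrowhawk-style
    # C++ backend can load.
    exporter = export.Exporter("toy_grammar.far")
    exporter["FRACTION"] = fraction
    exporter.close()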

-.. image:: images/deployment_pipeline.png
+.. image:: ./images/deployment_pipeline.png
:align: center
:alt: Deployment pipeline
:scale: 50%
1 change: 1 addition & 0 deletions docs/source/tools/intro.rst
@@ -20,3 +20,4 @@ There are also additional NeMo-related tools hosted in separate github repositor
:maxdepth: 1

speech_data_processor
+   ../nlp/text_normalization/intro
