ES TN Fix for Issue #166 #224

Merged
merged 6 commits into from
Sep 3, 2024

zoobereq
Collaborator

@zoobereq zoobereq commented Aug 29, 2024

What does this PR do?

This PR fixes one of the issues reported in Issue #166 for Spanish text normalization (TN).

The issue concerns certain proper-noun strings (e.g. Li, Xi, Yi) being incorrectly tagged as ordinal Roman numerals and normalized accordingly.
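
To illustrate the kind of ambiguity described above, here is a minimal pure-Python sketch. It is not the grammar changed in this PR (the actual fix lives in the pynini-based Spanish TN grammars and data files); the regex and both helper names are hypothetical and exist only for illustration. The sketch assumes that a case-insensitive Roman numeral matcher is what lets a capitalized name such as "Li" or "Xi" through, and that requiring consistent casing is one possible way to reject it.

```python
import re

# Hypothetical illustration only; not part of nemo_text_processing.
# Standard Roman numeral shape for values 1..3999.
_ROMAN = r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"
_ROMAN_ANY_CASE = re.compile(_ROMAN, re.IGNORECASE)


def looks_roman_ignoring_case(token: str) -> bool:
    """Naive check: accepts any casing, so 'Li' and 'Xi' pass."""
    # The pattern also matches the empty string, so guard against it.
    return bool(token) and _ROMAN_ANY_CASE.fullmatch(token) is not None


def looks_roman_consistent_case(token: str) -> bool:
    """Stricter check: the token must be all upper-case or all lower-case."""
    if not (token.isupper() or token.islower()):
        return False
    return looks_roman_ignoring_case(token)


if __name__ == "__main__":
    for token in ["XI", "xi", "Li", "Xi"]:
        print(token, looks_roman_ignoring_case(token), looks_roman_consistent_case(token))
```

Under these assumptions, mixed-case tokens such as "Li" and "Xi" are no longer treated as numerals, while "XI" and "xi" still are. A fully lower-case word that happens to be a valid numeral would still be accepted, so a real grammar would need additional context to disambiguate it.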

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here.
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py.

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

Simon Zuberek and others added 2 commits August 29, 2024 15:43
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
@zoobereq zoobereq marked this pull request as ready for review August 30, 2024 14:54
Simon Zuberek and others added 3 commits August 30, 2024 15:10
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Collaborator

@tbartley94 tbartley94 left a comment

LGTM

@tbartley94 tbartley94 merged commit 92bdf93 into main Sep 3, 2024
5 checks passed
BuyuanCui pushed a commit that referenced this pull request Sep 19, 2024
* ES TN Fix for Issue #166

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updates the cache

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* Unioning the lower and upper Roman graphs into one

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removes all upper-case Roman numerals from data files

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>

---------

Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Alex Cui <alexcui1994@gmail.com>
BuyuanCui pushed a commit that referenced this pull request Sep 26, 2024
BuyuanCui pushed a commit that referenced this pull request Oct 16, 2024
2 participants