Some bugs in de, es and fr #228
Labels
bug
Something isn't working
Comments
tbartley94 pushed a commit that referenced this issue on Oct 17, 2024:
* Implements the fix
* Expands the list to TLDs with over 1000 registrations as per Google's registry 06/2020
* Updates the TLD mappings and tests
* Updates the cache
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: Simon Zuberek <szuberek@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
A fix for German (1.) and (2.) has been implemented.
Hi!
I use the latest NeMo release: 1.1.0. I found the following bugs.
Bugs
German (de):
1. text:
Here is brettspielversand.de.
norm_text:
Here is b r e t t s p i e l v e r s a n d punkt de.
expected output:
Here is brettspielversand punkt de.
2. text:
Sinnesbereichen.in allen Sinnen.
norm_text:
S i n n e s b e r e i c h e n punkt in allen Sinnen.
expected output:
Sinnesbereichen punkt in allen Sinnen.
3. text:
Hier zoome ich auf die Läsion. Wir befinden uns also auf der 2D-Mammographie.
norm_text:
Hier zoome ich auf die Läsion. Wir befinden uns also auf der 2D-Mammographie.
expected output:
Hier zoome ich auf die Läsion. Wir befinden uns also auf der Zwei-D-Mammographie.
(not sure)
For German normalization, I use the following code:
Spanish (es):
text:
El texto de Li Qin en este libro ahora está disponible en forma de libro electrónico.
norm_text:
El texto de quincuagésimo primero Qin en este libro ahora está disponible en forma de libro electrónico.
expected output:
El texto de Li Qin en este libro ahora está disponible en forma de libro electrónico.
(not sure)
For Spanish normalization, I use the following code:
French (fr):
text:
Les Tech Clippings seront diffusés en exclusivité sur la chaîne Youtube DIGITIMES tous les vendredis à 20h.
norm_text:
Les Tech Clippings seront diffusés en exclusivité sur la chaîne Youtube DIGITIMES tous les vendredis à 20h.
expected output:
Les Tech Clippings seront diffusés en exclusivité sur la chaîne YouTube DIGITIMES tous les vendredis à 20 heures.
(not sure)
For French normalization, I use the following code:
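The per-language snippets are not shown here, so the following is only a minimal sketch of how the cases above could be re-checked after a fix, not the code originally used. It assumes the `nemo_text_processing` package and its `Normalizer` class with `input_case="cased"`; the `Case` tuple and the helpers `find_mismatches` and `run_with_nemo` are hypothetical names introduced for illustration.

```python
from functools import lru_cache
from typing import Callable, List, NamedTuple, Tuple


class Case(NamedTuple):
    lang: str
    text: str
    expected: str


# Inputs and expected outputs copied from the report above.
# The last German case, the Spanish case and the French case are marked "(not sure)".
CASES = [
    Case("de", "Here is brettspielversand.de.",
         "Here is brettspielversand punkt de."),
    Case("de", "Sinnesbereichen.in allen Sinnen.",
         "Sinnesbereichen punkt in allen Sinnen."),
    Case("de",
         "Hier zoome ich auf die Läsion. Wir befinden uns also auf der 2D-Mammographie.",
         "Hier zoome ich auf die Läsion. Wir befinden uns also auf der Zwei-D-Mammographie."),
    Case("es",
         "El texto de Li Qin en este libro ahora está disponible en forma de libro electrónico.",
         "El texto de Li Qin en este libro ahora está disponible en forma de libro electrónico."),
    Case("fr",
         "Les Tech Clippings seront diffusés en exclusivité sur la chaîne Youtube DIGITIMES tous les vendredis à 20h.",
         "Les Tech Clippings seront diffusés en exclusivité sur la chaîne YouTube DIGITIMES tous les vendredis à 20 heures."),
]


def find_mismatches(
    normalize: Callable[[str, str], str], cases: List[Case]
) -> List[Tuple[Case, str]]:
    """Return (case, actual) pairs where the output differs from the expectation."""
    failures = []
    for case in cases:
        actual = normalize(case.lang, case.text)
        if actual != case.expected:
            failures.append((case, actual))
    return failures


@lru_cache(maxsize=None)
def _get_normalizer(lang: str):
    # Assumption: nemo_text_processing is installed. The import is kept local
    # so the helpers above can be used without it.
    from nemo_text_processing.text_normalization.normalize import Normalizer

    # One normalizer per language, cached because building the grammars is slow.
    return Normalizer(input_case="cased", lang=lang)


def run_with_nemo() -> None:
    """Print every case whose normalized text differs from the expected output."""
    def normalize(lang: str, text: str) -> str:
        return _get_normalizer(lang).normalize(text, verbose=False)

    for case, actual in find_mismatches(normalize, CASES):
        print(f"[{case.lang}] {case.text!r}")
        print(f"  got:      {actual!r}")
        print(f"  expected: {case.expected!r}")
```

Calling `run_with_nemo()` in an environment with the package installed lists the cases that still differ. Passing a different callable to `find_mismatches` allows the same cases to be checked against another normalizer.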