[Bug] Invalid `cz` code when calling num2words #4098

SkaceKamen · 2024-12-27T10:56:49Z

Describe the bug

Due to a change in the num2words package, `cz` is no longer a valid language code; `cs` should be used instead.

See savoirfairelinux/num2words#587 for the change

To Reproduce

Use TTS with the Czech language (`cs`) and the latest num2words release installed.
Synthesis crashes with `NotImplementedError` because the language code is unsupported.

Expected behavior

Czech language should work

Logs

File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 366, in tts_to_file
    wav = self.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/api.py", line 312, in tts
    wav = self.synthesizer.tts(
  File "/usr/local/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 406, in tts
    outputs = self.tts_model.synthesize(
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 410, in synthesize
    return self.full_inference(text, speaker_wav, language, **settings)
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 479, in full_inference
    return self.inference(
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 525, in inference
    text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 666, in encode
    txt = self.preprocess_text(txt, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 652, in preprocess_text
    txt = multilingual_cleaners(txt, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 573, in multilingual_cleaners
    text = expand_numbers_multilingual(text, lang)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 562, in expand_numbers_multilingual
    text = re.sub(_number_re, lambda m: _expand_number(m, lang), text)
  File "/usr/local/lib/python3.10/re.py", line 209, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 562, in <lambda>
    text = re.sub(_number_re, lambda m: _expand_number(m, lang), text)
  File "/usr/local/lib/python3.10/site-packages/TTS/tts/layers/xtts/tokenizer.py", line 542, in _expand_number
    return num2words(int(m.group(0)), lang=lang if lang != "cs" else "cz")
  File "/usr/local/lib/python3.10/site-packages/num2words/__init__.py", line 98, in num2words
    raise NotImplementedError()
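
To make the failure in the last two frames easier to follow, here is a minimal sketch of the mechanism. It does not call num2words itself: `lookup` and the two `SUPPORTED_*` sets are hypothetical stand-ins, and their contents are an assumption based on this report (Czech registered as `cz` before the change and as `cs` after it). Only the conditional in `remap_like_tokenizer` is taken from the traceback.

```python
# Hypothetical stand-ins for the set of codes num2words accepts.
# The contents are assumptions based on this report, not copied from the library.
SUPPORTED_BEFORE = {"en", "cz"}  # older releases: Czech registered as "cz"
SUPPORTED_AFTER = {"en", "cs"}   # after the change: Czech registered as "cs"


def remap_like_tokenizer(lang: str) -> str:
    """Same conditional as in the _expand_number frame of the traceback."""
    return lang if lang != "cs" else "cz"


def lookup(lang: str, supported: set) -> str:
    """Stand-in for the language check that ends in NotImplementedError."""
    if lang not in supported:
        raise NotImplementedError()
    return lang


# With the old code set, the remapped value is accepted.
print(lookup(remap_like_tokenizer("cs"), SUPPORTED_BEFORE))

# With the new code set, the same remapped value is rejected.
try:
    lookup(remap_like_tokenizer("cs"), SUPPORTED_AFTER)
except NotImplementedError:
    print("cz is rejected once only cs is registered")
```

The remapping that once worked around the non-standard code is now what produces the unsupported value.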

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla P40"
        ],
        "available": true,
        "version": "12.4"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.5.1+cu124",
        "TTS": "0.25.1",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.10.16",
        "version": "#140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024"
    }
}

Additional context

A simple fix would be to remove the workaround that was presumably added in the past to accommodate the non-standard code num2words used to expect:
https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L482
https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L487
https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L515
https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/layers/xtts/tokenizer.py#L519

The num2words release that contains the fix:
https://github.com/savoirfairelinux/num2words/releases/tag/v0.5.14
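
If both old and new num2words releases need to keep working, one alternative to simply deleting the remapping is to try the standard code first and fall back to the legacy one. The following is only a sketch under that assumption: `expand_number_compat` and `LEGACY_ALIASES` are hypothetical names, not part of TTS or num2words, and the converter is passed in as a parameter so the snippet runs without the library installed. It relies on the converter raising `NotImplementedError` for an unknown code, as the traceback shows.

```python
from typing import Callable

# Standard code -> code that older num2words releases expected (per this report).
LEGACY_ALIASES = {"cs": "cz"}


def expand_number_compat(value: int, lang: str, to_words: Callable[..., str]) -> str:
    """Convert value to words, trying the standard code before the legacy alias.

    to_words is any callable with the shape to_words(number, lang=...),
    for example the num2words function itself.
    """
    try:
        return to_words(value, lang=lang)
    except NotImplementedError:
        legacy = LEGACY_ALIASES.get(lang)
        if legacy is None:
            raise  # no known alias, so surface the original error
        return to_words(value, lang=legacy)
```

With the real library this would be called as `expand_number_compat(42, "cs", num2words)` after `from num2words import num2words`. Pinning num2words to a release that accepts `cs` and removing the remapping, as proposed above, remains the simpler option.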


eginhard · 2024-12-27T17:30:55Z

Thanks for the investigation! This repository is no longer updated, but if you like you can open a PR with that fix in our fork. Otherwise I can take care of it in 1-2 weeks.

SkaceKamen · 2024-12-27T20:21:43Z

Thanks for the info, I'll move my issue there and create a PR

SkaceKamen added the bug label Dec 27, 2024

SkaceKamen mentioned this issue Dec 27, 2024

Add Czech language DrewThomasson/ebook2audiobook#123

Open

SkaceKamen closed this as completed Dec 27, 2024

eginhard mentioned this issue Dec 28, 2024

Fix num2words call using non-standard lang code idiap/coqui-ai-TTS#237

Merged
