
Identifying Spoken Language #4903

Closed
Sasha-Bachynskyi opened this issue Sep 8, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@Sasha-Bachynskyi

Sasha-Bachynskyi commented Sep 8, 2022

Hello, developers.
Is there a model (or anything else) for identifying the spoken language? For example, how can I tell whether a speaker is speaking English or Russian?
I looked through the tutorials but found nothing.
I would appreciate any help.

Sasha-Bachynskyi added the bug (Something isn't working) label Sep 8, 2022
Sasha-Bachynskyi changed the title from "Define Spoken Language" to "Identify Spoken Language" Sep 8, 2022
Sasha-Bachynskyi changed the title from "Identify Spoken Language" to "Identifying Spoken Language" Sep 8, 2022
@nithinraok
Collaborator

@fayejf, has the model been published? Please point to the docs.

@jnnnnn

jnnnnn commented Sep 30, 2022

@fayejf
Collaborator

fayejf commented Oct 5, 2022

@jnnnnn @Sasha-Bachynskyi The model is published. Thanks for your patience. #5080

fayejf closed this as completed Oct 5, 2022
@Sasha-Bachynskyi
Author

Hi, @fayejf!

I can't figure out how to use this model. The documentation only shows how to initialize it.
Could you give an example of which method I should call and how to pass in the audio file?

Thanks in advance for your help!

@nithinraok
Collaborator

Hi @Sasha-Bachynskyi, a PR adding this information to the docs should be merged soon:
#5366

You can infer the label using the EncDecSpeakerLabelModel class: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/api.html#nemo.collections.asr.models.EncDecSpeakerLabelModel

For inference on a single audio file, use the get_label method. For inference on multiple files, use batch_inference instead.
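A minimal sketch of both calls, assuming a NeMo build in which `EncDecSpeakerLabelModel` provides `get_label(path2audio_file)` and `batch_inference(manifest_filepath, batch_size=32, sample_rate=16000, device='cuda')` returning `(embs, logits, gt_labels, trained_labels)`; verify this against `label_models.py` in your installed version. `batch_inference` reads a JSON-lines manifest, and the fields written here (`audio_filepath`, `offset`, `duration`, and a placeholder `"label": "infer"`) are an assumption based on the manifest convention in NeMo's speaker-recognition docs. The helper names (`write_manifest`, `identify_single`, `identify_batch`, `load_langid_model`, `wav_duration`) are hypothetical, and NeMo is imported lazily so the manifest helpers run without it.

```python
import json
import wave
from pathlib import Path


def wav_duration(path):
    """Duration of a PCM WAV file in seconds, using only the standard library."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def write_manifest(audio_paths, manifest_path):
    """Write a NeMo-style JSON-lines manifest with one entry per audio file.

    Assumption: "label": "infer" is only a placeholder, since the true
    language is unknown at inference time.
    """
    with open(manifest_path, "w", encoding="utf-8") as f:
        for p in audio_paths:
            entry = {
                "audio_filepath": str(Path(p).resolve()),
                "offset": 0,
                "duration": wav_duration(p),
                "label": "infer",
            }
            f.write(json.dumps(entry) + "\n")
    return manifest_path


def load_langid_model(name="langid_ambernet"):
    # Lazy import so the helpers above work without NeMo installed.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name=name)


def identify_single(model, audio_path):
    # get_label runs inference on one file and returns the predicted label.
    return model.get_label(audio_path)


def identify_batch(model, manifest_path, batch_size=32, device="cuda"):
    # Assumed return order: (embs, logits, gt_labels, trained_labels), all in
    # manifest order. Map the arg-max of each logit row to its trained label.
    _, logits, _, trained_labels = model.batch_inference(
        manifest_path, batch_size=batch_size, device=device
    )
    return [trained_labels[int(i)] for i in logits.argmax(axis=1)]
```

For example, `model = load_langid_model()`, then `identify_single(model, "audio.wav")` for one file, or `identify_batch(model, write_manifest(["a.wav", "b.wav"], "manifest.json"))` for several. The `wave`-based duration helper only handles PCM WAV files; other formats would need a different reader.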

@Sasha-Bachynskyi
Author

Hi @nithinraok, sorry to bother you.
I want to identify the spoken language in a single file.

I followed the instructions.

Below is my code:

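```python
import nemo.collections.asr as nemo_asr

# Load the pretrained AmberNet language-ID model (downloaded if not cached)
langid_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="langid_ambernet")

# Predict the spoken language of a single audio file
lang = langid_model.get_label('audio.wav')
```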

But I get an error:

```
Traceback (most recent call last):
  File "/home/denis/test_lang/test-lang.py", line 5, in <module>
    lang = vad_model.get_label('audio.wav')
  File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/nemo/collections/asr/models/label_models.py", line 455, in get_label
    _, logits = self.infer_file(path2audio_file=path2audio_file)
  File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/denis/anaconda3/envs/nemo2/lib/python3.9/site-packages/nemo/collections/asr/models/label_models.py", line 427, in infer_file
    audio = librosa.core.resample(audio, sr, target_sr)
TypeError: resample() takes 1 positional argument but 3 were given
```

It seems that something is wrong with librosa.

System info:
- GPU: NVIDIA A40
- NeMo: main branch, installed on February 22, 2023
- librosa: 0.10.0

What could be causing this? I'd appreciate any help.

@nithinraok
Collaborator

It looks like the newest version of librosa requires these arguments to be passed by name (keyword arguments). Either downgrade librosa or use the fix provided in #6086.
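The traceback shows NeMo's `infer_file` calling `librosa.core.resample(audio, sr, target_sr)` positionally, which librosa 0.10 rejects because `orig_sr` and `target_sr` became keyword-only there. Besides pinning `librosa<0.10`, passing the sample rates by keyword should work on both older and newer releases. This is a hedged sketch, not necessarily the exact change made in #6086; `resample_compat` and `librosa_requires_keywords` are hypothetical helper names, not part of NeMo or librosa.

```python
import re


def librosa_requires_keywords(version):
    """Return True if this librosa version makes resample's rates keyword-only (0.10+)."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(m.group(1)), int(m.group(2))) >= (0, 10)


def resample_compat(audio, sr, target_sr):
    """Resample using keyword arguments, which old and new librosa both accept."""
    import librosa  # lazy import: the version check above works without librosa

    # The positional form resample(audio, sr, target_sr) raises TypeError on 0.10+.
    return librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
```

To check whether an environment is affected, `librosa_requires_keywords(importlib.metadata.version("librosa"))` returns True for 0.10 and later.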

4 participants