
wav2vec2 inference simple process #5409

Closed
elisonlau opened this issue Dec 26, 2023 · 5 comments
🚀 Feature Request

Provide a simple inference process/pipe for the wav2vec 2.0 model.

Motivation

The current inference script, examples/speech_recognition/infer.py, handles many different cases, which makes it extremely complex.

Pitch

I found that issue #2651 (with @sooftware) dealt with this request, but two years have passed, and that code now produces many errors because some libraries/dependencies have been removed or changed.
Has @sooftware or anyone else updated that code, or is there another simple way to run inference?

Alternatives

.

Thanks, and looking forward to your reply.

@sooftware

Check [link].



elisonlau commented Dec 26, 2023


@sooftware Thanks so much for your immediate reply. Furthermore, do you have an updated recognize.py that works with the latest fairseq library/code? For example, after the replacement of wav2letter with flashlight, I get: "cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr'"...

I don't want to use HF Transformers; I just want to use fairseq for inference.


uralik (Contributor) commented Dec 26, 2023

@elisonlau Please consider checking the official torchaudio backend for wav2vec2-based models. IIRC it supports fairseq checkpoints well, as you can see in this tutorial: https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

I suspect it's quite unlikely that the existing pipelines in this repo will see any substantial changes.
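For reference, the torchaudio pipeline route suggested above can look roughly like the sketch below. This is a hedged illustration, not the tutorial's exact code: the bundle name `WAV2VEC2_ASR_BASE_960H` is one of torchaudio's pretrained bundles, the audio path is an assumption, and `greedy_ctc_decode` is a minimal hand-rolled decoder (collapse repeats, drop blanks), not torchaudio's decoder.

```python
# Minimal greedy CTC decoding helper (pure Python):
# collapse consecutive repeats, then drop the blank token.
def greedy_ctc_decode(indices, labels, blank=0):
    out = []
    prev = None
    for i in indices:
        if i != prev and i != blank:
            out.append(labels[i])
        prev = i
    return "".join(out)

if __name__ == "__main__":
    # Assumed usage with a pretrained torchaudio bundle (downloads weights
    # on first use). "speech.wav" is a placeholder path.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    model = bundle.get_model().eval()

    waveform, sr = torchaudio.load("speech.wav")
    if sr != bundle.sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

    with torch.inference_mode():
        emission, _ = model(waveform)  # ASR bundles emit logits over bundle labels

    idx = emission[0].argmax(dim=-1).tolist()
    print(greedy_ctc_decode(idx, bundle.get_labels()))
```

Note that the ASR bundles return character-level logits directly from `model(waveform)`; `extract_features` is only needed when you want the intermediate transformer representations.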

@elisonlau

@uralik Thanks for the reference. I tried it, but it didn't work; the error is shown below:

```
     24 with torch.inference_mode():
---> 25     features, _ = model.extract_features(waveform)
TypeError: Wav2VecCtc.forward() takes 1 positional argument but 2 were given
```

I am not sure whether my checkpoint file is a fine-tuned model, or whether there is some difference between torchaudio.pipelines and the original fairseq...
