
wav2vec2 inference simple process #5409

Closed
elisonlau opened this issue Dec 26, 2023 · 5 comments
🚀 Feature Request

Provide a simple inference process/pipe for the wav2vec 2.0 model.

Motivation

The current inference script, examples/speech_recognition/infer.py, handles many different cases, which makes it extremely complex.

Pitch

I found that issue #2651 (with @sooftware) dealt with this request, but two years have passed, and that code now produces many errors because some libraries/dependencies have been removed or changed.
Has @sooftware or anyone else updated that code, or is there another simple way to run inference?

Alternatives

.

Thanks, and looking forward to your reply.

@sooftware

Check [link].



elisonlau commented Dec 26, 2023


@sooftware Thanks so much for your immediate reply. Furthermore, do you have an updated recognize.py that works with the latest fairseq library/code? For example, after the replacement of wav2letter with flashlight, I get: "cannot import name 'base_architecture' from 'fairseq.models.wav2vec.wav2vec2_asr'"...

I don't want to use HF Transformers; I just want to use fairseq for inference.


uralik (Contributor) commented Dec 26, 2023

@elisonlau Please consider checking the official torchaudio backend for wav2vec2-based models. IIRC it supports fairseq checkpoints well, as you can see in this tutorial: https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html

I suspect it's quite unlikely that the existing pipelines in this repo will see any substantial changes.
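For reference, the torchaudio pipeline route suggested above can look roughly like the sketch below. This is a hedged illustration, not the tutorial's exact code: the bundle name `WAV2VEC2_ASR_BASE_960H` is one of torchaudio's pretrained bundles, the audio path is an assumption, and `greedy_ctc_decode` is a minimal hand-rolled decoder (collapse repeats, drop blanks), not torchaudio's decoder.

```python
# Minimal greedy CTC decoding helper (pure Python):
# collapse consecutive repeats, then drop the blank token.
def greedy_ctc_decode(indices, labels, blank=0):
    out = []
    prev = None
    for i in indices:
        if i != prev and i != blank:
            out.append(labels[i])
        prev = i
    return "".join(out)

if __name__ == "__main__":
    # Assumed usage with a pretrained torchaudio bundle (downloads weights
    # on first use). "speech.wav" is a placeholder path.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
    model = bundle.get_model().eval()

    waveform, sr = torchaudio.load("speech.wav")
    if sr != bundle.sample_rate:
        waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

    with torch.inference_mode():
        emission, _ = model(waveform)  # ASR bundles emit logits over bundle labels

    idx = emission[0].argmax(dim=-1).tolist()
    print(greedy_ctc_decode(idx, bundle.get_labels()))
```

Note that the ASR bundles return character-level logits directly from `model(waveform)`; `extract_features` is only needed when you want the intermediate transformer representations.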

@elisonlau

@uralik Thanks for the reference. I tried it, but it didn't work; the error is shown below:

```
     24 with torch.inference_mode():
---> 25     features, _ = model.extract_features(waveform)
TypeError: Wav2VecCtc.forward() takes 1 positional argument but 2 were given
```

I am not sure whether my checkpoint file is a fine-tuned model, or whether there is some difference between torchaudio.pipelines and the original fairseq...
