Batch processing torchaudio-squim #3424
Comments
Hi @bloodraven66, thanks for trying the squim model for evaluation. Actually, you can! Just pass a batch tensor to the model and it will generate the scores in batch as well.
I guess we can update the tutorial so that it contains links to the documentation. I had a bit of difficulty finding the answer to this, but the following seems to be the documentation: https://pytorch.org/audio/main/generated/torchaudio.prototype.SquimSubjective.html#forward
Batch processing is available, but if the sequences have different lengths and you have to pad them to the same length, the predictions change because masking is not supported. Are sequences of different lengths used during training? If so, could masking be introduced into the implementation for inference?
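One way to sidestep the padding issue described above is to batch only clips that already share the same length, so nothing needs to be padded or masked. The sketch below is not part of torchaudio; `bucket_by_length` and the example lengths are hypothetical names and values used purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def bucket_by_length(lengths: Sequence[int]) -> Dict[int, List[int]]:
    """Map each distinct clip length to the indices of the clips that have it."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for index, length in enumerate(lengths):
        buckets[length].append(index)
    return dict(buckets)


# Hypothetical clip lengths, in samples.
lengths = [16000, 24000, 16000, 24000, 8000]
buckets = bucket_by_length(lengths)

for length, indices in buckets.items():
    # Clips in one bucket have identical length, so they can be stacked
    # into a single (batch, time) tensor without any padding.
    print(f"{length} samples -> clips {indices}")
```

Each bucket can then be stacked and scored as one batch. The trade-off is that buckets may be small when the lengths in a test set vary widely.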
During training, all audio samples are truncated to 5 seconds. Masking is difficult to support for the Objective model because it uses DPRNN as its backbone, so we would need to work out how to transpose the mask along with the RNN input.
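Since training clips are truncated to 5 seconds, one possible workaround is to crop every waveform to the same 5-second window before batching, which avoids padding altogether. This is a minimal sketch rather than a recommended or official procedure: the 16 kHz sample rate and the `crop_and_stack` helper are assumptions made for illustration, and clips shorter than the window are skipped here rather than padded.

```python
from typing import List, Tuple

import torch

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input
CLIP_SECONDS = 5      # training clip length mentioned in the thread


def crop_and_stack(
    waveforms: List[torch.Tensor],
    num_samples: int = SAMPLE_RATE * CLIP_SECONDS,
) -> Tuple[torch.Tensor, List[int]]:
    """Crop 1-D waveforms to their first `num_samples` samples and stack them.

    Returns the (batch, time) tensor and the indices of the clips that were
    long enough to be kept.
    """
    kept = [i for i, w in enumerate(waveforms) if w.shape[-1] >= num_samples]
    if not kept:
        return torch.empty(0, num_samples), kept
    batch = torch.stack([waveforms[i][:num_samples] for i in kept])
    return batch, kept


# Random stand-ins for real recordings of different lengths.
waveforms = [torch.randn(90_000), torch.randn(80_000), torch.randn(40_000)]
batch, kept = crop_and_stack(waveforms)
print(batch.shape, kept)
```

The resulting `batch` has shape `(batch, time)` and can be passed to the model in one call. Short clips would still need to be scored one at a time.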
Is there a way to fine-tune a squim subjective model with my own data? What kind of data would I have to use, and how would I go about fine-tuning at a high level? Is there documentation?
@fullstackmedusa You need a dataset of paired waveforms and numerical labels (from 1 to 5), and another clean speech dataset as reference. You can find the details in https://arxiv.org/abs/2206.12285
🚀 The feature
This is regarding the objective and subjective metrics available as part of torchaudio-squim (https://pytorch.org/audio/main/tutorials/squim_tutorial.html#sphx-glr-tutorials-squim-tutorial-py).
Currently, it works only with batch size = 1, i.e., the waveforms are expected to be of shape (1, N). Can we have batch-level processing?
Motivation, pitch
Researchers usually use these metrics on test sets and a range of model configurations. I'm also looking at using the subjective model on multiple non-matching references for a single audio. Batch processing will help a lot in speeding up the processing time.
Alternatives
No response
Additional context
No response