Describe the bug
speaker_reco_infer.py loads the model and manifest files and then breaks. I guess it's a PyTorch issue again? I wanted to use the model from yesterday:
[NeMo W 2021-09-17 12:18:42 patch_utils:49] torch.stft() signature has been updated for PyTorch 1.7+
Please update PyTorch to remain compatible with later versions of NeMo.
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
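My reading of the error (an interpretation, not from the traceback itself): the tensor reaching the STFT has length 2 at dimension 2, i.e. only two audio samples survived loading, so a 256-sample padding window cannot be applied. A quick stdlib-only sketch to check how many samples a WAV file actually decodes to before inference (the function name and `min_samples` threshold are made up for illustration):

```python
import wave

def check_wav(path, min_samples=512):
    """Return (n_samples, sample_rate) for a WAV file; warn if it is very short."""
    with wave.open(path, "rb") as w:
        n, sr = w.getnframes(), w.getframerate()
    if n < min_samples:
        # Fewer samples than the STFT padding needs -> likely the same RuntimeError.
        print(f"{path}: only {n} samples at {sr} Hz - too short for STFT padding")
    return n, sr
```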
Steps/Code to reproduce bug
Container:
nvcr.io/nvidia/nemo:1.2.0
but installed:
python -m pip install git+https://github.com/NVIDIA/NeMo.git@'main'
python -m pip install pytorch_lightning==1.4.2
(as used/working in the Google Colab; I also tried NeMo 1.2 with pytorch-lightning 1.3.8, NeMo 1.3, and the recent 1.4.7 later on)
run:
https://github.com/NVIDIA/NeMo/blob/48fe9e69feba7651694fd6ae0a096a0655ed601c/examples/speaker_tasks/recognition/speaker_reco_infer.py
with:
model, train.json from:
https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
added:
test.json and bonian.wav
{"audio_filepath": "bonian.wav", "offset": 0, "duration": 11.370666666666667, "label": ""}
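The `duration` value in the entry above is hard-coded; a small sketch (stdlib only, function name hypothetical) that derives it from the WAV header when building a manifest line:

```python
import json
import wave

def manifest_entry(path, label=""):
    # Read the duration from the WAV header instead of hard-coding it.
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return json.dumps({"audio_filepath": path, "offset": 0,
                       "duration": duration, "label": label})
```

One such line per audio file, written to test.json, matches the manifest format shown above.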
Expected behavior
Inference runs without errors. :-)
Environment overview (please complete the following information)
docker pull & docker run commands used

Environment details
If an NVIDIA docker image is used, you don't need to specify these. Otherwise, please provide:
Additional context
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/results.html
A little more explanation of the inference part would be nice, e.g. a link to the script I was using here, and also how to use the embedding that is created at the end of the Jupyter notebook.
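On the embedding question: a common way to use speaker embeddings is to compare two of them with cosine similarity, where scores near 1.0 suggest the same speaker. A plain-Python sketch, assuming the embeddings are lists of floats (this is my illustration, not the notebook's own code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```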