
[Question] How to solve Exception while using another wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2] ? #1457

Closed
sankulka opened this issue Nov 16, 2020 · 8 comments


sankulka commented Nov 16, 2020

Describe your question
I just started learning NeMo for ASR tasks, and I get an exception when I pass a different wav file to be converted to text. Could you please share what pre-processing has to be performed for any wav file/format other than the an4 dataset?

I am trying to send a wav file of < 20 sec duration to the QuartzNet model to get the text output. Here is the sample code:

files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")

After this, I get the exception below.


RuntimeError Traceback (most recent call last)
in ()
1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
3 print(f"Audio in {fname} was recognized as: {transcription}")

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
158 for test_batch in temporary_datalayer:
159 logits, logits_len, greedy_predictions = self.forward(
--> 160 input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
161 )
162 if logprobs:

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
394 if not has_processed_signal:
395 processed_signal, processed_signal_length = self.preprocessor(
--> 396 input_signal=input_signal, length=input_signal_length,
397 )
398

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
77 @torch.no_grad()
78 def forward(self, input_signal, length):
---> 79 processed_signal, processed_length = self.get_features(input_signal, length)
80
81 return processed_signal, processed_length

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
247
248 def get_features(self, input_signal, length):
--> 249 return self.featurizer(input_signal, length)
250
251 @property

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
345 # disable autocast to get full range of stft values
346 with torch.cuda.amp.autocast(enabled=False):
--> 347 x = self.stft(x)
348
349 # torch returns real, imag; so convert to magnitude

/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
273 win_length=self.win_length,
274 center=True,
--> 275 window=self.window.to(dtype=torch.float),
276 )
277

/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
511 extended_shape = [1] * (3 - signal_dim) + list(input.size())
512 pad = int(n_fft // 2)
--> 513 input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
514 input = input.view(input.shape[-signal_dim:])
515 return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
3557 assert len(pad) == 2, '3D tensors expect 2 values for padding'
3558 if mode == 'reflect':
-> 3559 return torch._C._nn.reflection_pad1d(input, pad)
3560 elif mode == 'replicate':
3561 return torch._C._nn.replication_pad1d(input, pad)

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]

Environment overview (please complete the following information)

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider - AWS, Azure, GCP, Colab)]
    Colab
  • Method of NeMo install: [pip install or from source]. Please specify exact commands you used to install.
    import nemo
    import nemo.collections.asr as nemo_asr
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

  • OS version
  • PyTorch version
  • Python version

Additional context

Add any other context about the problem here.
Example: GPU model

okuchaiev (Member) commented

@sankulka a few questions:

  1. Are you able to successfully execute this notebook: https://colab.research.google.com/github/NVIDIA/NeMo/blob/v1.0.0b2/tutorials/NeMo_voice_swap_app.ipynb
  2. If you replace your file with this one https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav , does the error go away?
  3. Do you know how many channels your file has and what its sample rate is? (It should be single-channel, 16 kHz; see the snippet below for a quick way to check.)
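
A quick way to check both (a minimal sketch using Python's standard wave module; 'my_sample.wav' is a placeholder for your file):

import wave

# Inspect the channel count and sample rate of a PCM wav file
with wave.open('my_sample.wav', 'rb') as wf:
    print("channels:", wf.getnchannels())     # should be 1 (mono) for QuartzNet
    print("sample rate:", wf.getframerate())  # should be 16000 Hz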

rbracco (Contributor) commented Nov 18, 2020

I had the same error. It was due to my microphone being stereo (2-channel) and 44.1 kHz instead of mono (1-channel) and 16 kHz as required.

You can check the sample rate and resample if needed using torchaudio:

import torchaudio

y, sr = torchaudio.load('my_sample.wav')
y = y.mean(dim=0, keepdim=True)  # if there are multiple channels, average them to a single channel
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    y = resampler(y)
    sr = 16000
torchaudio.save('my_sample_resampled.wav', y, sr)

files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")

sankulka (Author) commented

Thanks @rbracco. Yes, after changing the sample rate it worked well. Regards.


Gangwaradi commented

For me it's not working; it shows an error:

save() missing 2 required positional arguments: 'src' and 'sample_rate'

Please help me solve this problem.

rbracco (Contributor) commented Oct 13, 2021

Can you post your code? torchaudio.save() requires 3 arguments: a filepath, the audio tensor, and the audio's sample_rate. I've edited the code above to include all 3.

It seems like you are doing something like torchaudio.save('my_sample_resampled.wav'), but that's just the filepath; the correct call would be torchaudio.save('my_sample_resampled.wav', y, sr).
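
For reference, a minimal round-trip showing all three required arguments (file names are just placeholders):

import torchaudio

y, sr = torchaudio.load('my_sample.wav')           # y: [channels, time] tensor, sr: int
torchaudio.save('my_sample_resampled.wav', y, sr)  # path, audio tensor, and sample rate are all required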


Gangwaradi commented Oct 13, 2021

Hi rbracco,
Thanks for your reply. I solved this problem in a different way. Instead of changing my recorded file, I changed my code for recording the audio. Earlier I was using 2 channels while recording; when I changed it to 1, everything worked fine. Code to record single-channel audio is given below, copied from https://dsp.stackexchange.com/questions/13728/what-are-chunks-when-recording-a-voice-signal

One channel recording:

import pyaudio
import wave
import sys

CHUNK = 1024  # frames read from the stream per buffer
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "my_sample.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("start....")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("done...")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

For prediction you can simply use:

files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")

If you have an audio file with two channels:

I have also found a different way to solve this problem. The error is caused by having two channels in the recording, so we can take just one channel, since both channels differ only minutely. I changed my code as below, which gives me accurate results.

import torchaudio
import torch

y, sr = torchaudio.load('my_sample.wav')
y = torch.reshape(y[0], (1, y[0].size(0)))
torchaudio.save('my_sample_resampled.wav',y ,sr)

files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
print(f"Audio in {fname} was recognized as: {transcription}")

Or you can average both channels and convert the result into one channel.

import torch
import torchaudio

y, sr = torchaudio.load('/content/output.wav')
y = y.mean(dim=0)  # if there are multiple channels, average them to a single channel
y = torch.reshape(y, (1, y.size(0)))
torchaudio.save('my_sample_resampled.wav', y, sr)

files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
print(f"Audio in {fname} was recognized as: {transcription}")

sheecegardezi commented

For me this error was generated because the wav file was stereo. I needed to convert the file to a mono channel:

from pydub import AudioSegment
file_path = "input_sound_file.wav"
sound = AudioSegment.from_wav(file_path)
sound = sound.set_channels(1)
sound.export(file_path, format="wav")
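
If the file's sample rate also differs from the 16 kHz the model expects, pydub can resample in the same pass (a small sketch along the same lines; the output filename is just an example):

from pydub import AudioSegment

sound = AudioSegment.from_wav("input_sound_file.wav")
sound = sound.set_channels(1)        # stereo -> mono
sound = sound.set_frame_rate(16000)  # resample to 16 kHz
sound.export("input_sound_file_16k_mono.wav", format="wav")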
