[Question] How to solve Exception while using another wav file: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2] ? #1457
Comments
@sankulka few questions:
I had the same error. It was due to my microphone being stereo (2-channel) and 44.1 kHz instead of mono (1-channel) and 16 kHz as required. You can check the sample_rate and resample if needed using torchaudio.
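If torchaudio is unavailable, the sample rate and channel count can also be checked with Python's standard-library wave module (a minimal sketch; the filename is a placeholder):

```python
import wave

def check_format(path):
    """Return (sample_rate, num_channels) for a wav file."""
    with wave.open(path, 'rb') as wf:
        return wf.getframerate(), wf.getnchannels()

# QuartzNet expects (16000, 1); anything else needs resampling/downmixing first.
```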
Yes, by changing the sample rate, it processed well. Thanks Robert.
Regards,
Santosh
On Thu, Nov 19, 2020 at 12:09 AM Robert Bracco wrote:
I had the same error. It was due to my microphone being stereo (2 channel)
and 44.1Khz instead of mono (1 channel) and 16Khz as required.
You can check the sample_rate and resample if needed using torchaudio
import torchaudio
y, sr = torchaudio.load('my_sample.wav')
y = y.mean(dim=0, keepdim=True)  # if there are multiple channels, average them to a single channel
if sr != 16000:
    resampler = torchaudio.transforms.Resample(sr, 16000)
    y = resampler(y)
torchaudio.save('my_sample_resampled.wav', y, 16000)
files = ['my_sample_resampled.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")
For me it's not working; it shows an error: save() missing 2 required positional arguments: 'src' and 'sample_rate'. Please help me solve this problem.
Can you post your code? torchaudio.save() requires 3 arguments: a filepath, the audio tensor, and the audio's sample_rate. I've edited the code above to include all 3. It seems like you are calling torchaudio.save('my_sample_resampled.wav') with only the filepath.
Hi rbracco,

One channel recording:

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1  # record in mono
RATE = 16000
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = 'output.wav'

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                input=True, frames_per_buffer=CHUNK)
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    frames.append(stream.read(CHUNK))
print("done...")
stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

While predicting you can simply use:

files = ['output.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
    print(f"Audio in {fname} was recognized as: {transcription}")

If you have an audio file with two channels: I have also found a different way to solve this problem. The error is created by having two channels while recording, so we can take just one channel, because the two channels differ only minutely. I have changed my code, which gives me an accurate result. The code is given below.

import torchaudio
y, sr = torchaudio.load('my_sample.wav')
y = y[0:1]  # keep just the first channel
torchaudio.save('my_sample_resampled.wav', y, sr)
files = ['my_sample_resampled.wav']

Or you can take the average of both channels and convert it to one channel:

import torchaudio
y, sr = torchaudio.load('/content/output.wav')
y = y.mean(dim=0, keepdim=True)  # average the two channels into one
torchaudio.save('my_sample_resampled.wav', y, sr)
files = ['my_sample_resampled.wav']
For me this error was being generated because the wav file had stereo channels. I needed to convert the file to mono channel:
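For readers without torchaudio installed, the same stereo-to-mono downmix can be sketched with only the standard library (a sketch assuming 16-bit PCM and a little-endian host; filenames are placeholders):

```python
import array
import wave

def stereo_to_mono(src, dst):
    """Average the two channels of a 16-bit stereo wav file into a mono wav file."""
    with wave.open(src, 'rb') as wf:
        assert wf.getnchannels() == 2 and wf.getsampwidth() == 2
        rate = wf.getframerate()
        # samples are interleaved: L, R, L, R, ...
        samples = array.array('h', wf.readframes(wf.getnframes()))
    # average each L/R pair into one sample
    mono = array.array('h', ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples), 2)))
    with wave.open(dst, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(mono.tobytes())
```

The resulting mono file can then be passed to quartznet.transcribe() as in the snippets above.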
Describe your question
I just started learning NeMo for ASR activities, and I get an exception if I send a different wav file to convert into text. Could you please share what pre-processing has to be performed for any wav file/format other than the an4 dataset?
I am trying to send a wav file of < 20 sec duration to get the text output from the quartznet model. Here is a sample code:
files = ['my_sample.wav']
for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
print(f"Audio in {fname} was recognized as: {transcription}")
After this, I get the exception below.
RuntimeError Traceback (most recent call last)
in ()
1 files = ['my_sample.wav']
----> 2 for fname, transcription in zip(files, quartznet.transcribe(paths2audio_files=files)):
3 print(f"Audio in {fname} was recognized as: {transcription}")
14 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in transcribe(self, paths2audio_files, batch_size, logprobs)
158 for test_batch in temporary_datalayer:
159 logits, logits_len, greedy_predictions = self.forward(
--> 160 input_signal=test_batch[0].to(device), input_signal_length=test_batch[1].to(device)
161 )
162 if logprobs:
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/models/ctc_models.py in forward(self, input_signal, input_signal_length, processed_signal, processed_signal_length)
394 if not has_processed_signal:
395 processed_signal, processed_signal_length = self.preprocessor(
--> 396 input_signal=input_signal, length=input_signal_length,
397 )
398
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/nemo/core/classes/common.py in __call__(self, wrapped, instance, args, kwargs)
509
510 # Call the method - this can be forward, or any other callable method
--> 511 outputs = wrapped(*args, **kwargs)
512
513 instance._attach_and_validate_output_types(output_types=output_types, out_objects=outputs)
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in forward(self, input_signal, length)
77 @torch.no_grad()
78 def forward(self, input_signal, length):
---> 79 processed_signal, processed_length = self.get_features(input_signal, length)
80
81 return processed_signal, processed_length
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/modules/audio_preprocessing.py in get_features(self, input_signal, length)
247
248 def get_features(self, input_signal, length):
--> 249 return self.featurizer(input_signal, length)
250
251 @property
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
24 def decorate_context(*args, **kwargs):
25 with self.__class__():
---> 26 return func(*args, **kwargs)
27 return cast(F, decorate_context)
28
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in forward(self, x, seq_len)
345 # disable autocast to get full range of stft values
346 with torch.cuda.amp.autocast(enabled=False):
--> 347 x = self.stft(x)
348
349 # torch returns real, imag; so convert to magnitude
/usr/local/lib/python3.6/dist-packages/nemo/collections/asr/parts/features.py in <lambda>(x)
273 win_length=self.win_length,
274 center=True,
--> 275 window=self.window.to(dtype=torch.float),
276 )
277
/usr/local/lib/python3.6/dist-packages/torch/functional.py in stft(input, n_fft, hop_length, win_length, window, center, pad_mode, normalized, onesided, return_complex)
511 extended_shape = [1] * (3 - signal_dim) + list(input.size())
512 pad = int(n_fft // 2)
--> 513 input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
514 input = input.view(input.shape[-signal_dim:])
515 return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in _pad(input, pad, mode, value)
3557 assert len(pad) == 2, '3D tensors expect 2 values for padding'
3558 if mode == 'reflect':
-> 3559 return torch._C._nn.reflection_pad1d(input, pad)
3560 elif mode == 'replicate':
3561 return torch._C._nn.replication_pad1d(input, pad)
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 1, 2]
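For context on the error itself: the STFT centers each frame by reflect-padding the signal with n_fft // 2 = 256 samples on each side, and reflection padding requires the pad to be strictly smaller than the dimension being padded. A stereo file misread as a 2-sample signal therefore fails. A pure-Python illustration of the constraint (a sketch, not torch's actual implementation):

```python
def reflect_pad(xs, pad):
    """1-D reflect padding on a list; like torch, requires pad < len(xs)."""
    if pad >= len(xs):
        raise RuntimeError(
            f"Padding size should be less than the corresponding input "
            f"dimension, but got: padding ({pad}, {pad}) for input of length {len(xs)}")
    # mirror the leading and trailing elements, excluding the edge sample itself
    return xs[pad:0:-1] + xs + xs[-2:-pad - 2:-1]
```

For example, reflect_pad([1, 2, 3, 4], 2) mirrors the edges to give [3, 2, 1, 2, 3, 4, 3, 2], while a 2-sample input with pad 256 raises the same RuntimeError as above.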
Environment overview (please complete the following information)
Colab
import nemo
import nemo.collections.asr as nemo_asr