add arg for sampling rate #114

karasjoh000 · 2024-12-01T03:29:46Z

Some input devices (e.g. the Scarlett 2i2) fail to open if the stream's sample rate is not set to a rate the device supports.

python -m sounddevice
#    0 HDA Intel PCH: ALC1220 Analog (hw:0,0), ALSA (2 in, 8 out)
#    1 HDA Intel PCH: ALC1220 Digital (hw:0,1), ALSA (0 in, 2 out)
#    2 HDA Intel PCH: ALC1220 Alt Analog (hw:0,2), ALSA (2 in, 0 out)
#    3 HDA NVidia: HDMI 0 (hw:1,3), ALSA (0 in, 8 out)
#    4 HDA NVidia: PHL 272P7VU (hw:1,7), ALSA (0 in, 2 out)
#    5 HDA NVidia: HDMI 2 (hw:1,8), ALSA (0 in, 8 out)
#    6 HDA NVidia: HDMI 3 (hw:1,9), ALSA (0 in, 8 out)
#    7 HDA NVidia: HDMI 4 (hw:1,10), ALSA (0 in, 8 out)
#    8 HDA NVidia: HDMI 5 (hw:1,11), ALSA (0 in, 8 out)
#    9 HDA NVidia: HDMI 6 (hw:1,12), ALSA (0 in, 8 out)
#   10 Scarlett 2i2 USB: Audio (hw:2,0), ALSA (2 in, 0 out)   <-- attempting to use this one
#   11 sysdefault, ALSA (128 in, 128 out)
#   12 front, ALSA (0 in, 8 out)
#   13 surround21, ALSA (0 in, 128 out)
#   14 surround40, ALSA (0 in, 8 out)
#   15 surround41, ALSA (0 in, 128 out)
#   16 surround50, ALSA (0 in, 128 out)
#   17 surround51, ALSA (0 in, 8 out)
#   18 surround71, ALSA (0 in, 8 out)
#   19 iec958, ALSA (0 in, 2 out)
#   20 spdif, ALSA (0 in, 2 out)
# * 21 default, ALSA (128 in, 128 out)
#   22 dmix, ALSA (0 in, 2 out)

whisper-ctranslate2 --live_transcribe True --live_input_device="10" --device=cuda --model medium
# Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
# Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
# Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
# Consider specifying the language using `--language`. It improves significantly prediction in live transcription.
# Live stream device: Scarlett 2i2 USB: Audio (hw:2,0)
# Listening.. (Ctrl+C to Quit)
# 
# Quitting..
# Traceback (most recent call last):
#   File "/lsiopy/bin/whisper-ctranslate2", line 8, in <module>
#     sys.exit(main())
#              ^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/whisper_ctranslate2.py", line 206, in main
#     ).inference()
#       ^^^^^^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/live.py", line 170, in inference
#     self.listen()
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/live.py", line 158, in listen
#     with sd.InputStream(
#          ^^^^^^^^^^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 1440, in __init__
#     _StreamBase.__init__(self, kind='input', wrap_callback='array',
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 909, in __init__
#     _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 2796, in _check
#     raise PortAudioError(errormsg, err)
# sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]

# to get the sample rate: 
arecord -D "hw:2,0" -f cd 
#Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
#arecord: set_params:1352: Sample format non available
#Available formats:
#- S32_LE
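
As an alternative to probing with `arecord`, the supported rate could also be checked from Python. The following is a minimal sketch, not part of this PR: `first_supported_rate` and `_portaudio_accepts` are hypothetical helper names, and the candidate rates are assumptions. It relies on `sounddevice.check_input_settings`, which raises if PortAudio rejects the requested settings.

```python
from typing import Callable, Iterable, Optional


def _portaudio_accepts(device: int, samplerate: int) -> bool:
    """Ask PortAudio (through sounddevice) whether the input device accepts this rate."""
    # Imported lazily so first_supported_rate() can be exercised without audio hardware.
    import sounddevice as sd

    try:
        sd.check_input_settings(device=device, samplerate=samplerate)
        return True
    except (sd.PortAudioError, ValueError):
        return False


def first_supported_rate(
    device: int,
    candidates: Iterable[int] = (16000, 44100, 48000),  # assumed candidate rates
    accepts: Callable[[int, int], bool] = _portaudio_accepts,
) -> Optional[int]:
    """Return the first candidate rate the device accepts, or None if none fit."""
    for rate in candidates:
        if accepts(device, rate):
            return rate
    return None
```

On the machine above, `first_supported_rate(10)` would be the call to try for the Scarlett; `sounddevice.query_devices(10)["default_samplerate"]` reports the device's default rate as well.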

# with this PR
python -m src.whisper_ctranslate2.whisper_ctranslate2 --live_transcribe True --live_input_device=10 --live_input_device_sample_rate=44100 --device=cuda --model medium
# Consider specifying the language using `--language`. It improves significantly prediction in live transcription.
# Live stream device: Scarlett 2i2 USB: Audio (hw:2,0)
# Listening.. (Ctrl+C to Quit)
# ....
#  Shhh, shhh, shhh, shhh, shhh.
# ...............
#  This is a test. This is a test.
# 
#
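
For readers who have not opened the diff, here is a rough sketch of how such a flag could be wired through to the input stream. It is illustrative only and is not the code in this PR: `build_parser` and `stream_kwargs` are hypothetical names, and the 16000 Hz default is an assumption. The resulting keys (`device`, `samplerate`, `channels`) are keyword arguments accepted by `sounddevice.InputStream`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with the two live-input flags used in this thread."""
    parser = argparse.ArgumentParser(prog="live-demo")
    parser.add_argument(
        "--live_input_device",
        type=int,
        default=None,
        help="index of the input device as listed by `python -m sounddevice`",
    )
    parser.add_argument(
        "--live_input_device_sample_rate",
        type=int,
        default=16000,  # assumed default for this sketch
        help="sample rate (Hz) to request when opening the input device",
    )
    return parser


def stream_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed arguments into keyword arguments for sd.InputStream."""
    return {
        "device": args.live_input_device,
        "samplerate": args.live_input_device_sample_rate,
        "channels": 1,
    }
```

With `--live_input_device=10 --live_input_device_sample_rate=44100`, the stream would be opened at 44100 Hz instead of the default, which is the setting that the transcript above shows working.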

jordimas · 2024-12-01T08:49:51Z

Looks good.
If you can, please run:

"make dev"

and commit the changes. This will fix the formatting. Thanks.

karasjoh000 · 2024-12-02T20:57:54Z

@jordimas I amended the commit with the changes from make dev.

jordimas · 2024-12-03T06:33:15Z

Thanks. Some of the tests are failing:

----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/whisper-ctranslate2/whisper-ctranslate2/tests/testlive.py", line 8, in test_constructor
live = Live(
TypeError: Live.__init__() missing 1 required positional argument: 'input_device_sample_rate'

They probably just need to be updated.

Thanks
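
The failure above is the usual result of adding a required constructor parameter without updating existing call sites. A minimal sketch of the two common remedies follows, using a stand-in class: `LiveStub` and its other parameters are hypothetical and do not reflect the real `Live` signature, and the 16000 default is an assumption.

```python
class LiveStub:
    """Stand-in for a class that gained a new constructor parameter."""

    def __init__(self, model_path, input_device, input_device_sample_rate=16000):
        # Giving the new parameter a default keeps older call sites working.
        self.model_path = model_path
        self.input_device = input_device
        self.input_device_sample_rate = input_device_sample_rate


# Remedy 1: an existing call that omits the new argument still works
# because of the default value.
old_style = LiveStub("model-dir", None)

# Remedy 2: update the test to pass the new argument explicitly.
new_style = LiveStub("model-dir", 10, input_device_sample_rate=44100)
```

Either approach would clear the `TypeError`; which one suits the project depends on whether the sample rate should be mandatory for callers.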

jordimas · 2024-12-12T17:28:18Z

This has been merged in 4d8a052.

Thanks!

karasjoh000 had a problem deploying to CI/CD December 1, 2024 08:33 — with GitHub Actions Failure

karasjoh000 had a problem deploying to CI/CD December 1, 2024 08:33 — with GitHub Actions Error


add arg for sampling rate

5505108

karasjoh000 force-pushed the main branch from b7c878f to 5505108 Compare December 2, 2024 20:57

karasjoh000 had a problem deploying to CI/CD December 3, 2024 06:22 — with GitHub Actions Error

karasjoh000 had a problem deploying to CI/CD December 3, 2024 06:22 — with GitHub Actions Failure


jordimas closed this Dec 12, 2024
