
add arg for sampling rate #114

Closed
wants to merge 1 commit into from

karasjoh000
Contributor

Some input devices (e.g. the Scarlett 2i2) raise errors if the sample rate is not set to the device's native sample rate.
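Rather than guessing the rate, it can be looked up before the stream is opened. The sketch below is an illustration, not code from this PR. It assumes python-sounddevice's `query_devices(device, "input")`, whose result includes a `"default_samplerate"` key, and `check_input_settings()`, which raises `PortAudioError` for settings the device rejects. `pick_input_sample_rate` and `native_rate_for` are hypothetical helper names.

```python
from typing import Optional, Union


def pick_input_sample_rate(device_info: dict, requested: Optional[int] = None) -> int:
    """Choose the rate to open an input stream at.

    `device_info` is shaped like the dict returned by
    sounddevice.query_devices(device, "input"), which has a
    "default_samplerate" key. An explicit `requested` rate (for example
    the value passed to --live_input_device_sample_rate) takes precedence.
    """
    if requested is not None:
        if requested <= 0:
            raise ValueError(f"sample rate must be positive, got {requested}")
        return int(requested)
    # PortAudio reports the default rate as a float (e.g. 44100.0).
    return int(device_info["default_samplerate"])


def native_rate_for(device: Union[int, str], requested: Optional[int] = None) -> int:
    """Hypothetical helper: resolve and validate the input rate for `device`."""
    # Imported lazily so pick_input_sample_rate works without PortAudio installed.
    import sounddevice as sd

    info = sd.query_devices(device, "input")
    rate = pick_input_sample_rate(info, requested)
    # Raises sounddevice.PortAudioError if the device rejects this rate,
    # before any stream is opened.
    sd.check_input_settings(device=device, samplerate=rate)
    return rate
```

Printing `sd.query_devices(10)` shows the same dictionary, including `default_samplerate`, for the Scarlett entry in the device list below.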

python -m sounddevice
#    0 HDA Intel PCH: ALC1220 Analog (hw:0,0), ALSA (2 in, 8 out)
#    1 HDA Intel PCH: ALC1220 Digital (hw:0,1), ALSA (0 in, 2 out)
#    2 HDA Intel PCH: ALC1220 Alt Analog (hw:0,2), ALSA (2 in, 0 out)
#    3 HDA NVidia: HDMI 0 (hw:1,3), ALSA (0 in, 8 out)
#    4 HDA NVidia: PHL 272P7VU (hw:1,7), ALSA (0 in, 2 out)
#    5 HDA NVidia: HDMI 2 (hw:1,8), ALSA (0 in, 8 out)
#    6 HDA NVidia: HDMI 3 (hw:1,9), ALSA (0 in, 8 out)
#    7 HDA NVidia: HDMI 4 (hw:1,10), ALSA (0 in, 8 out)
#    8 HDA NVidia: HDMI 5 (hw:1,11), ALSA (0 in, 8 out)
#    9 HDA NVidia: HDMI 6 (hw:1,12), ALSA (0 in, 8 out)
#   10 Scarlett 2i2 USB: Audio (hw:2,0), ALSA (2 in, 0 out)   <-- attempting to use this one
#   11 sysdefault, ALSA (128 in, 128 out)
#   12 front, ALSA (0 in, 8 out)
#   13 surround21, ALSA (0 in, 128 out)
#   14 surround40, ALSA (0 in, 8 out)
#   15 surround41, ALSA (0 in, 128 out)
#   16 surround50, ALSA (0 in, 128 out)
#   17 surround51, ALSA (0 in, 8 out)
#   18 surround71, ALSA (0 in, 8 out)
#   19 iec958, ALSA (0 in, 2 out)
#   20 spdif, ALSA (0 in, 2 out)
# * 21 default, ALSA (128 in, 128 out)
#   22 dmix, ALSA (0 in, 2 out)
whisper-ctranslate2 --live_transcribe True --live_input_device="10" --device=cuda --model medium
# Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
# Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2718
# Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2842
# Consider specifying the language using `--language`. It improves significantly prediction in live transcription.
# Live stream device: Scarlett 2i2 USB: Audio (hw:2,0)
# Listening.. (Ctrl+C to Quit)
# 
# Quitting..
# Traceback (most recent call last):
#   File "/lsiopy/bin/whisper-ctranslate2", line 8, in <module>
#     sys.exit(main())
#              ^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/whisper_ctranslate2.py", line 206, in main
#     ).inference()
#       ^^^^^^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/live.py", line 170, in inference
#     self.listen()
#   File "/lsiopy/lib/python3.12/site-packages/src/whisper_ctranslate2/live.py", line 158, in listen
#     with sd.InputStream(
#          ^^^^^^^^^^^^^^^
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 1440, in __init__
#     _StreamBase.__init__(self, kind='input', wrap_callback='array',
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 909, in __init__
#     _check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
#   File "/lsiopy/lib/python3.12/site-packages/sounddevice.py", line 2796, in _check
#     raise PortAudioError(errormsg, err)
# sounddevice.PortAudioError: Error opening InputStream: Invalid sample rate [PaErrorCode -9997]
# to probe which format and sample rate the device accepts:
arecord -D "hw:2,0" -f cd 
#Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
#arecord: set_params:1352: Sample format non available
#Available formats:
#- S32_LE
# with this PR
python -m src.whisper_ctranslate2.whisper_ctranslate2 --live_transcribe True --live_input_device=10 --live_input_device_sample_rate=44100 --device=cuda --model medium
# Consider specifying the language using `--language`. It improves significantly prediction in live transcription.
# Live stream device: Scarlett 2i2 USB: Audio (hw:2,0)
# Listening.. (Ctrl+C to Quit)
# ....
#  Shhh, shhh, shhh, shhh, shhh.
# ...............
#  This is a test. This is a test.
# 
#
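Opening the stream at 44100 Hz avoids the PortAudio error, but Whisper models consume 16 kHz mono audio, so the captured blocks have to end up at 16 kHz before transcription. This thread does not show how the merged change handles that. The sketch below is a hypothetical illustration using plain linear interpolation with NumPy; `resample_linear` is an illustrative name, not a project function.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz input


def resample_linear(audio: np.ndarray, src_rate: int,
                    dst_rate: int = WHISPER_SAMPLE_RATE) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    Crude on purpose: no anti-aliasing low-pass filter is applied, so
    content above dst_rate / 2 can alias into the output.
    """
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    if src_rate == dst_rate or audio.size == 0:
        return audio
    # Keep the same duration in seconds at the new rate.
    n_out = int(round(audio.size * dst_rate / src_rate))
    src_times = np.arange(audio.size) / src_rate
    dst_times = np.arange(n_out) / dst_rate
    return np.interp(dst_times, src_times, audio).astype(np.float32)
```

When SciPy is available, `scipy.signal.resample_poly(audio, 160, 441)` performs the same 44100 → 16000 conversion with proper anti-aliasing filtering.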

@jordimas
Collaborator

jordimas commented Dec 1, 2024

Looks good.
Could you please run:

"make dev"

and commit the changes? This will fix the formatting. Thanks!

@karasjoh000
Contributor Author

@jordimas I amended the commit with the changes from "make dev".

@jordimas
Collaborator

jordimas commented Dec 3, 2024

Thanks, but some of the tests are failing:

----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/whisper-ctranslate2/whisper-ctranslate2/tests/testlive.py", line 8, in test_constructor
    live = Live(
TypeError: Live.__init__() missing 1 required positional argument: 'input_device_sample_rate'

They probably just need to be updated.

Thanks
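This failure is the usual consequence of adding a required positional parameter to a constructor: every existing call site, including the test, must now pass it. Besides updating the test, another option is to give the parameter a default. A minimal sketch with hypothetical stand-in classes; the real `Live` signature is not shown in this thread, and the 16000 default is an assumption.

```python
class LiveRequired:
    # Hypothetical stand-in: new parameter added as a required positional argument.
    def __init__(self, model_path, input_device, input_device_sample_rate):
        self.input_device_sample_rate = input_device_sample_rate


class LiveWithDefault:
    # Hypothetical stand-in: same parameter, with an assumed 16 kHz default.
    def __init__(self, model_path, input_device, input_device_sample_rate=16000):
        self.input_device_sample_rate = input_device_sample_rate


def constructs(cls, *args) -> bool:
    """Return True if cls(*args) succeeds, False on a TypeError like the one in CI.

    Note: this catches any TypeError, so it is only suitable for stand-ins
    whose constructors do nothing else that could raise one.
    """
    try:
        cls(*args)
        return True
    except TypeError:
        return False
```

An old-style call such as `LiveRequired("tiny", None)` fails with the same kind of "missing 1 required positional argument" error seen in CI, while `LiveWithDefault("tiny", None)` succeeds.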

@jordimas
Collaborator

This has been merged in 4d8a052.

Thanks!

jordimas closed this Dec 12, 2024