A simple CLI application that listens to your voice, converts it to text, and copies it to your clipboard, powered by the Google Cloud Speech-to-Text API.

- I suffer from repetitive strain injury (osteoarthritis in the fingers, I guess), so it helps a lot if I can type using my voice.
- I am not a native English speaker, and macOS's Dictation fails to recognize my accent accurately.
- macOS's Dictation does not have a public API that apps can use, so it is not hackable.
- Google Cloud Speech-to-Text enhanced models are much more accurate than macOS's Dictation and the free webkitSpeechRecognition API.

- Create a Google Cloud Platform project and enable billing on it.
- Enable the Google Cloud Speech-to-Text API and turn on data logging.
- Clone this repository and install the dependencies with `yarn`.
- Create a `.env` file in the repo: `GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json`
- Make `vx.js` accessible from the shell: `ln -s "$(pwd)/vx.js" ~/bin/vx`
- Run `vx`, speak, and have your spoken words copied to your clipboard.

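
If `vx` fails to start, the credentials setting is a likely culprit. The following is a minimal sketch of a preflight check, not part of `vx.js` itself. `checkCredentials` is a hypothetical helper, and the sketch assumes the contents of `.env` have already been loaded into `process.env` (how `vx.js` loads the file is not shown here).

```javascript
const fs = require('fs');

// Hypothetical helper (not part of vx.js): returns a list of setup problems.
// Assumes the variables from .env are already present in the given env object.
function checkCredentials(env = process.env) {
  const problems = [];
  const keyPath = env.GOOGLE_APPLICATION_CREDENTIALS;
  if (!keyPath) {
    problems.push('GOOGLE_APPLICATION_CREDENTIALS is not set');
  } else if (!fs.existsSync(keyPath)) {
    problems.push(`Service account key not found at ${keyPath}`);
  }
  return problems;
}

// Print any problems found in the current environment (an empty array means none).
console.log(checkCredentials());
```

An empty array means the variable is set and points at an existing file. It does not verify that the key itself is valid.
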
I have to use the premium "video" model, which recognizes my voice with acceptable accuracy (none of the other models can). It is also much better than the default model at recognizing speech with a lot of technical terms.
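
As a rough illustration of selecting that model, here is a sketch of a streaming request object of the kind accepted by the `streamingRecognize` method of the `@google-cloud/speech` client. `buildStreamingRequest` is a hypothetical helper. The encoding, sample rate, language code, and `interimResults` values are assumptions, and the request that `vx.js` actually sends may differ.

```javascript
// Hypothetical helper (not taken from vx.js): builds a request object
// that selects the "video" model for streaming recognition.
function buildStreamingRequest(languageCode = 'en-US') {
  return {
    config: {
      encoding: 'LINEAR16',   // assumption: raw 16-bit PCM audio from the microphone
      sampleRateHertz: 16000, // assumption: 16 kHz capture
      languageCode,           // assumption: BCP-47 tag; adjust as needed
      model: 'video',         // the premium model described above
      useEnhanced: true,      // assumption: ask for the enhanced variant
    },
    interimResults: true,     // assumption: receive partial transcripts while speaking
  };
}

console.log(JSON.stringify(buildStreamingRequest(), null, 2));
```
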
It costs USD 0.048 per minute to use. The first 60 minutes per month are free.
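
To get a feel for the monthly bill, here is a small sketch based only on the two figures above. `estimateMonthlyCostUsd` is a hypothetical helper, and it ignores any rounding of billing increments that Google may apply.

```javascript
// Figures quoted above; check current pricing before relying on them.
const PRICE_PER_MINUTE_MILLI_USD = 48; // USD 0.048, kept in thousandths to limit float error
const FREE_MINUTES_PER_MONTH = 60;

// Hypothetical helper: estimated monthly cost in USD for the given minutes of audio.
function estimateMonthlyCostUsd(minutes) {
  const billableMinutes = Math.max(0, minutes - FREE_MINUTES_PER_MONTH);
  return (billableMinutes * PRICE_PER_MINUTE_MILLI_USD) / 1000;
}

console.log(estimateMonthlyCostUsd(100).toFixed(2)); // 40 billable minutes at USD 0.048 each
```

For example, 100 minutes of dictation in a month leaves 40 billable minutes, which comes to USD 1.92.
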