A little application that listens to my voice, converts it to text, and copies it to the clipboard. Built using Electron.
-
I suffer from repetitive strain injury (osteoarthritis in the fingers, I guess), so it helps a lot if I can type using my voice.
-
I am not a native English speaker, and macOS’s Dictation fails to recognize my accent accurately.
-
macOS’s Dictation does not have a public API that apps can use, so it is not hackable.
-
Google Cloud Speech-to-Text enhanced voice models are much more accurate than macOS’s Dictation and the free webkitSpeechRecognition API. The service also comes with automatic punctuation insertion, which means it can add full stops, commas, and question marks by itself.
vx supports pluggable speech recognizers. You can choose one of the following:
-
Google Chrome’s webkitSpeechRecognition (free, default)
- Can be used free of charge
- No time limit
- Requires opening a browser tab in the background
- Recognition quality is not so great
- No automatic punctuation insertion
-
Google Speech-To-Text API (paid)
- Much better recognition accuracy for the English language
- Automatically adds punctuation marks
- Does not require opening a browser tab in the background
- Costs money
- Each session is limited to 60 seconds
-
Clone this repository.
-
Install the dependencies for the Electron app:
yarn
-
Install the dependencies for the React app, located in the vxgui folder:
(cd vxgui && yarn)
-
Build the React app:
(cd vxgui && yarn build)
-
Build an .app bundle:
yarn build
Google Chrome provides the webkitSpeechRecognition API, which is available for free. However, it can only be used inside Google Chrome, which means it is not available in other Chromium-based environments, including Electron. vx uses a hacky workaround: it launches Google Chrome on a webpage that exposes the webkitSpeechRecognition API to the Electron app via socket.io.
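The page side of such a bridge could be sketched roughly as follows. This is not the code vx ships: the `bridgeRecognizer` function and the `TranscriptMessage` shape are hypothetical names chosen for illustration, and the interfaces cover only the subset of the Web Speech API that the sketch touches.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
interface RecognitionAlternative { transcript: string }
interface RecognitionResult { isFinal: boolean; 0: RecognitionAlternative }
interface RecognitionEvent {
  resultIndex: number
  results: ArrayLike<RecognitionResult>
}
interface Recognizer {
  lang: string
  continuous: boolean
  interimResults: boolean
  onresult: ((event: RecognitionEvent) => void) | null
  start(): void
}

// Hypothetical message shape; the payload vx actually sends may differ.
interface TranscriptMessage { text: string; isFinal: boolean }

// Configures the recognizer and forwards every new result to `send`.
function bridgeRecognizer(
  recognizer: Recognizer,
  lang: string,
  send: (message: TranscriptMessage) => void
): void {
  recognizer.lang = lang
  recognizer.continuous = false
  recognizer.interimResults = true
  recognizer.onresult = event => {
    // Only results from resultIndex onwards changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i]
      send({ text: result[0].transcript, isFinal: result.isFinal })
    }
  }
  recognizer.start()
}
```

In the Chrome tab, the recognizer would be a `webkitSpeechRecognition` instance, and `send` would emit the message over the socket.io connection to the Electron app; the event name used there is left open because the source does not state it.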
-
This is the default behavior; you don't need to configure anything to use this mode.
-
You can configure more options by creating ~/.vxrc.yml in your home directory with the following configuration:

    speechProvider: chrome
    speechProviderOptions:
      port: 5555
      openBrowser: false # default: true
      app: Google Chrome # see: https://www.npmjs.com/package/opn#app
-
Create a Google Cloud Platform project and enable billing on it.
-
Go to Google Cloud API library and enable the Google Cloud Speech API.
-
To get access to enhanced voice models, turn on data logging.
-
Set up authentication with a service account. Download a service account file and save it to your computer.
-
Create ~/.vxrc.yml with the following configuration:

    speechProvider: google-cloud
    speechProviderOptions:
      serviceAccount: /path/to/service-account.json
      recordProgram: /usr/bin/rec

serviceAccount is the full path to your service account file, and recordProgram is the full path to SoX’s rec executable.
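To make the two configuration shapes easier to compare, here is a minimal sketch of how an already-parsed ~/.vxrc.yml could be typed and validated. The `resolveConfig` helper is hypothetical and not taken from vx. Falling back to the chrome provider and to `openBrowser: true` follows the defaults mentioned above; treating both google-cloud paths as required is an assumption.

```typescript
// Option shapes mirroring the two ~/.vxrc.yml examples.
interface ChromeOptions {
  port?: number
  openBrowser?: boolean
  app?: string
}
interface GoogleCloudOptions {
  serviceAccount: string
  recordProgram: string
}
type VxConfig =
  | { speechProvider: 'chrome'; speechProviderOptions: ChromeOptions }
  | { speechProvider: 'google-cloud'; speechProviderOptions: GoogleCloudOptions }

// Hypothetical helper: validates a parsed config object and fills in defaults.
function resolveConfig(raw: unknown): VxConfig {
  const input = (raw ?? {}) as { speechProvider?: unknown; speechProviderOptions?: unknown }
  const provider = input.speechProvider ?? 'chrome' // chrome is the default provider
  const options = (input.speechProviderOptions ?? {}) as Record<string, unknown>

  if (provider === 'chrome') {
    return {
      speechProvider: 'chrome',
      speechProviderOptions: {
        ...(typeof options.port === 'number' ? { port: options.port } : {}),
        openBrowser: options.openBrowser !== false, // default: true
        ...(typeof options.app === 'string' ? { app: options.app } : {}),
      },
    }
  }
  if (provider === 'google-cloud') {
    const { serviceAccount, recordProgram } = options
    // Assumption: both paths must be present for this provider.
    if (typeof serviceAccount !== 'string' || typeof recordProgram !== 'string') {
      throw new Error('google-cloud requires serviceAccount and recordProgram paths')
    }
    return {
      speechProvider: 'google-cloud',
      speechProviderOptions: { serviceAccount, recordProgram },
    }
  }
  throw new Error(`Unknown speechProvider: ${String(provider)}`)
}
```

Modelling the config as a discriminated union means that code handling the google-cloud provider can rely on both paths being strings without further checks.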
-
Launch the Electron app at dist/mac/vxtron.app.
-
Press Cmd+Shift+L to dictate English text. Press it again to make it stop listening.
-
Press Cmd+Alt+Shift+L to dictate Thai text. Press it again to make it stop listening.
-
As soon as you finish speaking, the recognized text will be copied to the clipboard automatically.
-
The app keeps a history of past texts in memory (it is not persisted across restarts), and you can use Cmd+Alt+Up and Cmd+Alt+Down to cycle through them. As you cycle through the history, the recalled text is also copied to the clipboard automatically.
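The history behavior described above can be sketched as a small in-memory class. This is an illustration, not vx's implementation: the `TranscriptHistory` name is hypothetical, mapping Up to "older" and Down to "newer" is an assumption, and so is stopping at either end of the history rather than wrapping around.

```typescript
class TranscriptHistory {
  private items: string[] = []
  private cursor = -1

  // `copy` stands in for whatever writes to the system clipboard.
  constructor(private copy: (text: string) => void) {}

  // Called when recognition finishes: store the text, select it, and copy it.
  add(text: string): void {
    this.items.push(text)
    this.cursor = this.items.length - 1
    this.copy(text)
  }

  // Assumed binding: Cmd+Alt+Up recalls the older entry.
  previous(): string | undefined {
    return this.move(-1)
  }

  // Assumed binding: Cmd+Alt+Down recalls the newer entry.
  next(): string | undefined {
    return this.move(1)
  }

  // Moves the cursor, clamped to the valid range, and copies the recalled text.
  private move(delta: number): string | undefined {
    if (this.items.length === 0) return undefined
    this.cursor = Math.min(this.items.length - 1, Math.max(0, this.cursor + delta))
    const text = this.items[this.cursor]
    this.copy(text)
    return text
  }
}
```

Because the entries live only in an array on the instance, they disappear when the app exits, which matches the non-persistent behavior described above.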
A browser-based development environment is available. It runs purely in the browser and doesn't use Electron APIs or Google Cloud Speech-To-Text. Instead, it uses the webkitSpeechRecognition API to recognize your voice.
This means it doesn't cost anything during development, but recognition accuracy will suffer, and automatic punctuation insertion will not be available.
-
Run yarn start in the vxgui directory:
(cd vxgui && yarn start)
-
This will launch the create-react-app development server. A browser should open to localhost:3000 automatically. Make sure you are using Google Chrome (otherwise the speech recognition API will not be available).
-
The key bindings are the same, except that you use Ctrl instead of Cmd key. For example, press Ctrl+Alt+L to listen to text.
The accelerator key is changed to prevent conflicts between the development version and the Electron version, which may be running at the same time.
-
The copy functionality will not work because a web app may not copy to the clipboard without a user interaction. However, this can be circumvented by exposing Chrome DevTools’ copy function (called as copy('...')) to the web app. You can do that by running the following command in the JavaScript console:
Object.assign(window, { copy })
Once you finish developing, run yarn build in the vxgui directory:
(cd vxgui && yarn build)
This will build the files into the vxgui/build directory.
Sometimes you really need to test Electron-specific APIs, and having to rebuild a bundle every time you want to test them is not ideal.
Instead, with the development server running, you can run the Electron app with the environment variable VX_DEV=1 to make it load the app from localhost:3000 rather than from the built files.
VX_DEV=1 yarn start
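The switch could look roughly like the sketch below. The `resolveAppUrl` helper is hypothetical and not taken from vx; the `http://` scheme and the `index.html` file name inside the build directory (the create-react-app default) are assumptions.

```typescript
// Hypothetical helper: picks the URL that the Electron window should load.
function resolveAppUrl(
  env: Record<string, string | undefined>,
  buildDir: string
): string {
  if (env.VX_DEV === '1') {
    // Development: load from the create-react-app development server.
    return 'http://localhost:3000'
  }
  // Otherwise: load the built files (assumed entry point: index.html).
  return `file://${buildDir}/index.html`
}
```

In the Electron main process this would be called with `process.env` and the absolute path of vxgui/build, and the result passed to the window's `loadURL` method.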
There are two main components in this project:
-
The web application, built using React and TypeScript.
- It contains the core application logic, such as how the transcript from the speech recognition service is handled.
- It is designed to run both in a browser environment (for development) and in an Electron environment (for actual use).

| Environment | Browser | Electron |
| --- | --- | --- |
| Use case | For development | For real-world usage |
| Display | As a web app | As an overlay HUD |
| Activation | Only inside web app | Available system-wide |
| Speech recognition API | webkitSpeechRecognition API | Google Cloud Speech-To-Text |
| Recognition quality | Not so accurate for me | Very accurate |
| Automatic punctuation | Not supported | Supported |
| Cost of usage | Free | $0.048/min |

-
The Electron application
- Provides the overlay GUI.
- Provides access to global hotkeys.
- Provides access to Google Cloud Speech APIs.
I have to use the premium "video" voice model, which is the only model able to recognize my voice with acceptable accuracy. It is also much better than the default model at recognizing speech with a lot of technical terms.
It costs USD 0.048 per minute to use. The first 60 minutes per month are free.
Whenever the speech API is used, vx logs the usage to ~/.vx-google-cloud-speech.log. It is a TSV file with 3 columns:
- Timestamp
- Usage in seconds, rounded up.
- The pricing plan (1: normal speech recognition at $0.024/min, 2: enhanced video speech recognition at $0.048/min).
There is also a simple Ruby script that displays a summary of how much is spent on this API per day. You can run it using ruby cost.rb.
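For illustration, the same kind of per-day summary can be sketched in TypeScript. This is not a port of cost.rb: the `summarizeCostPerDay` function is hypothetical, it assumes the timestamp column starts with an ISO 8601 date (YYYY-MM-DD), and it ignores the free first 60 minutes per month.

```typescript
// Price per minute in USD for each pricing plan, as listed above.
const PRICE_PER_MINUTE: Record<string, number> = { '1': 0.024, '2': 0.048 }

// Sums the cost of each log row per calendar day.
function summarizeCostPerDay(tsv: string): Map<string, number> {
  const totals = new Map<string, number>()
  for (const line of tsv.split('\n')) {
    if (line.trim() === '') continue
    const [timestamp, seconds, plan] = line.split('\t')
    const rate = PRICE_PER_MINUTE[plan]
    if (rate === undefined) continue // skip rows with an unknown pricing plan
    const day = timestamp.slice(0, 10) // assumes an ISO 8601 timestamp
    const cost = (Number(seconds) / 60) * rate
    totals.set(day, (totals.get(day) ?? 0) + cost)
  }
  return totals
}
```

Passing the contents of ~/.vx-google-cloud-speech.log as a string returns a map from day to the estimated spend in USD for that day.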