OpenDictaVoice is a voice dictation program written in Python 3. It works with any program, including the ones you use every day (text editors, mail clients, etc.). Just hold Ctrl+Shift while you speak! More precisely, it is a graphical user interface that is compatible with several voice recognition systems, e.g. "Google STT" or "CMU Sphinx".
Run the shell command:
git clone https://github.com/OpenDictaVoice/OpenDictaVoice_Software.git
Or, if you prefer, download the repository directly from GitHub.
If you are using Linux or macOS, you must install portaudio so that Python can use the microphone once you allow it.
On Linux (Debian-based distributions), to install portaudio, run the shell command:
sudo apt-get install portaudio19-dev
On macOS, to install portaudio with Homebrew, run the shell command:
brew install portaudio
If you are using Linux, you must also install tkinter for Python 3 (it is installed by default on Windows and macOS). To install it, run the shell command:
sudo apt-get install python3-tk
OpenDictaVoice needs the following modules to work: SpeechRecognition, pyaudio, pynput, python-xlib, six.
You can install all dependencies automatically by running the following command in the downloaded directory:
python -m pip install -r ./requirements.txt
If you prefer to install them manually, follow the steps below:
- SpeechRecognition: OpenDictaVoice relies on the SpeechRecognition module, which is available here: https://pypi.org/project/SpeechRecognition/
To install it, run the shell command:
python -m pip install SpeechRecognition
- PyAudio: To install it, run the shell command:
python -m pip install pyaudio
NOTE: If you are installing OpenDictaVoice on Windows, installing the PyAudio module this way may fail.
This happens when the installation file provided by pip is not suitable for your computer.
In this case, go to https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio and download the wheel file that matches your Python version and architecture (PyAudio-0.2.11-cp39-cp39-win_amd64.whl, for example), then use it to install pyaudio by running the command:
python -m pip install PyAudio-0.2.11-cp39-cp39-win_amd64.whl
- pynput: To install it, run the shell command:
python -m pip install pynput
- python-xlib: To install it, run the shell command:
python -m pip install python-xlib
- six: To install it, run the shell command:
python -m pip install six
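After installing, it can be handy to confirm that Python actually finds every dependency. The following is a minimal sketch, not part of OpenDictaVoice: the function name `missing_packages` is hypothetical, and the mapping relies on the usual import names of these packages (`speech_recognition` for SpeechRecognition, `Xlib` for python-xlib). It reports python-xlib as missing on any system where it is not installed, even if that system does not need it.

```python
import importlib.util

# pip package name -> module name used in "import ..." statements.
REQUIRED_MODULES = {
    "SpeechRecognition": "speech_recognition",
    "pyaudio": "pyaudio",
    "pynput": "pynput",
    "python-xlib": "Xlib",
    "six": "six",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

Any package listed as missing can then be installed with the matching `python -m pip install` command above.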
Go to the downloaded directory, then run the shell command:
python3 ./opendictavoice_app/opendictavoice_main.py
Or launch the program by clicking on the opendictavoice_main.py file in your favorite file browser.
Note: you can of course create a shortcut to the ./opendictavoice_app/opendictavoice_main.py file to launch it easily.
This opens a semi-transparent window that always stays in the foreground.
By default, the recognized language is French. To switch to English, select it in the drop-down menu.
Then, put the focus on the element you want to write in (LibreOffice Writer in the example),
and when you want to speak, hold the CTRL + SHIFT keys to start recording.
When you release the CTRL + SHIFT keys, the program stops recording, analyses the recorded sound and then writes the text where the focus is.
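The hold-to-record behaviour can be pictured as a small state machine. The sketch below is an illustration under stated assumptions, not the program's actual code: the class `HoldToRecord` and the function `run_with_pynput` are hypothetical names, and left and right modifier keys are merged into one logical key for simplicity. Only the second function needs pynput, which is imported lazily.

```python
class HoldToRecord:
    """Track which hotkey keys are held and report start/stop transitions."""

    def __init__(self, required=("ctrl", "shift")):
        self.required = set(required)
        self.held = set()
        self.recording = False

    def press(self, key):
        """Register a key press; return "start" when the combination completes."""
        if key in self.required:
            self.held.add(key)
        if not self.recording and self.held == self.required:
            self.recording = True
            return "start"
        return None

    def release(self, key):
        """Register a key release; return "stop" when the combination is broken."""
        self.held.discard(key)
        if self.recording and self.held != self.required:
            self.recording = False
            return "stop"
        return None


def run_with_pynput(state):
    """Feed real keyboard events into the state machine (blocks until stopped)."""
    from pynput import keyboard  # lazy import: only needed for live use

    # Map pynput key constants to the logical names used by HoldToRecord.
    names = {
        keyboard.Key.ctrl: "ctrl", keyboard.Key.ctrl_l: "ctrl",
        keyboard.Key.ctrl_r: "ctrl", keyboard.Key.shift: "shift",
        keyboard.Key.shift_l: "shift", keyboard.Key.shift_r: "shift",
    }

    def on_press(key):
        if state.press(names.get(key)) == "start":
            print("Recording started")

    def on_release(key):
        if state.release(names.get(key)) == "stop":
            print("Recording stopped")

    with keyboard.Listener(on_press=on_press, on_release=on_release) as listener:
        listener.join()
```

Calling `run_with_pynput(HoldToRecord())` would print a message each time the combination is held or released; a real application would start and stop the microphone capture at those points instead.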
Note: if you don't want to use the CTRL + SHIFT hotkeys, you can click on the microphone button of the program to start recording, and click on it again to stop. The program then switches the focus with an ALT + TAB shortcut to return to the element you wanted to write in.
Note 2: You can start a new recording before the analysis of the previous one is finished. Recordings are managed by a queue, so the recognized texts are always written in the order in which they were recorded. This is useful for saving time.
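One way to get this ordering guarantee is to analyse recordings in parallel while keeping a first-in, first-out queue of pending results. The sketch below shows that idea only; it is not taken from the OpenDictaVoice source, and `OrderedDictation`, `fake_recognize` and the `recognize`/`write` callbacks are hypothetical names. Error handling is left out.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class OrderedDictation:
    """Analyse recordings concurrently, but emit text in recording order."""

    def __init__(self, recognize, write, workers=2):
        self._recognize = recognize    # hypothetical callback: audio -> text
        self._write = write            # hypothetical callback: text -> None
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = queue.Queue()  # futures, in the order they were recorded
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def submit(self, audio):
        """Start analysing a recording immediately and remember its position."""
        self._pending.put(self._pool.submit(self._recognize, audio))

    def _drain(self):
        while True:
            future = self._pending.get()
            if future is None:            # sentinel: no more recordings
                break
            self._write(future.result())  # waits until this recording is ready

    def close(self):
        """Wait until every pending recording has been written, then stop."""
        self._pending.put(None)
        self._writer.join()
        self._pool.shutdown()


def fake_recognize(audio):
    """Stand-in for a speech engine: sleeps, then returns the stored text."""
    delay, text = audio
    time.sleep(delay)
    return text
```

For example, submitting `(0.2, "first")` and then `(0.0, "second")` to `OrderedDictation(fake_recognize, print)` still prints "first" before "second", even though the second analysis finishes earlier.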
OpenDictaVoice is a graphical interface for voice dictation using a Speech To Text engine. For now it uses "Google STT" or "CMU Sphinx", but if you have your own Speech To Text engine (created with Common Voice and DeepSpeech for example, https://research.mozilla.org/machine-learning/), you can use it with this program by modifying the source code.
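To give an idea of what plugging in another engine could look like, here is a minimal sketch of an engine registry. It is an assumption about one possible design, not a description of the OpenDictaVoice source: `ENGINES`, `register_engine` and `transcribe` are hypothetical names, and the `"fr-FR"` language code is assumed. The two built-in functions call `recognize_google` and `recognize_sphinx`, which are methods of the `Recognizer` class in the SpeechRecognition package.

```python
def recognize_with_google(recognizer, audio, language):
    """Use the Google engine exposed by a speech_recognition.Recognizer."""
    return recognizer.recognize_google(audio, language=language)


def recognize_with_sphinx(recognizer, audio, language):
    """Use the offline CMU Sphinx engine (needs the matching language pack)."""
    return recognizer.recognize_sphinx(audio, language=language)


# Engine name shown to the user -> function(recognizer, audio, language) -> text
ENGINES = {
    "Google STT": recognize_with_google,
    "CMU Sphinx": recognize_with_sphinx,
}


def register_engine(name, func):
    """Add a custom engine; func must accept (recognizer, audio, language)."""
    ENGINES[name] = func


def transcribe(engine_name, recognizer, audio, language="fr-FR"):
    """Dispatch the recording to the selected engine and return its text."""
    return ENGINES[engine_name](recognizer, audio, language)
```

A custom engine would then be added with a call such as `register_engine("My engine", my_function)`, where `my_function` wraps your own model and ignores the `recognizer` argument if it does not need it.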
Moreover, speech recognition matters in many applications (ergonomics, new forms of human-machine interface, assistance for disabled people, etc.), and having this kind of technology widely accessible as open source is highly desirable. But the most precious resource for this is voice samples...
Fortunately, we can all make this possible. There is an open source project that you can ALL contribute to: Common Voice (https://voice.mozilla.org/)
Common Voice is a project started by Mozilla to build a large database of recorded speech, which anyone can download to train their own voice recognition models!
If only 1,000 people each made 45 recordings a day for 6 months, the database would contain about 10,000 hours of recordings, which is enough for decent voice recognition!
So if you want to make speech recognition incredibly efficient and accessible to everyone, as GNU/Linux is, contribute to Common Voice. Even 5 short recordings per week is already great.
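The estimate above can be checked with a few lines of arithmetic. This sketch assumes that 6 months is about 182 days (an assumption, not a figure from the text) and computes the average clip length the estimate implies, which comes out at roughly 4.4 seconds per recording.

```python
def hours_collected(people, clips_per_day, days, seconds_per_clip):
    """Total hours of audio gathered by a group of contributors."""
    return people * clips_per_day * days * seconds_per_clip / 3600


def seconds_per_clip_needed(people, clips_per_day, days, target_hours):
    """Average clip length required to reach target_hours of audio."""
    return target_hours * 3600 / (people * clips_per_day * days)


if __name__ == "__main__":
    days = 182  # assumed length of "6 months"
    needed = seconds_per_clip_needed(1000, 45, days, 10_000)
    print(f"Average clip length needed: {needed:.1f} s")
```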