A Python UI application that transcribes speech output from your sound card. This can help if you are hearing impaired or just want to have a summary of your meeting.
- Transcribe any spoken audio, either from your sound card or from your own microphone (dictation mode).
- The app can transcribe dozens of languages. Currently, only English and German are implemented in the UI, but this can be changed easily (2 lines of code).
- Works independently of whatever transcription support your conferencing tool offers.
- You can save your transcript (including time codes) to the clipboard or hard disk.
- See what others are saying in an online conference.
- Helpful for writing meeting minutes (meeting protocol).
- Dictate text in any software (I find the native Windows 10 speech recognition very bad in comparison, tested this a few times).
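To illustrate what a transcript "including time codes" could look like, here is a minimal sketch. The function name `format_transcript` and the `HH:MM:SS  text` layout are assumptions for illustration only; the actual output format of the app may differ.

```python
from datetime import datetime, timedelta


def format_transcript(entries):
    """Render (timestamp, text) pairs as one 'HH:MM:SS  text' line each.

    Hypothetical helper: the layout is an assumption, not the app's format.
    """
    lines = []
    for timestamp, text in entries:
        lines.append(f"{timestamp:%H:%M:%S}  {text}")
    return "\n".join(lines)


# Two recognized sentences, 12 seconds apart.
start = datetime(2020, 2, 1, 9, 0, 0)
entries = [
    (start, "Welcome everyone."),
    (start + timedelta(seconds=12), "Let us begin with the agenda."),
]
print(format_transcript(entries))
```

The resulting string can then be written to a file or pushed to the clipboard.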
Further Ideas:
- Create an A.I. that summarizes your text into proper meeting minutes.
- Create an application that notifies you when your name is mentioned, in case you don't want to pay close attention to a meeting (not recommended 🙂).
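The name notification idea could start from something as small as the following sketch. It is not part of the application; `mentions_name` is a hypothetical helper that checks each recognized sentence for whole-word, case-insensitive matches.

```python
import re


def mentions_name(sentence, names):
    """Return the entries of `names` that occur as whole words in `sentence`."""
    found = []
    for name in names:
        # \b keeps "Ann" from matching inside "Anna".
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(name)
    return found


print(mentions_name("Could Anna share her screen?", ["Anna", "Ann"]))
```

A real notifier would call this on every final recognition result and trigger a sound or pop-up when the returned list is not empty.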
- The application uses the speech recognition service of the Microsoft Azure Cloud. If you don't have an account, the setup is quite easy. There is a free tier that allows you to process 5000 requests (sentences) per month at no cost. Beyond that, your requests will be throttled. (Assuming 20 sentences per page, this works out to roughly 250 pages for free.)
- Like Alexa, Siri, and Google Assistant, the sound files are transferred to the cloud for further processing. If you don't want this, you can set up a private governance scenario with your own VM and Docker containers.
- You could also add speaker recognition (i.e. who said what). For privacy reasons, I did not include this feature.
- Create an Azure cognitive speech resource. Note the region where you created it, e.g. `West Europe`.
- In the Azure portal, go to the created resource, click on `Keys and Endpoint` and note the API key `KEY 1`.
The software Virtual Audio Cable (VAC) provides access to your sound card stream. It basically creates a virtual microphone that is fed by the output of your sound card. You don't have to install it if you just want to use the dictation mode.
- Download the software. The basic version is free.
- Set up the software: YouTube Configuration Manual
- This is the important step:
  - Under `Sound`, click on `Recording` and `Cable Output`.
  - Enable `Listen to this device` and select your standard playback device (i.e. sound channel).
- Do a `pip install azure-cognitiveservices-speech`
- Further required packages: `tkinter` and `pprint`
All done!
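If you want to double-check the Python side of the setup, a small sketch like the one below lists any required module that cannot be found. The helper name `missing_modules` is made up for this example; the module names are the import names of the packages listed above.

```python
import importlib.util

# Import names of the packages mentioned in the setup steps.
REQUIRED_MODULES = ["azure.cognitiveservices.speech", "tkinter", "pprint"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised when a parent package (e.g. "azure") is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules())
```

An empty list means everything the app imports is available.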
- Call `main.py` with your noted service region and key:
  `python main.py -k SERVICE_KEY -r SERVICE_REGION`
- Optionally, you can directly specify which microphone to use. For this, you need to retrieve your microphone hardware ID and give it as an input parameter:
  `python main.py -k SERVICE_KEY -r SERVICE_REGION -m {HW_ID}`

For Windows users, I precompiled the Microsoft example as a C# console app (see the `get_microphone_id` folder).
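For orientation, the command line above could be parsed roughly as follows. This is a sketch with `argparse`, not the code from `main.py`: only the short flags `-k`, `-r` and `-m` come from this README, while the long option names and help texts are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the -k, -r and -m flags (long names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Transcribe speech via Azure.")
    parser.add_argument("-k", "--key", required=True,
                        help="API key of the Azure speech resource (KEY 1)")
    parser.add_argument("-r", "--region", required=True,
                        help="region of the Azure speech resource")
    parser.add_argument("-m", "--microphone", default=None,
                        help="optional microphone hardware ID")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the demo self-contained.
args = build_parser().parse_args(["-k", "SERVICE_KEY", "-r", "SERVICE_REGION"])
print(args.key, args.region, args.microphone)
```

When `-m` is left out, `args.microphone` is `None`, which would correspond to using the default recording device.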
- Click on `EN` or `DE` to start the recognition.
- The lower window shows the preliminary results, the upper window the final one (including punctuation).
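The split between preliminary and final results maps naturally onto the `recognizing` and `recognized` events of the Azure Speech SDK. The sketch below shows one way to wire this up; it is an assumption about the approach, not an excerpt from `main.py`, and `TranscriptBuffer` and `start_recognition` are hypothetical names. The SDK is imported lazily so that the buffer logic can be tried without an Azure account.

```python
from types import SimpleNamespace


class TranscriptBuffer:
    """Keeps the current preliminary text and the list of final sentences."""

    def __init__(self):
        self.preliminary = ""
        self.final = []

    def on_recognizing(self, evt):
        # Intermediate hypothesis: overwrite what the lower window would show.
        self.preliminary = evt.result.text

    def on_recognized(self, evt):
        # Final result with punctuation: append to the upper window's text.
        if evt.result.text:
            self.final.append(evt.result.text)
        self.preliminary = ""


def start_recognition(key, region, language, transcript):
    """Start continuous recognition from the default microphone (needs the SDK)."""
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language  # e.g. "en-US" or "de-DE"
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    recognizer.recognizing.connect(transcript.on_recognizing)
    recognizer.recognized.connect(transcript.on_recognized)
    recognizer.start_continuous_recognition()
    return recognizer  # call stop_continuous_recognition() on it when done


# Offline demo with fake events, no cloud access required.
transcript = TranscriptBuffer()
transcript.on_recognizing(SimpleNamespace(result=SimpleNamespace(text="hello wor")))
transcript.on_recognized(SimpleNamespace(result=SimpleNamespace(text="Hello world.")))
print(transcript.final)
```

Keeping the buffer separate from the recognizer makes it easy to feed both text windows from the same object.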
- `Silent`: disables the UI text output (it is still recorded, though).
- `Save CB`: copies all recorded text to the clipboard.
- `Save file`: saves all recorded text into a file in the application's folder.
- `C`: clears all recorded text from memory.
- `Dict`: enables dictation mode (it's rather a `punctuation mode`). With this active, phrases like "question mark" or "new line" will be replaced by their corresponding characters.
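To make the effect of dictation mode concrete, here is a purely local illustration of the replacement. In the Speech SDK, dictation is normally enabled on the service side (`SpeechConfig.enable_dictation()`), and whether the app does exactly that is an assumption. The phrase table and the function `apply_spoken_punctuation` below are hypothetical and only mimic the behavior.

```python
import re

# Hypothetical phrase table; the real service supports more phrases.
SPOKEN_PUNCTUATION = {
    "question mark": "?",
    "exclamation mark": "!",
    "new line": "\n",
    "comma": ",",
}


def apply_spoken_punctuation(text):
    """Replace spoken punctuation phrases with their characters."""
    for phrase, symbol in SPOKEN_PUNCTUATION.items():
        # Also swallow the whitespace in front of the phrase.
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, lambda match, s=symbol: s, text, flags=re.IGNORECASE)
    # Drop spaces that would otherwise start a new line.
    return re.sub(r"\n[ \t]+", "\n", text)


print(apply_spoken_punctuation("How are you question mark"))
```

The same function turns "first line new line second line" into two separate lines.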
- Automatic language detection. Currently (02/2020), once a language has been detected it stays fixed, i.e. you cannot switch between two languages on the fly (this would be useful, though): Documentation.
- Create a routine to select a microphone (voice input). You have to retrieve the microphone ID for this.
- Use a proper logging framework.
Original photo by Camille Orgel on Unsplash.