In this project we build a deep learning model that processes and converts Amharic (an African language) speech/voice into text.
The World Food Program wants to deploy an intelligent form that collects nutritional information on food bought and sold at markets in two different countries in Africa - Ethiopia and Kenya.
The design of this intelligent form requires selected people to install an app on their mobile phone, and whenever they buy food, they use their voice to activate the app and register the list of items they just bought in their own language. The intelligent systems in the app are expected to live-transcribe the speech to text and organize the information in an easy-to-process way in a database.
Our task is to create a deep learning model capable of converting speech to text. The model should be accurate and robust to background noise. This project was created during the fourth week of the Machine Learning training session at 10Academy.
- Install Required Python Modules
git clone https://github.com/Micky373/speech_to_text
cd speech_to_text
pip install -r requirements.txt
- Jupyter Notebook
cd notebooks
jupyter notebook
- Model Training UI (not implemented yet; see the MLflow sketch after this list)
mlflow ui
- Dashboard (not implemented yet; see the Streamlit sketch after this list)
streamlit run app.py
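
MLflow tracking has not been wired into the training code yet. The sketch below shows one way runs could be logged so that `mlflow ui` has something to display; the experiment name, parameters, and metric values are placeholders, not part of this repository.

```python
import mlflow

# Placeholder experiment name and values; replace with the real training loop's outputs.
mlflow.set_experiment("amharic_stt")

with mlflow.start_run():
    mlflow.log_param("epochs", 20)
    mlflow.log_param("batch_size", 32)
    mlflow.log_metric("val_loss", 0.42)  # e.g. validation loss
    mlflow.log_metric("wer", 0.35)       # e.g. word error rate
```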
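Similarly, app.py does not exist yet. A minimal Streamlit dashboard for the planned demo could look like the following; the commented-out transcribe call is hypothetical and stands in for whatever inference function the trained model will eventually expose.

```python
# app.py - hypothetical minimal dashboard sketch
import streamlit as st

st.title("Amharic Speech-to-Text Demo")

uploaded = st.file_uploader("Upload an Amharic audio file", type=["wav"])

if uploaded is not None:
    st.audio(uploaded, format="audio/wav")
    # transcription = transcribe(uploaded)  # hypothetical call to the trained model
    st.write("Transcription will appear here once the model is integrated.")
```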
The data folder is tracked with DVC, so the files only appear after cloning the repository and pulling the data locally (typically with `dvc pull`). The sub-folder AMHARIC contains the training and testing files for our model. Both splits share the same file structure.
- wav/: a folder containing all audio files
- text: a file containing the metadata (audio file name and corresponding transcription)
- spk2utt, trsTest.txt, utt2spk, wav.scp: files provided with the dataset; they currently serve no purpose here but could be used for future analysis.
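
As an illustration of how this layout can be consumed, the sketch below pairs the metadata with the audio. It assumes each line of the text file has the form `<audio_file_name> <transcription>` and that the split lives under data/AMHARIC/train; both are assumptions to adjust to the actual DVC layout.

```python
import os

import librosa  # any audio loader would do

DATA_DIR = "data/AMHARIC/train"  # hypothetical path; adjust to the actual layout

# Map each audio file name to its transcription from the `text` metadata file.
transcriptions = {}
with open(os.path.join(DATA_DIR, "text"), encoding="utf-8") as f:
    for line in f:
        name, _, transcript = line.strip().partition(" ")
        transcriptions[name] = transcript

# Load one matching audio file from the wav/ folder.
sample_name = next(iter(transcriptions))
audio, sr = librosa.load(os.path.join(DATA_DIR, "wav", sample_name + ".wav"), sr=None)
print(sample_name, sr, len(audio), transcriptions[sample_name])
```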
- Preprocessing.ipynb: all the data preprocessing is done here before model training.
- data_cleaning.py: contains all the data cleaning and modularizing functions.
- data_viz.py: contains all the visualization-related functions.
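
For context, a typical preprocessing step for speech-to-text is turning each clip into log-mel spectrogram features. The snippet below is only illustrative and does not reproduce the actual functions in data_cleaning.py or Preprocessing.ipynb; the file path is a placeholder.

```python
import librosa
import numpy as np

# Illustrative feature-extraction step: load a clip at 16 kHz and compute log-mel features.
audio, sr = librosa.load("data/AMHARIC/train/wav/sample.wav", sr=16000)  # placeholder path
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)
print(log_mel.shape)  # (n_mels, time_frames)
```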