This project develops an emotion recognition system for Urdu speech. The pipeline begins by organizing the audio dataset and extracting acoustic features such as MFCCs, chroma, and zero-crossing rate. Data augmentation techniques, including time stretching and pitch shifting, expand the training data. Deep learning models, namely a CNN, an LSTM, and a hybrid CNN-LSTM, are trained and evaluated on the extracted features, with performance analyzed through accuracy, confusion matrices, and classification reports.
URDU-Dataset: https://github.com/siddiquelatif/URDU-Dataset
Audio emotion classification
Multiple neural network models: CNN, LSTM, CNN-LSTM
93.75% peak accuracy
Python | TensorFlow | Librosa | scikit-learn | Pandas | NumPy | Matplotlib | Seaborn