This repository showcases an accent classification task for Brazilian Portuguese speech using two model configurations: a 1D Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network, and a standalone 2D CNN. The CNN1D + LSTM model, based on Tostes' work, takes a range of frequency values from a spectrogram as input, while the CNN2D model processes images of 227x227 pixels.
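To give a feel for what a 227x227 input means for a 2D CNN, the sketch below traces how the spatial size shrinks through a few layers. The layer hyperparameters are not taken from this repository: they are an assumption borrowed from the classic AlexNet layout, which conventionally uses a 227x227 input, and the names `conv_output_size`, `trace_shapes`, and `ASSUMED_LAYERS` are hypothetical helpers for illustration only.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after one convolution or pooling layer (floor rule)."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (size + 2 * padding - kernel) // stride + 1


# ASSUMPTION: AlexNet-style front end, NOT read from this repository.
# Each entry is (name, kernel, stride, padding).
ASSUMED_LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
]


def trace_shapes(size, layers):
    """Return [(layer_name, output_size), ...] for a square input."""
    shapes = []
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        shapes.append((name, size))
    return shapes


if __name__ == "__main__":
    for name, size in trace_shapes(227, ASSUMED_LAYERS):
        print(f"{name}: {size}x{size}")
```

Under these assumed settings, the 227x227 input becomes 55x55 after the first convolution and 13x13 after the second pooling step. The actual CNN2D architecture is the one shown in the diagram and may use different values.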
Convolution 1D with LSTM
![accent](https://private-user-images.githubusercontent.com/46492977/322229278-f8a19345-4f1d-4232-a43c-84c2800d5524.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1MjYwMjQsIm5iZiI6MTczOTUyNTcyNCwicGF0aCI6Ii80NjQ5Mjk3Ny8zMjIyMjkyNzgtZjhhMTkzNDUtNGYxZC00MjMyLWE0M2MtODRjMjgwMGQ1NTI0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDA5MzUyNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTcwMzE3ZjBkNDFhNmRiN2E5Nzk3ZGE3MGM3MDk5YTY0NDQ3MzlmMTEwMjU5ZjRkOTgzYzdkNDEwZWMyZDA0ZDkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.DdJpxszkTVrxjsMVqTLnVjlS5QkSr5KNc-_RjVdz3jE)
Convolution 2D
![cnn2d](https://private-user-images.githubusercontent.com/46492977/322229617-2ac9a513-25c0-4813-83e3-98e71d0807af.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1MjYwMjQsIm5iZiI6MTczOTUyNTcyNCwicGF0aCI6Ii80NjQ5Mjk3Ny8zMjIyMjk2MTctMmFjOWE1MTMtMjVjMC00ODEzLTgzZTMtOThlNzFkMDgwN2FmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDA5MzUyNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg2NDFkNmU2NTBhOWI1ZTRkNTlmNDMyOTlmOWZjYjMwNjQyYWJjNWFiZTcwYTA2MGRkOWU4MjQ2YmY5ZTI2YzkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.WdddOwVAT0fg3O1wRhS-tD2C3wXpxbTYTpwnDjzSdEI)
Two subsets of Spotify Podcasts are used (Spotify A and Spotify B). More information is available in the following repository: Spotify Subsets
- Local:
git clone https://github.com/aryamtos/accent-classification-audio.git
cd accent-classification-audio
pip3 install -r requirements.txt
- Conda Environment 🐍
git clone https://github.com/aryamtos/accent-classification-audio.git
cd accent-classification-audio
conda create --name myenv
conda activate myenv
conda install --file requirements.txt
conda list
- Docker:
docker build -t accent:2.0 .
docker images
docker run -it --gpus all --ulimit memlock=-1 --ulimit stack=67108864 -v vol/:/vol/ --name accentBr -d accent:2.0
docker exec -it accentBr /bin/bash
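Once inside the container, it can be useful to confirm that the `--gpus all` flag actually exposed a GPU. The following is a minimal sketch using only the Python standard library. It assumes the NVIDIA driver tools (`nvidia-smi`) are available in the container, which this README does not state, and `list_gpus` is a hypothetical helper name, not part of this repository.

```python
import shutil
import subprocess


def list_gpus():
    """Return the device lines printed by `nvidia-smi -L`, or [] if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # nvidia-smi is not on PATH: no driver tools in this environment.
        return []
    try:
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed, e.g. no device was passed to the container.
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print("Visible GPUs:")
        for line in gpus:
            print("  " + line)
    else:
        print("No GPU visible; check the --gpus flag and the host driver setup.")
```

An empty result does not necessarily mean training will fail, only that `nvidia-smi` reported no devices from inside this environment.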