This repository showcases an accent classification task for Brazilian Portuguese using two model configurations: a 1D Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network, and a standalone 2D CNN. The CNN1D + LSTM model, based on Tostes' work, takes a range of frequency values from a spectrogram as input, while the CNN2D model processes images of 227x227 pixels.
Convolution 1D with LSTM
![accent](https://private-user-images.githubusercontent.com/46492977/322229278-f8a19345-4f1d-4232-a43c-84c2800d5524.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1NzY0MDIsIm5iZiI6MTczOTU3NjEwMiwicGF0aCI6Ii80NjQ5Mjk3Ny8zMjIyMjkyNzgtZjhhMTkzNDUtNGYxZC00MjMyLWE0M2MtODRjMjgwMGQ1NTI0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDIzMzUwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPThhODNiOTVkODJlYmU2N2E5N2U2MGRkMmJkNjg1ZjNjZGNkMzA5YTU0ZGUwZDkzZTAwNDc1ODQxMmQyMTMxYTImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.FrPntVNz2MeAN3_93oDUdh2mwQFM6fCQW_wJn5wjIO8)
Convolution 2D
![cnn2d](https://private-user-images.githubusercontent.com/46492977/322229617-2ac9a513-25c0-4813-83e3-98e71d0807af.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk1NzY0MDIsIm5iZiI6MTczOTU3NjEwMiwicGF0aCI6Ii80NjQ5Mjk3Ny8zMjIyMjk2MTctMmFjOWE1MTMtMjVjMC00ODEzLTgzZTMtOThlNzFkMDgwN2FmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTQlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE0VDIzMzUwMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTVkZDk4YTRjN2JmZTFkYzAyOGFmM2UzNTc0MzgxNjRiMjk0ZjY5OTg1NWQ4Nzc5NzY3OGVlMDQwYWU2NzllOTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.WWUdbCOz9pWaAN5BxdhkxmWdKCq_S7uY6ZNi2KPFB94)
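The two input formats described above can be illustrated with a minimal NumPy sketch. It is not the repository's preprocessing code: the function names, the frame and hop sizes, the 0 to 4000 Hz band, the synthetic test tone, and the nearest-neighbour resize are all assumptions chosen for illustration. It shows a band of spectrogram frequencies arranged as a time sequence (the kind of input a CNN1D + LSTM model consumes) and a spectrogram resized to 227x227 (the input size of the CNN2D model).

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Return a (time_frames, frequency_bins) magnitude spectrogram."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1))


def frequency_band(spec, sample_rate, f_min, f_max):
    """Keep only the bins whose centre frequency lies in [f_min, f_max]."""
    n_bins = spec.shape[1]
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mask = (freqs >= f_min) & (freqs <= f_max)
    return spec[:, mask]


def resize_nearest(image, size=227):
    """Nearest-neighbour resize of a 2D array to (size, size)."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[np.ix_(rows, cols)]


# Synthetic one-second 440 Hz tone standing in for a real recording (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)

spec = magnitude_spectrogram(audio)           # shape (61, 257)
# Sequence input: one frequency-band vector per time step.
seq_input = frequency_band(spec, sample_rate, f_min=0.0, f_max=4000.0)
# Image input: frequency on the vertical axis, resized to 227x227.
img_input = resize_nearest(spec.T)

print(seq_input.shape, img_input.shape)
```

In practice the band limits and the image rendering (colour map, log scaling) would follow whatever the training scripts in this repository use; the sketch only fixes the array shapes.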
Two subsets of Spotify Podcasts (Spotify A and Spotify B) are used. More information is available in the following repository: Spotify Subsets
- Local :
git clone https://github.com/aryamtos/accent-classification-audio.git
cd accent-classification-audio
pip3 install -r requirements.txt
- Conda Environment 🐍
git clone https://github.com/aryamtos/accent-classification-audio.git
cd accent-classification-audio
conda create --name myenv
conda activate myenv
conda install --file requirements.txt
conda list
- Docker :
docker build -t accent:2.0 .
docker images
docker run -it --gpus all --ulimit memlock=-1 --ulimit stack=67108864 -v "$(pwd)/vol":/vol/ --name accentBr -d accent:2.0
docker exec -it accentBr /bin/bash