Please avoid using it at the moment and patiently await further updates.
Anticipated completion in the next couple of months.
Ensure that pyenv is installed on your local machine before proceeding.
To make use of this project, follow the steps below:
pyenv virtualenv 3.10 toxic-audio
Activate the virtual environment before working on the project.
pyenv local toxic-audio
Install the dependencies
make develop
Create a .env file in the root directory of the project. Add the necessary environment variables; refer to .env.example for the required variables.
# .env
KAGGLE_USERNAME=your_kaggle_username
KAGGLE_KEY=your_kaggle_api_key
DATASET_RAW=fangfangz/audio-based-violence-detection-dataset
PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:postgres@database:5432/prefect
PREFECT_API_URL=http://127.0.0.1:4200/api
MLFLOW_TRACKING_URI=http://localhost:5000
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=prefect
To fetch the raw data, execute the following command:
python src/scripts/fetch_raw_data.py
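For reference, here is a minimal sketch of what such a fetch script might look like, assuming the official kaggle package and python-dotenv are available; the actual src/scripts/fetch_raw_data.py may differ:

```python
import os

from dotenv import load_dotenv
from kaggle.api.kaggle_api_extended import KaggleApi

load_dotenv()  # pull KAGGLE_USERNAME / KAGGLE_KEY / DATASET_RAW from .env

api = KaggleApi()
api.authenticate()  # reads credentials from the environment or ~/.kaggle/kaggle.json
api.dataset_download_files(
    os.environ["DATASET_RAW"],  # e.g. fangfangz/audio-based-violence-detection-dataset
    path="data",                # download target within the repository
    unzip=True,
)
```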
- use Git Bash as of now
- change the token as follows in line 8 (a sketch of the relevant config lines follows below):
c.NotebookApp.token = ""
File location: toxic-audio-detection\conf\.jupyter\jupyter_notebook_config.py
- save it and build the Docker image
- run the container from Git Bash
Tadah!! It works, right? 🥳
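For reference, a minimal sketch of the relevant lines in that config file (the real file under conf/.jupyter may contain more settings):

```python
# conf/.jupyter/jupyter_notebook_config.py -- relevant lines only (sketch)
c = get_config()  # noqa: F821 -- provided by Jupyter when it loads this file
c.NotebookApp.token = ""  # an empty token disables token auth inside the container
```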
Run a PowerShell terminal as an administrator and run the following command:
Invoke-WebRequest -UseBasicParsing -Uri "https://raw.githubusercontent.com/pyenv-win/pyenv-win/master/pyenv-win/install-pyenv-win.ps1" -OutFile "./install-pyenv-win.ps1"; &"./install-pyenv-win.ps1"
If you get an UnauthorizedAccess error, run the following command:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine
Now re-run the above installation command. Installation complete!
After installation, make sure that the pyenv variables are set in your machine environment variables.
Windows > Edit the system environment variables > Environment Variables
and check for the variables PYENV, PYENV_HOME, PYENV_ROOT.
The next step is to uninstall your current version of Python on Windows. Then disable the Python-related aliases: open
Windows > Manage application execution aliases
and disable all aliases related to Python.
Pyenv for Windows is ready to be used! Please refer to this resource for the Windows pyenv commands.
https://pypi.org/project/pyenv-win/#validate
Let's install the Python version for this project. Open a CMD in the project folder toxic-audio-detection and run the following command:
pyenv install 3.10.0
Change the current Python version using:
pyenv global 3.10.0
pyenv local 3.10.0
Test your changes by checking the Python version in your shell; the output should be 3.10.0.
pyenv shell 3.10.0
python -V
Download mingw-get for Windows https://sourceforge.net/projects/mingw/files/Installer/mingw-get-setup.exe/download
Run the installer and make sure the installation path has no whitespace (example: C:\MinGW).
Close the installer
Set mingw in your machine environment variables. Open:
Windows > Edit the system environment variables > Environment Variables
Edit the Path user variable and add the following path
C:\MinGW\bin
Save and close.
Open a terminal and run the following command
mingw-get install mingw32-make
Installation complete! For more help, watch the following step-by-step video: https://www.youtube.com/watch?v=taCJhnBXG_w
- open your powershell with admin privileges
- run the following command
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
- close the powershell and open it again (allowing it to update)
- run
choco install make
- it will automatically set the PATH in environment variables
- go to the project repository and run
make build_notebook
and then make start_notebook
Currently --> this allows working on the Jupyter server, but the notebook is not in sync with the common repo (updated 06 Feb 24, evening)
When working on Windows you have to clone the repository using the following command (to avoid Unix line-ending and Makefile issues):
git clone https://github.com/NILodio/toxic-audio-detection.git --config core.autocrlf=input
Install Docker Desktop: https://www.docker.com/products/docker-desktop/
Now that we have pyenv, make, and the project correctly cloned for Windows we can proceed to install dependencies and build our Docker container.
Open Docker Desktop and minimize it. Run the following commands in the project folder:
pyenv install 3.10.0
pyenv global 3.10.0
pyenv local 3.10.0
mingw32-make install
mingw32-make develop
mingw32-make build_notebook
mingw32-make start_notebook
Open the container port: http://localhost:8888/
When asked for a token, insert: 123
Now you have access to the jupyter notebooks!
Go to Kaggle.com and open your settings
In the API section, click Create New Token. A JSON file named kaggle.json should start downloading.
Create a new .env file in the project repository, copying it from the example.env file. Replace your user and token from the kaggle.json file into the new .env file.
Add or replace the kaggle.json file into the following path of your machine.
C:\Users\<Your username>\.kaggle\kaggle.json
Open a terminal in the project folder and run the following command:
python src/scripts/fetch_raw_data.py
The data should start downloading under /data within the project repository.
Important: you should be on the project's Python version using pyenv, and with the version set globally and locally. Check previous pyenv steps.
The full setup consists of the following steps:
- Training - A training script trains a model on the Threat dataset with sklearn; training is orchestrated by Prefect, and the model's metrics and artifacts (the actual models) are uploaded to mlflow.
- Serving - The model is pulled and FastAPI delivers the prediction; a Streamlit app serves as the user interface.
The individual services are packaged as docker containers and set up with docker compose.
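A minimal sketch of the training step, assuming Prefect tasks/flows and mlflow's sklearn integration; the helper names (train_model, training_flow) and the toy dataset are illustrative, not the project's actual code:

```python
import mlflow
import mlflow.sklearn
from prefect import flow, task
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


@task
def train_model(X, y):
    # Train and evaluate a simple sklearn classifier
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


@flow
def training_flow():
    # Stand-in features/labels; the real flow would load the Threat dataset
    X, y = make_classification(n_samples=200, random_state=0)
    model, accuracy = train_model(X, y)
    with mlflow.start_run():
        mlflow.log_metric("accuracy", accuracy)    # metrics -> mlflow
        mlflow.sklearn.log_model(model, "model")   # artifact (the model) -> mlflow


if __name__ == "__main__":
    training_flow()
```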
Prerequisite: Install Docker
Start docker compose (from project folder)
docker-compose --profile all build
or
make build
Access individual services:
- Prefect: http://localhost:4200
- Agent: http://localhost:4200
- Minio: http://localhost:9000
- Cli: bash
- Mlflow: http://localhost:5000
- FastAPI (to test): http://localhost:8086/docs
- Streamlit UI: http://localhost:8501
Create example model
Run the deployment in the Prefect UI, deploy the model artifacts in mlflow, and tag the model with "production" in mlflow.
Note: The UI will only work if there is one "production" model in mlflow.
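A hedged sketch of promoting a registered model programmatically instead of through the UI; the model name is hypothetical, and depending on how the serving code looks the model up, "production" may be a registry stage (as here) or a tag:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()  # uses MLFLOW_TRACKING_URI from the environment

# Move version 1 of the registered model into the "Production" stage,
# which `models:/<name>/Production` URIs resolve to
client.transition_model_version_stage(
    name="toxic-audio-model",  # hypothetical registered model name
    version=1,
    stage="Production",
)
```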
docker-compose.yaml contains the definitions for all services. For every service it specifies the docker image (either through build if based on a Dockerfile, or through image if using a remote image). It also opens the relevant ports within your "docker compose network", so that the services can communicate with each other.
Additionally, a common volume for all containers that use mlflow is created and mounted into /mlruns. For Prometheus/Grafana a few configuration files are also mounted.
To initialize all services, the command docker compose up can be used from the project folder.
The training script and Prefect (for orchestration) are packaged into one service. The training script is placed under flows.
The prefect service is defined in docker-compose.yaml and is based on the Dockerfile in the prefect folder. It contains the Prefect server and agent. The agent is used to execute the flows; the server is used to monitor the flows and artifacts.
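A hedged sketch of registering a flow so the agent can pick it up, assuming Prefect 2's agent-based deployment API; the module and flow names are illustrative, and the project's real entry points live in the flows folder:

```python
from prefect import flow
from prefect.deployments import Deployment


@flow
def train():
    # Placeholder for the project's actual training flow under flows/
    ...


if __name__ == "__main__":
    # Register a deployment with the Prefect server; the agent then
    # executes scheduled or manually triggered runs of it
    deployment = Deployment.build_from_flow(flow=train, name="train-toxic-audio")
    deployment.apply()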
FastAPI is a framework for high-performance APIs. In this project I implemented a /predict endpoint. When that endpoint is queried, the service pulls the production model from mlflow and returns its prediction.
Please note: Currently the script will fetch the first model that is in production. It won't show any error if there is no model or there are multiple models.
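A minimal sketch of such an endpoint, assuming the model is loaded from the mlflow registry at startup; the registered model name and the input schema are illustrative, not the project's actual API:

```python
import mlflow.sklearn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model staged as "Production" in the mlflow registry at startup
model = mlflow.sklearn.load_model("models:/toxic-audio-model/Production")


class PredictRequest(BaseModel):
    features: list[float]  # illustrative input schema


@app.post("/predict")
def predict(request: PredictRequest):
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```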
Streamlit is a Python library to rapidly build UIs. The app is very simple and only passes input to the API to retrieve results.
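A sketch of what such a UI might look like; the payload shape is illustrative (the real app may accept audio rather than comma-separated numbers), while the port matches the FastAPI service above:

```python
import requests
import streamlit as st

st.title("Toxic Audio Detection")

text = st.text_input("Comma-separated feature values")
if st.button("Predict"):
    # Forward the input to the FastAPI service and display its response
    resp = requests.post(
        "http://localhost:8086/predict",
        json={"features": [float(x) for x in text.split(",")]},
    )
    st.write(resp.json())
```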
Multiple host machines: Kubernetes
This project is meant to be deployed on a single host machine. In practice, you might want to use Kubernetes to deploy it on multiple instances to gain more isolation and scalability. Kompose could be an option to convert your docker compose file to Kubernetes yaml.
Storage on cloud
All artifacts, logs, etc. are saved locally/on docker volumes. In practice, you would save them to the cloud.
Advanced Security
Security, of course: authentication, SSL encryption, API authentication, and so on. A good example puts nginx in front of the services as a reverse proxy.