Whisper Demo

The purpose of this project is to evaluate OpenAI's Whisper library for transcribing audio into text for use in Automatic Speech Recognition (ASR) applications.

Reference: https://github.com/openai/whisper

This project is intended to be built as a Docker image and run via Docker Compose. The Dockerfile and docker-compose.yml are both in the project root directory.

Because Whisper models can be rather large, they are persisted in a Docker volume so that they do not need to be re-downloaded each time the container starts.

IMPORTANT: Run Whisper on a GPU! Although Whisper can run on a CPU, it runs excessively slowly there regardless of the CPU's speed or core count. Make sure a GPU is installed and available to Docker before running Whisper (see the Prerequisites section for details). An Nvidia GPU with CUDA support is strongly preferred because it offers the best performance for Whisper, as explained here.

Prerequisites

An Nvidia GPU with sufficient VRAM for the chosen model is available (~5GB for the medium model). Refer to the OpenAI Whisper documentation for specific requirements.
Nvidia drivers are installed on the host machine (not in WSL).
The Nvidia CUDA toolkit is installed and available. Refer to this Gist for more information.
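
Before building, it can help to confirm that the driver works on the host and that Docker can see the GPU. The sketch below is not part of this project. It assumes the NVIDIA Container Toolkit is installed, because the `--gpus` flag of `docker run` depends on it, and the `nvidia/cuda` image tag is an assumption that you should swap for one your driver supports.

```shell
#!/bin/sh
# Pre-flight GPU check (a sketch, not shipped with this project).
# The CUDA image tag is an assumption; pick one your installed driver supports.
CUDA_IMAGE=${CUDA_IMAGE:-nvidia/cuda:12.2.0-base-ubuntu22.04}

# check_host_gpu: list each GPU's name and total/used VRAM via the host driver.
check_host_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: Nvidia driver missing or not on PATH" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,memory.total,memory.used --format=csv
}

# check_docker_gpu: run nvidia-smi inside a throwaway CUDA container.
# --gpus requires the NVIDIA Container Toolkit on the host.
check_docker_gpu() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found" >&2
    return 1
  fi
  docker run --rm --gpus all "$CUDA_IMAGE" nvidia-smi
}
```

Run `check_host_gpu && check_docker_gpu`. If both succeed, compare the free VRAM (total minus used) against the requirement of the model you plan to use.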

Build

Build the Docker image:
```
docker build -t kingand/whisper-demo .
```

Run

Create a .env file next to docker-compose.yml that specifies the input and output directories:
```
INPUT_DIR=/path/to/input/audio
OUTPUT_DIR=/path/to/output/text
```
Ensure that both the input and output directories exist on the host machine, with permissions that allow Docker to access them.
Start the container(s) with Docker Compose:
```
docker compose up
```
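
The two setup steps above can also be scripted. This is a minimal sketch and not a script shipped with the project. `setup_dirs` is a hypothetical helper that creates both host directories and writes the `.env` file. It does not change ownership or permissions, because what the container needs depends on the user it runs as.

```shell
#!/bin/sh
# setup_dirs INPUT_DIR OUTPUT_DIR [ENV_FILE]
# Hypothetical helper: creates the host directories and writes the .env file
# that Docker Compose reads for variable substitution in docker-compose.yml.
setup_dirs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: setup_dirs INPUT_DIR OUTPUT_DIR [ENV_FILE]" >&2
    return 2
  fi
  env_file=${3:-.env}   # defaults to .env in the current directory

  # Create both directories; -p makes this a no-op if they already exist.
  mkdir -p "$1" "$2" || return 1

  # Overwrite any existing .env with the two variables.
  printf 'INPUT_DIR=%s\nOUTPUT_DIR=%s\n' "$1" "$2" > "$env_file"
}
```

For example, from the directory containing docker-compose.yml, run `setup_dirs "$HOME/whisper/input" "$HOME/whisper/output"` (placeholder paths). Then `docker compose config` prints the resolved configuration, which lets you confirm that the paths were substituted as expected.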

Transcription begins automatically when the container starts, using the Whisper model and verbosity settings specified in the command section of docker-compose.yml.

To change which Whisper model is used for transcription, modify the command section of docker-compose.yml before running docker compose up.

Docker Compose will automatically shut down the containers when the transcription is complete.

Benchmark

This project also includes a benchmark script that repeats the transcription with each Whisper model.

To run the benchmark, modify the command section of docker-compose.yml to specify benchmark.sh instead of transcribe.sh before running docker compose up.

WARNING: Larger models require a significant amount of VRAM. If sufficient VRAM is not available, the container will hang and may also make Docker unstable. Consult OpenAI's documentation and ensure sufficient VRAM is available before running the benchmark. If needed, exclude larger models from the benchmark by commenting them out in benchmark.sh. The image must then be rebuilt to pick up the changes to benchmark.sh.
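
The sketch below illustrates a per-model benchmark loop in which larger models are excluded by commenting them out. It is not the project's benchmark.sh. The `models`, `run_cmd`, and `benchmark` functions and the `DRY_RUN` switch are hypothetical, and the sketch assumes the `whisper` command-line tool from the openai-whisper package is on PATH. The approximate VRAM figures in the comments come from the OpenAI Whisper README.

```shell
#!/bin/sh
# Hypothetical per-model benchmark loop (not the project's benchmark.sh).

# models: one model name per line. Comment out any model whose VRAM
# requirement exceeds what the GPU has free.
models() {
  echo tiny     # ~1 GB VRAM
  echo base     # ~1 GB VRAM
  echo small    # ~2 GB VRAM
  echo medium   # ~5 GB VRAM
  # echo large  # ~10 GB VRAM, excluded here
}

# run_cmd: print the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# benchmark INPUT_DIR OUTPUT_DIR: transcribe every file in INPUT_DIR once per
# model, writing results to OUTPUT_DIR/<model> and reporting elapsed seconds.
benchmark() {
  for model in $(models); do
    start=$(date +%s)
    for audio in "$1"/*; do
      [ -f "$audio" ] || continue   # skip directories and an unmatched glob
      run_cmd whisper "$audio" --model "$model" --output_dir "$2/$model"
    done
    echo "model=$model seconds=$(( $(date +%s) - start ))"
  done
}
```

For example, `DRY_RUN=1 benchmark /data/input /data/output` (hypothetical paths) prints the Whisper commands without running them. Drop `DRY_RUN=1` to transcribe for real.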