Whisper Transcriber is a command-line application designed to transcribe audio files using OpenAI's Whisper API. It includes features for language selection, logging, and cleanup of temporary files, with built-in checks for file validity and size before processing.
- Transcribe audio files using OpenAI's Whisper API.
- Language selection for transcription (fr for French, en for English).
- Logging for debugging and tracking transcription activities.
- Automatic cleanup of temporary files such as .pyc, __pycache__, etc.
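The validity and size checks mentioned above could look roughly like the sketch below. It is not taken from the project's source: the function name `validate_audio_file` is hypothetical, and the 25 MB cap and extension list are assumptions based on the limits OpenAI documents for its transcription endpoint, which may differ from what the tool actually enforces.

```python
from pathlib import Path

# Assumed limits (from OpenAI's published upload constraints, not from this project).
MAX_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def validate_audio_file(path: str) -> Path:
    """Return the path as a Path if it looks transcribable, else raise ValueError."""
    audio = Path(path)
    if not audio.is_file():
        raise ValueError(f"not a file: {audio}")
    if audio.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported extension: {audio.suffix!r}")
    size = audio.stat().st_size
    if size == 0:
        raise ValueError(f"file is empty: {audio}")
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; limit is {MAX_BYTES}")
    return audio
```

Running such a check before any upload lets the tool fail fast with a clear message instead of waiting for the API to reject the request.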
- Clone the repository:
    git clone https://github.com/franckferman/whisper-transcriber.git
    cd whisper-transcriber
- Install dependencies with Poetry:
    poetry install
- Alternatively, you can use pip:
    pip install -r requirements.txt
To transcribe an audio file, use the following command:
poetry run whisper-transcriber transcribe -f <path_to_audio_file> -k <API_KEY> -l <language_code>
For example:
poetry run whisper-transcriber transcribe -f "audio/sample.mp3" -k "your_openai_api_key" -l "en"
- -f, --file: Path to the audio file to transcribe.
- -k, --key: OpenAI API key for authentication.
- -l, --lang: Language code for transcription (fr or en).
- -o, --output: File path to save the transcription output as JSON.
- --debug: Enable debug logging, which creates a log file, transcription.log.
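To make the options above concrete, here is a minimal sketch of how such a command line could be wired up with `argparse` and handed to the OpenAI Python client. It is an illustration, not the project's actual code: `build_parser`, `transcribe`, and `save_json` are hypothetical names, which flags are required is an assumption, and so is the `{"text": ...}` shape of the JSON output. The `client` argument is expected to behave like `openai.OpenAI(api_key=...)`.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (required-ness is assumed)."""
    parser = argparse.ArgumentParser(prog="whisper-transcriber")
    sub = parser.add_subparsers(dest="command", required=True)

    t = sub.add_parser("transcribe", help="transcribe an audio file")
    t.add_argument("-f", "--file", required=True, help="path to the audio file")
    t.add_argument("-k", "--key", required=True, help="OpenAI API key")
    t.add_argument("-l", "--lang", choices=("fr", "en"), help="language code")
    t.add_argument("-o", "--output", help="path for the JSON output")
    t.add_argument("--debug", action="store_true", help="enable debug logging")

    c = sub.add_parser("clean", help="remove temporary files and logs")
    c.add_argument("--log", action="store_true", help="log the cleanup process")
    return parser


def transcribe(client, path: str, lang: str) -> str:
    """Send the file to the transcription endpoint and return the text."""
    with open(path, "rb") as fh:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=fh, language=lang
        )
    return result.text


def save_json(text: str, output: str) -> None:
    """Write the transcription to a JSON file (output shape is an assumption)."""
    with open(output, "w", encoding="utf-8") as fh:
        json.dump({"text": text}, fh, ensure_ascii=False, indent=2)
```

Passing the client in as a parameter keeps the sketch testable without network access; a real entry point would create it from the `--key` value.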
To remove temporary files and logs from the project directory:
poetry run whisper-transcriber clean --log
- --log: Enable logging for the cleanup process.
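As a rough idea of what the cleanup step does, the sketch below removes `__pycache__` directories, stray `.pyc` files, and the `transcription.log` file under a given root. The function name `clean` and the exact set of targets are assumptions; the actual command may remove more or less than this.

```python
import shutil
from pathlib import Path


def clean(root: str = ".") -> list[str]:
    """Delete Python cache artifacts and the log file; return what was removed."""
    base = Path(root)
    removed = []

    # Remove cache directories first so the .pyc files inside go with them.
    for cache_dir in list(base.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(str(cache_dir))

    # Remove any .pyc files that live outside a __pycache__ directory.
    for pyc in list(base.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
            removed.append(str(pyc))

    # Remove the debug log, if one was created.
    log_file = base / "transcription.log"
    if log_file.is_file():
        log_file.unlink()
        removed.append(str(log_file))

    return removed
```

Returning the list of removed paths makes it easy to log each deletion when `--log` is set.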
To test the transcription and cleanup functions, ensure all necessary dependencies are installed:
poetry install --with dev
Then, run tests using your preferred test runner.
- The project uses black, flake8, and mypy for formatting, linting, and type-checking.
- Format code with:
    black .
- Lint code with:
    flake8 .
- Type-check code with:
    mypy .
This project is licensed under the GNU AGPLv3. See the LICENSE file for details.