Speech Slicing

A repository for extracting sentence-by-sentence speech files from audio or video files. It uses Whisper V3 and GPT-4o-mini to perform the extraction.

Usage

Environment

Install a torch 2.0+ build compatible with your environment, e.g.:

conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia

or

pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

Install ffmpeg. If you are using conda, run:

conda install conda-forge::ffmpeg

Install the other dependencies:

pip install -r requirements.txt
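
To confirm the steps above worked before running anything heavy, a small check like the following may help. It is only a sketch: the function name check_environment is hypothetical and not part of this repository, and it only tests whether torch, torchaudio, and ffmpeg can be found, not whether their versions match.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether the tools named in the Environment section can be found."""
    return {
        # find_spec returns None when the package cannot be imported
        "torch": importlib.util.find_spec("torch") is not None,
        "torchaudio": importlib.util.find_spec("torchaudio") is not None,
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING points back to the corresponding install command above.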

OpenAI API Key

Put your own OpenAI API key in api_key.txt.
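
How the scripts read this file is not shown here, so the following is an assumption rather than the repository's code: a minimal loader that reads the key and strips the trailing newline most editors add. The name load_api_key is hypothetical.

```python
from pathlib import Path


def load_api_key(path="api_key.txt"):
    """Return the key stored in `path`, with surrounding whitespace removed."""
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        # An empty file is the most likely mistake, so fail with a clear message
        raise ValueError(f"{path} is empty; paste your OpenAI API key into it")
    return key
```

Keeping the key in a file rather than in the source means it should stay out of version control; avoid committing a filled-in api_key.txt.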

Inference

python slice_speech.py --input_path ./YOUR_INPUT_PATH --extension YOUR_FILES_EXTENSION

For further details, run python slice_speech.py --help to see the help messages.
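
If you have several input directories, the command above can be driven from Python. This is a sketch, not part of the repository: build_command and run_all are hypothetical helpers, and they use only the two flags shown above. The directory names and the extension value are placeholders; the exact form the script expects for --extension is not stated here.

```python
import subprocess
import sys


def build_command(input_path, extension, script="slice_speech.py"):
    """Build the argument list for one slice_speech.py run."""
    return [sys.executable, script,
            "--input_path", str(input_path),
            "--extension", str(extension)]


def run_all(input_paths, extension):
    """Run the script once per directory and stop at the first failure."""
    for path in input_paths:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(build_command(path, extension), check=True)


if __name__ == "__main__":
    # Placeholder directories and extension
    run_all(["./YOUR_INPUT_PATH_1", "./YOUR_INPUT_PATH_2"], "YOUR_FILES_EXTENSION")
```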

Implementations

Audio Slicing with Sliding
Recursive Whisper
LLM Merging
Postprocessing with length and VAD
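
The first and last steps in the list above can be illustrated with a small, dependency-free sketch. It reads "Audio Slicing with Sliding" as an overlapping sliding-window split and shows only the length part of the postprocessing; both readings are assumptions. The function names, window sizes, and length bounds are illustrative and are not taken from SpeechSlicer.py.

```python
def sliding_windows(total_sec, window_sec=30.0, hop_sec=25.0):
    """Return (start, end) times that cover [0, total_sec] with overlapping windows."""
    if window_sec <= 0 or hop_sec <= 0 or hop_sec > window_sec:
        raise ValueError("need 0 < hop_sec <= window_sec")
    windows = []
    start = 0.0
    while start < total_sec:
        end = min(start + window_sec, total_sec)
        windows.append((start, end))
        if end >= total_sec:
            break  # the last window already reaches the end of the audio
        start += hop_sec
    return windows


def filter_by_length(segments, min_sec=1.0, max_sec=20.0):
    """Keep only (start, end) segments whose duration lies within the bounds."""
    return [(s, e) for s, e in segments if min_sec <= e - s <= max_sec]


if __name__ == "__main__":
    print(sliding_windows(70.0))
    print(filter_by_length([(0.0, 0.5), (1.0, 4.0), (5.0, 40.0)]))
```

Because the hop is shorter than the window, consecutive windows overlap, so a sentence cut at one window boundary can still appear whole in the next window.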

neosapience/SpeechSlicer