Speech Translation & Synthesis of Speech
This repository can be used to translate any video to English. It uses OpenAI's Whisper to transcribe and translate the video, then Coqui AI's TTS to synthesize speech from the translated transcription. Finally, it combines the subtitles and the translated audio with the video.
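For orientation, the core of this pipeline boils down to roughly the following steps (a minimal sketch, not the repository's actual code; the Whisper model size, the Coqui model name, and all file paths are assumptions):

```python
import subprocess
import whisper           # openai-whisper
from TTS.api import TTS  # Coqui TTS

VIDEO = "data/video-original/lecture.mp4"   # hypothetical input path

# 1. Transcribe the speech and translate it to English with Whisper.
model = whisper.load_model("medium")         # model size is an assumption
result = model.transcribe(VIDEO, task="translate")
english_text = result["text"]

# 2. Synthesize the translated text with Coqui TTS.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")  # example model
tts.tts_to_file(text=english_text, file_path="translated.wav")

# 3. Replace the original audio track with the synthesized one via FFmpeg.
subprocess.run([
    "ffmpeg", "-y",
    "-i", VIDEO,              # original video (video stream is kept)
    "-i", "translated.wav",   # synthesized English audio
    "-map", "0:v", "-map", "1:a",
    "-c:v", "copy", "-c:a", "aac",
    "lecture-translated.mp4",
], check=True)
```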
It was originally built to translate the lectures and exercises of Introduction to AI in the winter term of 2022/23 at Technische Universität Darmstadt.
The course is offered by the Artificial Intelligence and Machine Learning Lab.
If you have any suggestions, please contact me at david.pirkl@stud.tu-darmstadt.de.
You can use the Dockerfile for a simple build; it takes care of all requirements and dependencies. You might want to change the base image, depending on the machine you are using. You have to mount a shared folder at /project/data to access the generated files. A simple command to run the image would be:
docker run --rm -v local_dir:/project/data --name lecture_sts name_of_your_docker_image:latest
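If you have not built the image yet, a standard build command (run from the repository root; the tag is whatever you want to call the image) would be:
docker build -t name_of_your_docker_image:latest .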
If you do not want to use Docker, you can also perform the steps manually. You need to install:
- Python3
- Pip
- Git
- FFmpeg
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
Clone the repository:
git clone https://github.com/dpirkl/lecture-sts.git
Afterwards, move to the lecture-sts folder and install the Python requirements via:
pip install -r requirements.txt
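If you want to keep the dependencies isolated, you can create and activate a virtual environment before running the pip command above (standard Python tooling; the repository does not require it):
python3 -m venv .venv
source .venv/bin/activate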
This translation works best for videos with a lot of speech and few pauses. The synthesized audio uses a single voice, so videos with multiple speakers can be confusing. The quality of the transcription and translation depends on Whisper's performance for the specific language; check out their repository for more information.
Make sure you have the correct folder structure. To create it and download the models, you can use this command:
python3 setup.py
Move your videos to the video-original folder.
To translate the audio of the video to English and add subtitles, you can run:
python3 translate_lecture.py
Or, if you want to keep the original audio and just add English subtitles, run:
python3 subtitles_en.py
To add subtitles in both English and the original language, run:
python3 subtitles_en_original.py
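For illustration, generating an English subtitle file from Whisper's output essentially means writing its segments in SRT format (a sketch of the general technique, not the repository's exact code; file names and the model size are assumptions):

```python
import whisper

def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:23,456."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("medium")                  # model size is an assumption
result = model.transcribe("lecture.mp4", task="translate")

# Each Whisper segment carries start/end times and the recognized text.
with open("lecture.en.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result["segments"], start=1):
        srt.write(f"{i}\n")
        srt.write(f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
        srt.write(f"{seg['text'].strip()}\n\n")
```

The resulting .srt file can then be muxed into the video as a soft subtitle track, e.g. ffmpeg -i lecture.mp4 -i lecture.en.srt -c copy -c:s mov_text lecture-subtitled.mp4.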
To see more details of the process, you can add the -v/--verbose flag to the command:
python3 subtitles_en.py -v
To disable RTPT (which shows the remaining processing time in the process title):
python3 subtitles_en_original.py --disable_rtpt
To ignore stored Whisper results and transcribe again:
python3 translate_lecture.py --no_cache
You can control the use of CUDA via the --disable_cuda flag:
python3 translate_lecture.py --disable_cuda
You can specify the maximum duration of a text segment in seconds for TTS, or disable the maximum duration entirely:
python3 translate_lecture.py --max_segment_duration 30
python3 translate_lecture.py --disable_max_duration
The repository uses the following folder structure:

|- data/
|  |- audio/
|  |- audio-translated/
|  |- audio-translated-speed/
|  |- subtitles/ (subtitle files are saved here)
|  |- variables/ (to avoid reprocessing)
|  |- video-original/ (where the original videos go)
|  |- video-original-subtitles/ (original videos with subtitles)
|  |- video-translated/ (translated videos without subtitles)
|  |- video-translated-subtitles/ (translated videos with subtitles)
|  |- video-without-audio/
|- src/
|  |- silence.py
|  |- tts_wrapper.py
|  |- whisper_wrapper.py
|  |- utils/
|  |  |- file_handler.py
|  |  |- path_handler.py
|- setup.py
|- subtitles_en.py
|- subtitles_en_original.py
|- translate_lecture.py
...