A plethora of AI tools are currently available.
This project stitches together a suite of model-running programs into a voice-to-voice assistant: voice-to-text (whisper.cpp), text-to-text (llama.cpp), and text-to-voice (piper).
Features:
- End to end, voice-to-voice.
- Help with installing ROCm drivers and producing custom whisper.cpp/llama.cpp builds that support ROCm-compatible GPUs
- All models are loaded into RAM/VRAM for quick access.
Benchmarks are located here; you are more than welcome to submit yours.
Planned work:
- 🏃 Load piper into VRAM for persistence (removing model load time)
- ⚙️ Set up piper to use the AMD GPU (requires custom builds of underlying libraries like onnxruntime)
- 🗣️ More natural-sounding responses in the voice output
- 📝 Make use of whisper.cpp's command functionality
- 💾 Potentially dockerize
- 🛠️ Fine-tune the parameters of the various components to reduce processing times
- 🤖 Bots? Bots.
- 🪟 Windows implementation
First, you (probably) need to be on Linux. If you're here, you might already know that ROCm is primarily supported on Red Hat, SUSE, and Debian. What you might not know is that other distros, like Arch, also support it through user repositories.
You need Python 3.11 as the system version during the install; you can change it afterward. The recommended way to handle multiple Python versions is a version manager such as pyenv.
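Before running setup.sh, it can help to confirm which version python3 actually resolves to. The sketch below assumes the install uses whatever python3 is first on PATH (an assumption, not something confirmed from setup.sh); require_python_311 is a hypothetical helper, not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install check (not part of this repo): succeed only if the
# given interpreter (default: python3 on PATH) reports version 3.11.
require_python_311() {
  local py="${1:-python3}" version
  # Ask the interpreter for "major.minor"; the % formatting also runs on Python 2.
  if ! version="$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)"; then
    echo "could not run $py" >&2
    return 1
  fi
  if [ "$version" != "3.11" ]; then
    echo "need Python 3.11, found $version ($py)" >&2
    return 1
  fi
  echo "ok: $py is Python $version"
}
```

Chaining it as `require_python_311 && ./setup.sh` only starts the build when the check passes; passing a path as the first argument checks a specific interpreter instead of the one on PATH.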
1. Kick off the build of the various components with:
./setup.sh
This script:
- Creates the directories that the models go into
- Optionally downloads default models (if you skip this, see 1b)
- Pulls in the submodules
- Builds whisper.cpp and llama.cpp. If your GPU isn't on the ROCm compatibility list, you will probably want to rebuild llama.cpp with the CLBlast flags instead. Check here for a comprehensive list of the GPUs ROCm supports. If yours is supported, use the LLVM target it needs and modify the buildAMD.sh script to build for your GPU.
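To find the LLVM target for your card (names like gfx1030), ROCm's rocminfo tool lists each agent's name. A minimal sketch, assuming rocminfo is installed; gpu_llvm_targets is a hypothetical helper, and how buildAMD.sh expects the target to be set is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the unique gfx target names from rocminfo output.
# Reads stdin, so it works on live output or on a saved copy of it.
gpu_llvm_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# On a machine with ROCm installed:
#   rocminfo | gpu_llvm_targets
```

Compare the printed target against the compatibility list before editing buildAMD.sh. Reading stdin instead of calling rocminfo directly keeps the helper usable without a GPU present.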
1b. If you didn't want the defaults, download models for the program to use:
- llama.cpp (instructions here): .gguf files go into the llms folder
- whisper.cpp (instructions here): .bin files go into the ./whisper.cpp/models folder
- piper (instructions here): .onnx and .onnx.json files go into the voices folder
Or, for some quick defaults, run:
./defaultModels.sh
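After downloading models by hand, a quick check that each file landed in the right folder can save a failed launch. This sketch assumes the llms, voices, and whisper.cpp/models folders sit under the repository root, as the list above suggests; check_models and has_match are hypothetical helpers, not something the repo ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true if at least one of the (glob-expanded) arguments exists.
# An unmatched glob stays a literal pattern, which fails the -e test.
has_match() {
  local g
  for g in "$@"; do [ -e "$g" ] && return 0; done
  return 1
}

# Hypothetical check (not part of this repo): report missing models under a
# root directory (default: current directory). Returns 1 if anything is missing.
check_models() {
  local root="${1:-.}" missing=0 f

  has_match "$root"/llms/*.gguf || { echo "no .gguf model in $root/llms" >&2; missing=1; }
  has_match "$root"/whisper.cpp/models/*.bin || { echo "no .bin model in $root/whisper.cpp/models" >&2; missing=1; }
  has_match "$root"/voices/*.onnx || { echo "no .onnx voice in $root/voices" >&2; missing=1; }

  # Every piper voice needs its .onnx.json config next to it.
  for f in "$root"/voices/*.onnx; do
    [ -e "$f" ] || continue
    [ -e "$f.json" ] || { echo "missing config: $f.json" >&2; missing=1; }
  done

  return "$missing"
}
```

Running `check_models && ./run.sh` from the repository root prints every missing piece at once rather than stopping at the first one.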
2. Load everything up with:
./run.sh
Make sure to use the wake word 'Echo' so the assistant knows you're talking to it.
That's it!
whisper.cpp, piper, and llama.cpp are licensed under the MIT license.
The Echo mascot image was originally generated with the assistance of DALL·E 3. It was further edited by @JohnnySn0w.
Known issues:
- Currently, if the microphone and the output are connected to the same audio interface (like a Scarlett DAC), the beginning of the AI speech output gets cut off or delayed. I'm not sure what's happening there, since PulseAudio should handle that sort of thing, and Discord works fine.