
Voice-to-voice personal assistant, Full-local


JohnnySn0w/Echo


Emoji Oread, Echo

— Echo —

A completely local-compute AI assistant

A plethora of AI tools are currently available.

This is an effort to stitch together a suite of model-running programs into a voice-to-voice assistant, chaining voice-to-text, text-to-text, and text-to-voice.

The grand goal of this project is a voice-to-voice interface that understands the multitude of contexts you exist in.
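
As a rough illustration of the three-stage chain described above, here is a minimal Python sketch. The stage signatures and the `voice_to_voice` name are assumptions made for this example; the actual project wires whisper.cpp, llama.cpp, and piper together, and its interfaces may look quite different.

```python
from typing import Callable

# Hypothetical stage signatures, not the project's real interfaces.
SpeechToText = Callable[[bytes], str]   # e.g. backed by whisper.cpp
TextToText = Callable[[str], str]       # e.g. backed by llama.cpp
TextToSpeech = Callable[[str], bytes]   # e.g. backed by piper


def voice_to_voice(
    audio_in: bytes,
    stt: SpeechToText,
    llm: TextToText,
    tts: TextToSpeech,
) -> bytes:
    """Run one conversational turn: audio in, audio out."""
    transcript = stt(audio_in)  # voice-to-text
    reply = llm(transcript)     # text-to-text
    return tts(reply)           # text-to-voice


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any models installed.
    def fake_stt(audio: bytes) -> str:
        return audio.decode("utf-8")

    def fake_llm(text: str) -> str:
        return f"You said: {text}"

    def fake_tts(text: str) -> bytes:
        return text.encode("utf-8")

    print(voice_to_voice(b"hello", fake_stt, fake_llm, fake_tts))
```

Keeping each stage behind a plain callable means any one component could be swapped without touching the other two.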


Current capabilities

  • End to end, voice-to-voice.
  • Assistance with getting ROCm drivers and custom builds for whisper.cpp/llama.cpp that support ROCm compatible GPUs
  • All models are loaded into RAM/VRAM for quick access.

Benchmarks are located here; you are more than welcome to submit yours.

Goals

  • 🏃 Load piper into VRAM for persistence (remove model load time)
  • ⚙️ Set up piper to use AMD GPU (requires custom builds of underlying libs like onnxruntime)
  • 🗣️ More naturalistic responses in the voice output
  • 📝 Implement usage of command functionality from whisper.cpp
  • 💾 Potentially dockerize
  • 🛠️ Fine-tune parameters of various components to optimize processing times
  • 🤖 Bots? Bots.
  • 🪟 Windows implementation

Setup

Prerequisites

First, you (probably) need to be on Linux. If you're here, you might already know that ROCm is primarily supported on Red Hat, SUSE, and Debian. What you might not know is that other distros, like Arch, support it through user repos.

You're going to need Python 3.11 as the system version for the install. After that, you can change it. The recommended way to handle multiple Python versions is something like pyenv.
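
If you want to confirm which interpreter is active before kicking off the install, a small check like the following may help. This is only a sketch: the `is_required_python` helper is a hypothetical name for this example and is not part of the repository's scripts.

```python
import sys

# Version the install is expected to run under, per the prerequisites above.
REQUIRED = (3, 11)


def is_required_python(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_required_python():
        print(f"Python {found}: OK for the install")
    else:
        print(f"Python {found} found, but 3.11 is expected for the install")
```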

Build & Ship

  1. Kick off the building of the various components with
./setup.sh;

This script:

  • Makes directories that are filled with appropriate models
  • Optionally downloads default models (if you skip this, see 1b)
  • Pulls in the submodules
  • Builds whisper.cpp and llama.cpp. If your GPU isn't on the ROCm compatibility list, you will probably want to rebuild llama.cpp with CLBlast flags. Check here for a comprehensive list of GPUs ROCm supports. Use the LLVM target that you need, and modify the buildAMD.sh script to build for your GPU.

1b. Download models for the program to use if you didn't want defaults.

./defaultModels.sh
  2. Load everything up with
run.sh;

Make sure to use the 'Echo' wakeword so it knows you're talking.
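
To give a feel for what wake-word gating on a transcript can look like, here is a small sketch. It is an assumption about the general technique, not a description of how Echo actually detects the wakeword; `extract_command` is a hypothetical helper invented for this example.

```python
import re
from typing import Optional

WAKEWORD = "echo"  # the wakeword named in this README


def extract_command(transcript: str, wakeword: str = WAKEWORD) -> Optional[str]:
    """Return the text spoken after the wake word, or None if it is absent."""
    # Match the wake word as a whole word, ignoring case.
    pattern = rf"\b{re.escape(wakeword)}\b"
    match = re.search(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop surrounding spaces and punctuation from what follows the wake word.
    return transcript[match.end():].strip(" ,.!?:;")


if __name__ == "__main__":
    print(extract_command("Hey Echo, what time is it?"))
    print(extract_command("what time is it"))
```

Matching on a word boundary keeps words such as "echoes" from triggering the assistant by accident.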

That's it!

Licensing

whisper.cpp, piper, and llama.cpp are licensed under the MIT license.

The Echo mascot image was originally generated with the assistance of DALL·E 3. It was further edited by @JohnnySn0w.

Bugs

  • Currently, if the microphone and the output are hooked to the same interface (like a Scarlett DAC), there's a cutoff/delay at the beginning of the AI speech output. I'm not sure what's happening there, since Pulse should handle that sort of thing and Discord works fine.
