Gemini Engineer leverages the power of the new Gemini to create a real-time audio and text assistant.

mett29/gemini-engineer

Gemini Engineer

Description

Inspired by:

An AI assistant built upon the new Gemini 2.0 Flash to allow for real-time audio and text interaction.

Tested on WSL2 Ubuntu 20.04

Important: use headphones.

This script uses the system default audio input and output, which often do not include echo cancellation. Without headphones, the microphone can pick up the model's own voice from the speakers and cause it to interrupt itself.

Web Interface

The mode option controls how Gemini responds. In both modes, you press the mic button and start talking:

  • TEXT: Gemini answers in text.
  • AUDIO: Gemini answers by voice.
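
One way the mode option could map onto the model's output is sketched below. It assumes the selected mode is forwarded to the Gemini Live API as the `response_modalities` field of the session config. `build_live_config` and `VALID_MODES` are hypothetical names used for illustration, not taken from this repository.

```python
from typing import Any

# Hypothetical names for illustration; the repository may structure this differently.
VALID_MODES = ("TEXT", "AUDIO")


def build_live_config(mode: str) -> dict[str, Any]:
    """Return a session config requesting replies in the selected modality."""
    normalized = mode.strip().upper()
    if normalized not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Assumption: the mode is passed through as the Live API's
    # `response_modalities` setting, one modality per session.
    return {"response_modalities": [normalized]}


if __name__ == "__main__":
    print(build_live_config("text"))  # prints {'response_modalities': ['TEXT']}
```

Under the same assumption, the returned dictionary could then be supplied as the session config when connecting to the Live API. Input is spoken through the microphone in both modes, so only the response side changes.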

Installation

sudo apt install libasound2-plugins

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/mett29/gemini-engineer.git
cd gemini-engineer
uv sync
source .venv/bin/activate

# Run web interface
python main.py

You can also run the CLI directly:

python src/gemini_engineer.py
