This project combines audio transcription capabilities with AI-powered interaction, allowing users to transcribe audio and engage in conversations with an AI model.
## Features

- Audio transcription using Whisper
- Integration with Ollama for AI-powered conversations
- Real-time visual analysis using computer vision and AI
- Describe objects, scenes, and activities captured by webcam or screenshot
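As a rough illustration of how the first two pieces fit together, here is a minimal sketch of the transcription-to-conversation loop, assuming the `faster-whisper` package and Ollama's local REST API. The audio file name and model names are placeholders, not values taken from `bobo.py`.

```python
# Minimal sketch: transcribe an audio clip with faster-whisper, then send the
# text to a local Ollama server. File and model names are illustrative only.
import requests
from faster_whisper import WhisperModel

model = WhisperModel("base", device="cpu", compute_type="int8")

# faster-whisper yields transcription segments lazily; join their text.
segments, _info = model.transcribe("recording.wav")
transcript = " ".join(segment.text.strip() for segment in segments)

# Forward the transcript to Ollama's generate endpoint (default port 11434).
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": transcript, "stream": False},
    timeout=120,
)
print(response.json()["response"])
```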
## Requirements

- Python 3.8+
- See `requirements.txt` for a full list of dependencies
## Installation

- Clone the repository
- Create and activate a Python virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Install Ollama following the instructions at ollama.ai (a quick check that the server is running is sketched below)
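Before launching the app, you may want to confirm that the Ollama server is reachable. This is a hedged sketch using Ollama's standard model-listing endpoint on its default port; it is not part of the project itself.

```python
# Quick sanity check: list the models served by a local Ollama instance.
# Assumes the default Ollama port (11434); adjust if you changed it.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running. Available models:", models or "none pulled yet")
except requests.ConnectionError:
    print("Ollama does not appear to be running. Start it with `ollama serve`.")
```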
## Usage

- Ensure your virtual environment is activated
- Run the main script:

  ```bash
  python bobo.py
  ```
- Ask Bobo to describe what it sees through the webcam or to take a look at your screen!
- Customize the AI model prompts as needed (one illustrative approach is sketched below)
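How the prompts are structured depends on `bobo.py` itself, but as a general illustration of the pattern, here is a hedged sketch of steering the assistant's persona with a custom system prompt via Ollama's chat endpoint. The prompt text and model name are placeholders, not the project's actual values.

```python
# Illustrative only: one way to customize the assistant's behavior with a
# system prompt through Ollama's chat endpoint. The model name and prompt
# text below are placeholders, not values taken from bobo.py.
import requests

SYSTEM_PROMPT = "You are Bobo, a concise assistant that describes what it sees."

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "What do you see right now?"},
        ],
        "stream": False,
    },
    timeout=120,
)
print(response.json()["message"]["content"])
```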
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
## License

This project is licensed under the MIT License.
## Acknowledgments

- OpenAI Whisper
- Faster Whisper
- Ollama
- All other open-source libraries used in this project