
Spotty: Voice Interface for Boston Dynamics Spot

A natural language interface for the Spot robot with vision, navigation, and contextual awareness.

Spotty Logo

Video Demo

Spotty Demo Video

More videos are available on Google Drive:

  • Visual Question Answering
  • Navigating to kitchen
  • Navigating to KUKA robot
  • Navigating to trash can

Features

Voice Control

  • Wake word activation ("Hey Spot")
  • Speech-to-text via OpenAI Whisper
  • Text-to-speech responses
  • Conversation memory
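
A rough sketch of how such a voice loop can be wired up with pvporcupine (wake word), the OpenAI Whisper API (speech-to-text), and OpenAI TTS; the file names, wake-word model path, and recording logic below are placeholders rather than the project's actual code:

# Sketch of the voice loop: wake word -> record -> Whisper STT -> TTS reply.
# Paths and parameters are illustrative only.
import os
import struct
import wave

import pvporcupine
import pyaudio
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY
porcupine = pvporcupine.create(
    access_key=os.environ["PICOVOICE_ACCESS_KEY"],
    keyword_paths=["assets/hey_spot.ppn"],  # placeholder wake-word model
)
pa = pyaudio.PyAudio()
mic = pa.open(rate=porcupine.sample_rate, channels=1, format=pyaudio.paInt16,
              input=True, frames_per_buffer=porcupine.frame_length)

def record_utterance(path, seconds=4):
    # Grab a few seconds of audio after the wake word fires.
    n_frames = int(seconds * porcupine.sample_rate / porcupine.frame_length)
    frames = [mic.read(porcupine.frame_length) for _ in range(n_frames)]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(porcupine.sample_rate)
        wav.writeframes(b"".join(frames))

while True:
    pcm = struct.unpack_from("h" * porcupine.frame_length, mic.read(porcupine.frame_length))
    if porcupine.process(pcm) >= 0:  # "Hey Spot" detected
        record_utterance("utterance.wav")
        text = client.audio.transcriptions.create(
            model="whisper-1", file=open("utterance.wav", "rb")).text
        reply = client.audio.speech.create(model="tts-1", voice="alloy",
                                           input=f"You said: {text}")
        reply.write_to_file("reply.mp3")  # play back through the robot's speaker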

Navigation

  • GraphNav integration with waypoint labeling
  • Location-based commands ("Go to kitchen")
  • Object search and navigation
  • Automatic scene understanding via GPT-4o-mini + CLIP
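
The automatic labeling step can be thought of as zero-shot classification of a waypoint snapshot against a list of room prompts with CLIP. A minimal sketch using the Hugging Face openai/clip-vit-base-patch32 checkpoint (snapshot path and prompts are illustrative; the project's own logic lives in scripts/label_waypoints.py):

# Sketch: pick the most likely room label for a waypoint image with CLIP (zero-shot).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a photo of a kitchen", "a photo of a hallway", "a photo of an office"]
image = Image.open("waypoint_snapshot.jpg")  # illustrative file name

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=1)

label = prompts[probs.argmax().item()]
print(f"Waypoint label: {label} ({probs.max().item():.2f})")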

Vision

  • Scene description and visual Q&A
  • Object detection and environment mapping
  • Multimodal RAG system for location context
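
The location-context RAG boils down to embedding per-waypoint scene descriptions and retrieving the closest match for a spoken query. A minimal sketch with OpenAI embeddings and FAISS (the descriptions and embedding model name are placeholders, not the project's database schema):

# Sketch: index waypoint descriptions and retrieve the best match for a query.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

descriptions = [
    "kitchen with a coffee machine and a sink",
    "hallway with a fire extinguisher",
    "office with a KUKA robot arm",
]

def embed(texts):
    # Unit-normalized embeddings so inner product equals cosine similarity.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data], dtype="float32")
    faiss.normalize_L2(vecs)
    return vecs

vectors = embed(descriptions)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

query = embed(["where can I get coffee?"])
scores, ids = index.search(query, k=1)
print(descriptions[ids[0][0]], scores[0][0])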

Architecture

  • UnifiedSpotInterface: Main orchestrator
  • GraphNav Interface: Map recording and navigation
  • Audio Interface: Wake word detection, speech processing
  • RAG Annotation: Location/object knowledge base
  • Vision System: Camera processing and interpretation

Uses OpenAI GPT-4o-mini, Whisper, and TTS, together with CLIP and a FAISS vector database.
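
How these pieces might fit together, shown as a purely hypothetical skeleton (class and method names below are illustrative, not the actual UnifiedSpotInterface API):

# Hypothetical skeleton of the orchestrator; names are illustrative only.
class UnifiedSpotInterface:
    def __init__(self, audio, graph_nav, vision, rag):
        self.audio = audio          # wake word + STT/TTS
        self.graph_nav = graph_nav  # GraphNav map recording and navigation
        self.vision = vision        # camera capture and scene interpretation
        self.rag = rag              # location/object knowledge base (FAISS)

    def run(self):
        while True:
            command = self.audio.listen()  # blocks until "Hey Spot" + utterance
            if "go to" in command.lower():
                waypoint = self.rag.resolve(command)  # phrase -> labeled waypoint
                self.graph_nav.navigate_to(waypoint)
            elif "see" in command.lower():
                self.audio.say(self.vision.describe_scene())
            else:
                self.audio.say("Sorry, I did not catch that.")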

Setup

Prerequisites

  • Boston Dynamics Spot robot
  • Python 3.8+
  • Boston Dynamics SDK
  • OpenAI and Picovoice API keys

Installation

git clone https://github.com/vocdex/SpottyAI.git
cd SpottyAI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-optional.txt
pip install -e .

export OPENAI_API_KEY="your_key"
export PICOVOICE_ACCESS_KEY="your_key"

Map Setup

  1. Record environment map:

    python scripts/recording_command_line.py --download-filepath /path/to/map ROBOT_IP
  2. Auto-label waypoints:

    python scripts/label_waypoints.py --map-path /path/to/map --label-method clip --prompts kitchen hallway office
  3. Create RAG database:

    python scripts/label_with_rag.py --map-path /path/to/map --vector-db-path /path/to/database --maybe-label
  4. Visualize setup:

    python scripts/visualize_map.py --map-path /path/to/map --rag-path /path/to/database

Run

python main_interface.py

Usage

Voice Commands:

  • "Hey Spot" → activate
  • "Go to the kitchen" → navigate to location
  • "Find a chair(a mug, a plant, etc.)" → search and navigate to object
  • "What do you see?" → describe surroundings
  • "Stand up" / "Sit down" → basic robot control

Configuration

  • spotty/audio/system_prompts.py - Assistant personality
  • spotty/vision/vision_assistant.py - Vision settings
  • spotty/utils/robot_utils.py - Robot connection

Custom wake words: create one at https://console.picovoice.ai/ and update KEYWORD_PATH in spotty/__init__.py
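
In spotty/__init__.py, that constant then points at the downloaded .ppn file, for example (path is illustrative):

# spotty/__init__.py (illustrative path; point it at your downloaded .ppn file)
KEYWORD_PATH = "assets/hey_spot_custom.ppn"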

Project Structure

spotty/
├── assets/             # Maps, databases, wake words
├── scripts/            # Setup utilities
├── spotty/
│   ├── annotation/     # Waypoint tools
│   ├── audio/          # Speech processing
│   ├── mapping/        # Navigation
│   ├── vision/         # Computer vision
│   └── utils/          # Shared utilities
└── main_interface.py   # Entry point

Pre-recorded assets (maps, voice activation model, RAG database): download from Google Drive

License

MIT License

Disclaimer

This project is an open-source implementation of a Boston Dynamics demo and is not affiliated with Boston Dynamics. It is intended for educational and research purposes only. Use at your own risk.

Acknowledgements

  • Big thanks to the Automatic Control Chair at FAU for providing the robot for my semester project
  • Many thanks to Boston Dynamics’ engineers for their work on the Spot SDK
  • HuggingFace, OpenAI, Facebook Research

