This Python program performs real-time object detection using a webcam feed, leveraging the YOLOv8 model. It provides audio feedback for detected objects and offers directional guidance to avoid obstacles.
- Real-time object detection using YOLOv8
- Audio feedback for detected objects
- Directional guidance to avoid obstacles
- CUDA support for GPU acceleration
- Python 3.6+
- CUDA-capable GPU (optional, for improved performance)
- OpenCV (cv2)
- Ultralytics YOLO
- PyTorch
- pyttsx3
- Clone this repository or download the script.
- Install the required dependencies:
  `pip install opencv-python ultralytics torch pyttsx3`
- Download the YOLOv8 model weights:
  - The script uses `yolov8n.pt` by default. You can download it from the Ultralytics YOLO repository.
  - Place the model file in the same directory as the script.
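If the weights file is missing, the Ultralytics package will also download it automatically the first time the model is loaded. A minimal check (assuming the default `yolov8n.pt`) looks like this:

```python
from ultralytics import YOLO

# Loading by file name downloads yolov8n.pt automatically
# if it is not already present in the working directory.
model = YOLO("yolov8n.pt")
print(model.names)  # class names the model can detect (person, car, ...)
```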
To enable CUDA for GPU acceleration:
- Ensure you have a CUDA-capable GPU.
- Install the CUDA Toolkit from the NVIDIA website.
- Install the cuDNN library from the NVIDIA Developer website.
- Install the CUDA-enabled version of PyTorch:
  `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
  (Replace `cu118` with your CUDA version if different.)
The script will automatically use CUDA if available.
For more details on CUDA installation, see my CUDA installation guide.
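Before running the script, you can confirm that your PyTorch build actually sees the GPU with a quick check like this (if it reports no CUDA, the script simply falls back to CPU):

```python
import torch

# Verify that the CUDA-enabled PyTorch build can reach the GPU.
if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; detection will run on the CPU.")
```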
Run the script using Python:
python object-detection.py
- The program will access your default webcam and start detecting objects in real-time.
- Detected objects will be announced via audio feedback.
- Directional guidance will be provided to avoid obstacles.
- Press 'q' to quit the program.
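If the detection window never appears, a quick probe like the one below can confirm that OpenCV can open the camera (index `0` is the default webcam; other indices select additional cameras):

```python
import cv2

# Probe the default webcam; try indices 1, 2, ... for other cameras.
cap = cv2.VideoCapture(0)
print("Webcam opened:", cap.isOpened())
cap.release()
```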
- Adjust the `confidence_threshold` variable to change the detection sensitivity.
- Modify the `speech_interval` to change how often audio feedback is provided.
- Change the YOLOv8 model by replacing `yolov8n.pt` with other variants like `yolov8s.pt` or `yolov8m.pt` for different performance/accuracy trade-offs.
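For reference, these tunables would typically sit near the top of the script. The names `confidence_threshold` and `speech_interval` come from the script itself; the default values and the `model_path` name below are only illustrative:

```python
# Illustrative defaults; adjust to taste.
confidence_threshold = 0.5   # minimum confidence for a detection to be announced
speech_interval = 3.0        # seconds between audio announcements
model_path = "yolov8n.pt"    # swap for yolov8s.pt / yolov8m.pt for higher accuracy
```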
- The script initializes the YOLO model and the text-to-speech engine.
- It captures frames from the webcam in real-time.
- Each frame is processed by the YOLO model for object detection.
- Detected objects are announced via audio, with a cooldown period between announcements.
- The program analyzes the position of detected objects and provides directional guidance to avoid obstacles.
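A condensed sketch of that loop is shown below. It is not the full script: the guidance logic here only compares each box's horizontal centre to the middle of the frame, and the spoken phrases are placeholders.

```python
import time
import cv2
import pyttsx3
import torch
from ultralytics import YOLO

# Initialise the model and the text-to-speech engine (sketch of the script's setup).
model = YOLO("yolov8n.pt")
engine = pyttsx3.init()
device = "cuda" if torch.cuda.is_available() else "cpu"

confidence_threshold = 0.5
speech_interval = 3.0          # cooldown between announcements, in seconds
last_spoken = 0.0

cap = cv2.VideoCapture(0)      # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Run YOLOv8 on the current frame.
    results = model(frame, conf=confidence_threshold, device=device, verbose=False)[0]

    now = time.time()
    if now - last_spoken >= speech_interval:
        for box in results.boxes:
            name = model.names[int(box.cls)]
            x1, _, x2, _ = box.xyxy[0].tolist()
            centre = (x1 + x2) / 2
            # Very basic guidance: steer away from the side the object is on.
            advice = "move right" if centre < frame.shape[1] / 2 else "move left"
            engine.say(f"{name} ahead, {advice}")
        engine.runAndWait()
        last_spoken = now

    cv2.imshow("Object Detection", results.plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```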
- Audio feedback may overlap if many objects are detected in quick succession.
- The accuracy of object detection depends on the chosen YOLO model and the `confidence_threshold`.
- Directional guidance is basic and may not account for complex environments.
Feel free to fork this project and submit pull requests with improvements or bug fixes.