This project implements a reinforcement learning framework based on the Paired Open-Ended Trailblazer (POET) algorithm. The goal is to train a 2D car agent to navigate dynamically evolving obstacle courses while simultaneously evolving the environments for progressive learning.
The environment is built using Box2D and Pygame, and the agent is trained with Double Deep Q-Learning (DDQN). The POET algorithm is adapted to 2D-car-specific challenges such as ramps, bumps, and holes: it mutates the size of the chosen obstacles so that the agent is pushed to keep improving its performance on progressively harder courses.
This project was developed for the Reinforcement Learning course A.Y. 2024/2025.
- Vehicle Control:
  - Actions: 9 discrete combinations of steering and acceleration (see the sketch after this feature list).
  - Observations: LiDAR-based sensor data and the car's position/velocity.
- Obstacle types: ramps, holes, and bumps.
- Mutations: environments evolve dynamically with increasing difficulty.
- Rendered using Pygame with customizable textures.
- Co-evolution: the agent's policies and the environments' difficulty evolve together, since POET lets environments change dynamically. New obstacles are adapted to the paired agent, neither too easy nor too hard, and their difficulty increases progressively to keep improving the car's performance. The algorithm also manages where obstacles are created, so that they are not generated at coincident points.
- Novelty ranking: promotes diversity by evaluating the novelty of new environments.
- Minimal Criterion (MC): ensures that environments meet specific performance thresholds before being added, and allows these thresholds to vary dynamically within fixed ranges.
- Policy Transfer: transfers learned policies between environments to tackle harder challenges.
- Neural network-based policy optimization with:
  - Batch normalization and dropout for stable training.
  - Experience replay for sample efficiency.
  - Double Q-learning: the current (online) Q-network is used to choose the actions, while the older (target) Q-network is used to evaluate them.
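The 9-action discretization can be pictured as a 3 × 3 grid over steering and acceleration. The sketch below is only an illustration; the exact values are assumptions, not taken from the repository.

```python
import itertools

# Hypothetical discretization: the exact steering/throttle values are assumptions.
STEERING = (-1.0, 0.0, 1.0)   # left, straight, right
THROTTLE = (-1.0, 0.0, 1.0)   # brake/reverse, coast, accelerate

ACTIONS = list(itertools.product(STEERING, THROTTLE))  # 3 x 3 = 9 discrete actions

def decode_action(index):
    """Map a discrete action index in [0, 8] to a (steering, throttle) pair."""
    return ACTIONS[index]
```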
Open-ended_Curriculum_Learning_POET_2D_Car_obstacle-course_domain/
├── main.py
├── env.py
├── model.py
├── poet.py
├── utils.py
├── textures/
│ ├── background.png
│ └── wheel.png
├── requirements.txt
├── README.md
└── LICENSE
- `env.py`: defines the `CarEnvironment` class:
  - Implements the simulation logic using Box2D.
  - Handles obstacles, rendering, and reward calculation.
  - Includes car dynamics and LiDAR data collection.
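For illustration, LiDAR-style readings can be collected in Box2D (pybox2d) with a ray-cast callback, as sketched below. The number of rays, fan angle, and maximum range are assumptions; the actual `CarEnvironment` code may differ.

```python
import math
from Box2D import b2RayCastCallback, b2Vec2

class _NearestHit(b2RayCastCallback):
    """Ray-cast callback that keeps only the closest fixture hit along the ray."""
    def __init__(self):
        super().__init__()
        self.hit = False
        self.fraction = 1.0

    def ReportFixture(self, fixture, point, normal, fraction):
        self.hit = True
        self.fraction = fraction
        return fraction  # clip the ray to the closest hit found so far

def lidar_scan(world, origin, n_rays=16, max_range=20.0):
    """Cast a 180-degree fan of rays from `origin` (a b2Vec2) and return
    normalized distances in [0, 1] (1.0 = no obstacle within range)."""
    readings = []
    for i in range(n_rays):
        angle = math.pi * i / (n_rays - 1)
        target = origin + b2Vec2(math.cos(angle), math.sin(angle)) * max_range
        callback = _NearestHit()
        world.RayCast(callback, origin, target)
        readings.append(callback.fraction if callback.hit else 1.0)
    return readings
```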
- `poet.py`: implements the POET algorithm:
  - `main_loop`: the main POET training and evolution loop. At each iteration it calls the main functions and performs these activities:
    - creates new environments from the active ones;
    - improves the agents in their environments;
    - tries to move the current agents from one environment to another.
  - `mutate_envs`: mutates environments and evaluates the offspring.
  - `evaluate_candidates`: updates policies using the neural network and the DDQN algorithm, or by Evolution Strategies.
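A schematic sketch of this loop structure is shown below; the function names, the pairing of environments and agents, and the intervals are illustrative and do not reflect the repository's actual signatures.

```python
def poet_main_loop(pairs, n_iterations, mutate_envs, optimize, evaluate,
                   mutation_interval=25, transfer_interval=10):
    """Schematic POET loop over (environment, agent) pairs (illustrative only)."""
    for it in range(n_iterations):
        # 1. Periodically create new environments from the active ones
        #    (mutation + Minimal Criterion check + novelty ranking).
        if it > 0 and it % mutation_interval == 0:
            pairs = mutate_envs(pairs)

        # 2. Improve each agent in its own paired environment (e.g. DDQN updates).
        pairs = [(env, optimize(agent, env)) for env, agent in pairs]

        # 3. Periodically try to transfer agents between environments:
        #    adopt another pair's agent if it performs better in this environment.
        if it > 0 and it % transfer_interval == 0:
            candidates = [agent for _, agent in pairs]
            new_pairs = []
            for env, agent in pairs:
                best = max(candidates, key=lambda a: evaluate(a, env))
                new_pairs.append((env, best if evaluate(best, env) > evaluate(agent, env) else agent))
            pairs = new_pairs
    return pairs
```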
- `model.py`: defines the models and the learning agent:
  - The Policy Network: a neural network for Q-value estimation.
  - The Double DQN (DDQN) agent:
    - Defines the online and target Q-networks.
    - Defines the training logic, experience replay buffer, and evaluation methods.
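The core Double DQN idea (the current network selects the greedy next action, the older target network evaluates it) can be sketched as follows; names and tensor shapes are assumptions, not the repository's exact code.

```python
import torch

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN bootstrap target: action selection by the online (current)
    network, action evaluation by the target (older) network."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)    # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)   # evaluation
        return rewards + gamma * (1.0 - dones) * next_q

# The training loss is then e.g. MSE/Huber between Q(s, a) from the online network
# and this target, with transitions sampled from the experience replay buffer.
```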
- `utils.py`: utility functions for:
  - State-dict-to-vector conversion and vice versa.
  - Neural network parameter manipulation.
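A minimal sketch of such a conversion, useful e.g. when perturbing parameters for Evolution Strategies; the function names are illustrative and may not match `utils.py`.

```python
import torch

def state_dict_to_vector(state_dict):
    """Flatten every tensor in a state dict into a single 1-D vector."""
    return torch.cat([t.detach().flatten() for t in state_dict.values()])

def vector_to_state_dict(vector, reference_state_dict):
    """Reshape a flat vector back into tensors shaped like the reference state dict."""
    new_state, offset = {}, 0
    for name, tensor in reference_state_dict.items():
        n = tensor.numel()
        new_state[name] = vector[offset:offset + n].view_as(tensor).clone()
        offset += n
    return new_state
```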
- `main.py`: entry point for running the project:
  - Initializes the environment, the agent, and POET.
  - Executes the POET main loop.
  - Visualizes and evaluates trained agents and environments.
- Python 3.9.6
- Dependencies:
  - torch
  - numpy
  - matplotlib
  - pygame
  - gymnasium
  - Box2D
  - scipy
  - scikit-learn
  - tqdm
- Clone this repository:
  git clone https://github.com/GianmarcoDonnesi/Open-ended_Curriculum_Learning_POET_2D_Car_obstacle-course_domain
- Install the required packages:
  pip install -r requirements.txt
- (Optional) Set up a virtual environment:
  python -m venv venv
  source venv/bin/activate   # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
Run the main script to execute the POET algorithm and train the agent:
python main.py
- The program will:
- Initialize a simple 2D car environment.
- Train the agent using Double DQN.
- Mutate and evaluate new environments.
After training, you can visualize the agent's performance in various environments:
- Use the arrow keys to navigate between environments.
- Press Right Arrow (→) to view the next environment.
- Press Left Arrow (←) to view the previous environment.
To profile the execution:
python main.py --profile
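For reference, one common way to wire up such a flag is through Python's standard cProfile module; the sketch below is a generic pattern, not necessarily how main.py implements it.

```python
import argparse
import cProfile
import pstats

def run():
    ...  # initialize the environment, agent and POET, then execute the main loop

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--profile", action="store_true")
    args = parser.parse_args()
    if args.profile:
        cProfile.run("run()", "poet_profile.stats")
        pstats.Stats("poet_profile.stats").sort_stats("cumulative").print_stats(20)
    else:
        run()
```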
- Initialize Environment: create a basic 2D car simulation.
- Agent Training: train the car agent using Double DQN on the initial environment.
- Environment Mutation: gradually evolve environments by adding/changing obstacles.
- Policy Transfer: transfer trained policies between environments to tackle harder challenges.
- Evaluation: assess agent performance and log metrics.
The project evaluates the agent's performance using the following metrics:
- Mean Reward per Environment: average cumulative reward the agent receives across episodes.
- Standard Deviation of Rewards: measures the variability in rewards, indicating consistency.
- Minimum and Maximum Rewards: tracks the best and worst performance across episodes.
- Mean and Standard Deviation of Steps per Episode: indicates how efficiently the agent completes tasks.
- Mean, Minimum, and Maximum Final Positions: assesses how far the agent travels in the environment.
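These statistics are straightforward to compute from per-episode logs; the sketch below uses assumed input names and mirrors the list above, not the repository's code.

```python
import numpy as np

def summarize_evaluation(episode_rewards, episode_steps, final_positions):
    """Aggregate per-episode logs into the evaluation metrics listed above."""
    r = np.asarray(episode_rewards, dtype=float)
    s = np.asarray(episode_steps, dtype=float)
    p = np.asarray(final_positions, dtype=float)
    return {
        "mean_reward": r.mean(), "std_reward": r.std(),
        "min_reward": r.min(), "max_reward": r.max(),
        "mean_steps": s.mean(), "std_steps": s.std(),
        "mean_final_position": p.mean(),
        "min_final_position": p.min(), "max_final_position": p.max(),
    }
```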
- R. Wang, J. Lehman, J. Clune, and K. O. Stanley, "Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions", arXiv:1901.01753v3, 2019. https://arxiv.org/abs/1901.01753
- Roberto Capobianco, "More off-policy & DQN.pdf", Lecture Notes, 2024.
- RL Practical, GitHub repository for the Reinforcement Learning course, 2024. https://github.com/KRLGroup/RL_2024.git
For questions, feedback, or contributions, feel free to get in touch with us:
| Name | Email Address |
|---|---|
| Gianmarco Donnesi | donnesi.2152311@studenti.uniroma1.it |
| Michael Corelli | corelli.1938627@studenti.uniroma1.it |
This project is licensed under the GPL-3.0 License. See the LICENSE file for more details.