This project implements a reinforcement learning framework based on the Paired Open-Ended Trailblazer (POET) algorithm. The goal is to train a 2D car agent to navigate dynamically evolving obstacle courses while simultaneously evolving the environments for progressive learning.
The environment is built using Box2D and Pygame, and the agent is trained with Double Deep Q-Learning (DDQN). The POET algorithm is adapted to 2D-car-specific challenges such as ramps, bumps, and holes: it mutates the size of the chosen obstacles so that the agent is pushed to keep improving its performance on progressively harder courses.
This project was developed for the Reinforcement Learning course A.Y. 2024/2025.
- Vehicle Control:
  - Actions: 9 discrete combinations of steering and acceleration (see the sketch after this feature list).
  - Observations: LiDAR-based sensor data and the car's position/velocity.
- Obstacle types: ramps, holes, and bumps.
- Mutations: environments evolve dynamically with increasing difficulty.
- Rendered using Pygame with customizable textures.
- Co-evolution: the agent's policies and the environments' difficulty evolve together, since POET lets environments change dynamically. New obstacles are adapted to the paired agent, neither too easy nor too hard, and their difficulty increases progressively to keep improving the car's performance. The algorithm also manages where obstacles are created, so that they are not generated at coincident points.
- Novelty ranking: promotes diversity by evaluating the novelty of new environments.
- Minimal Criterion (MC): ensures that environments meet specific performance thresholds before being added, and allows these thresholds to vary dynamically within fixed ranges.
- Policy Transfer: transfers learned policies between environments to tackle harder challenges.
- Neural network-based policy optimization with:
  - Batch normalization and dropout for stable training.
  - Experience replay for sample efficiency.
  - Double Q-learning: the current (online) Q-network is used to choose the actions, while the older (target) Q-network is used to evaluate them.
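The 9-action discretization can be pictured as a 3 × 3 grid over steering and acceleration. The sketch below is only an illustration; the exact values are assumptions, not taken from the repository.

```python
import itertools

# Hypothetical discretization: the exact steering/throttle values are assumptions.
STEERING = (-1.0, 0.0, 1.0)   # left, straight, right
THROTTLE = (-1.0, 0.0, 1.0)   # brake/reverse, coast, accelerate

ACTIONS = list(itertools.product(STEERING, THROTTLE))  # 3 x 3 = 9 discrete actions

def decode_action(index):
    """Map a discrete action index in [0, 8] to a (steering, throttle) pair."""
    return ACTIONS[index]
```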
Open-ended_Curriculum_Learning_POET_2D_Car_obstacle-course_domain/
├── main.py
├── env.py
├── model.py
├── poet.py
├── utils.py
├── textures/
│ ├── background.png
│ └── wheel.png
├── requirements.txt
├── README.md
└── LICENSE
- `env.py`: defines the `CarEnvironment` class:
  - Implements the simulation logic using Box2D.
  - Handles obstacles, rendering, and reward calculation.
  - Includes car dynamics and LiDAR data collection.
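For illustration, LiDAR-style readings can be collected in Box2D (pybox2d) with a ray-cast callback, as sketched below. The number of rays, fan angle, and maximum range are assumptions; the actual `CarEnvironment` code may differ.

```python
import math
from Box2D import b2RayCastCallback, b2Vec2

class _NearestHit(b2RayCastCallback):
    """Ray-cast callback that keeps only the closest fixture hit along the ray."""
    def __init__(self):
        super().__init__()
        self.hit = False
        self.fraction = 1.0

    def ReportFixture(self, fixture, point, normal, fraction):
        self.hit = True
        self.fraction = fraction
        return fraction  # clip the ray to the closest hit found so far

def lidar_scan(world, origin, n_rays=16, max_range=20.0):
    """Cast a 180-degree fan of rays from `origin` (a b2Vec2) and return
    normalized distances in [0, 1] (1.0 = no obstacle within range)."""
    readings = []
    for i in range(n_rays):
        angle = math.pi * i / (n_rays - 1)
        target = origin + b2Vec2(math.cos(angle), math.sin(angle)) * max_range
        callback = _NearestHit()
        world.RayCast(callback, origin, target)
        readings.append(callback.fraction if callback.hit else 1.0)
    return readings
```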
- `poet.py`: implements the POET algorithm:
  - `main_loop`: the main POET training and evolution loop. At each iteration it calls the main functions and performs these activities:
    - creates new environments from the active ones;
    - improves the agents in their environments;
    - tries to move the current agents from one environment to another.
  - `mutate_envs`: mutates environments and evaluates the offspring.
  - `evaluate_candidates`: updates policies using the neural network and the DDQN algorithm, or by Evolution Strategies.
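A schematic sketch of this loop structure is shown below; the function names, the pairing of environments and agents, and the intervals are illustrative and do not reflect the repository's actual signatures.

```python
def poet_main_loop(pairs, n_iterations, mutate_envs, optimize, evaluate,
                   mutation_interval=25, transfer_interval=10):
    """Schematic POET loop over (environment, agent) pairs (illustrative only)."""
    for it in range(n_iterations):
        # 1. Periodically create new environments from the active ones
        #    (mutation + Minimal Criterion check + novelty ranking).
        if it > 0 and it % mutation_interval == 0:
            pairs = mutate_envs(pairs)

        # 2. Improve each agent in its own paired environment (e.g. DDQN updates).
        pairs = [(env, optimize(agent, env)) for env, agent in pairs]

        # 3. Periodically try to transfer agents between environments:
        #    adopt another pair's agent if it performs better in this environment.
        if it > 0 and it % transfer_interval == 0:
            candidates = [agent for _, agent in pairs]
            new_pairs = []
            for env, agent in pairs:
                best = max(candidates, key=lambda a: evaluate(a, env))
                new_pairs.append((env, best if evaluate(best, env) > evaluate(agent, env) else agent))
            pairs = new_pairs
    return pairs
```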
- `model.py`: defines the models and the learning agent:
  - The Policy Network: a neural network for Q-value estimation.
  - The Double DQN (DDQN) agent:
    - Defines the online and target Q-networks.
    - Defines the training logic, experience replay buffer, and evaluation methods.
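The core Double DQN idea (the current network selects the greedy next action, the older target network evaluates it) can be sketched as follows; names and tensor shapes are assumptions, not the repository's exact code.

```python
import torch

def ddqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN bootstrap target: action selection by the online (current)
    network, action evaluation by the target (older) network."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)    # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)   # evaluation
        return rewards + gamma * (1.0 - dones) * next_q

# The training loss is then e.g. MSE/Huber between Q(s, a) from the online network
# and this target, with transitions sampled from the experience replay buffer.
```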
- `utils.py`: utility functions for:
  - State-dict-to-vector conversion and vice versa.
  - Neural network parameter manipulation.
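A minimal sketch of such a conversion, useful e.g. when perturbing parameters for Evolution Strategies; the function names are illustrative and may not match `utils.py`.

```python
import torch

def state_dict_to_vector(state_dict):
    """Flatten every tensor in a state dict into a single 1-D vector."""
    return torch.cat([t.detach().flatten() for t in state_dict.values()])

def vector_to_state_dict(vector, reference_state_dict):
    """Reshape a flat vector back into tensors shaped like the reference state dict."""
    new_state, offset = {}, 0
    for name, tensor in reference_state_dict.items():
        n = tensor.numel()
        new_state[name] = vector[offset:offset + n].view_as(tensor).clone()
        offset += n
    return new_state
```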
- `main.py`: entry point for running the project:
  - Initializes the environment, the agent, and POET.
  - Executes the POET main loop.
  - Visualizes and evaluates trained agents and environments.
- Python 3.9.6
- Dependencies:
  - torch
  - numpy
  - matplotlib
  - pygame
  - gymnasium
  - Box2D
  - scipy
  - scikit-learn
  - tqdm
- Clone this repository:
  git clone https://github.com/GianmarcoDonnesi/Open-ended_Curriculum_Learning_POET_2D_Car_obstacle-course_domain
- Install the required packages:
  pip install -r requirements.txt
- (Optional) Set up a virtual environment:
  python -m venv venv
  source venv/bin/activate   # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
Run the main script to execute the POET algorithm and train the agent:
python main.py
- The program will:
- Initialize a simple 2D car environment.
- Train the agent using Double DQN.
- Mutate and evaluate new environments.
After training, you can visualize the agent's performance in various environments:
- Use the arrow keys to navigate between environments.
- Press Right Arrow (→) to view the next environment.
- Press Left Arrow (←) to view the previous environment.
To profile the execution:
python main.py --profile
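For reference, one common way to wire up such a flag is through Python's standard cProfile module; the sketch below is a generic pattern, not necessarily how main.py implements it.

```python
import argparse
import cProfile
import pstats

def run():
    ...  # initialize the environment, agent and POET, then execute the main loop

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--profile", action="store_true")
    args = parser.parse_args()
    if args.profile:
        cProfile.run("run()", "poet_profile.stats")
        pstats.Stats("poet_profile.stats").sort_stats("cumulative").print_stats(20)
    else:
        run()
```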
- Initialize Environment: create a basic 2D car simulation.
- Agent Training: train the car agent using Double DQN on the initial environment.
- Environment Mutation: gradually evolve environments by adding/changing obstacles.
- Policy Transfer: transfer trained policies between environments to tackle harder challenges.
- Evaluation: assess agent performance and log metrics.
The project evaluates the agent's performance using the following metrics:
- Mean Reward per Environment: average cumulative reward the agent receives across episodes.
- Standard Deviation of Rewards: measures the variability in rewards, indicating consistency.
- Minimum and Maximum Rewards: tracks the best and worst performance across episodes.
- Mean and Standard Deviation of Steps per Episode: indicates how efficiently the agent completes tasks.
- Mean, Minimum, and Maximum Final Positions: assesses how far the agent travels in the environment.
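These statistics are straightforward to compute from per-episode logs; the sketch below uses assumed input names and mirrors the list above, not the repository's code.

```python
import numpy as np

def summarize_evaluation(episode_rewards, episode_steps, final_positions):
    """Aggregate per-episode logs into the evaluation metrics listed above."""
    r = np.asarray(episode_rewards, dtype=float)
    s = np.asarray(episode_steps, dtype=float)
    p = np.asarray(final_positions, dtype=float)
    return {
        "mean_reward": r.mean(), "std_reward": r.std(),
        "min_reward": r.min(), "max_reward": r.max(),
        "mean_steps": s.mean(), "std_steps": s.std(),
        "mean_final_position": p.mean(),
        "min_final_position": p.min(), "max_final_position": p.max(),
    }
```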
- R. Wang, J. Lehman, J. Clune, and K. O. Stanley, "Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions", arXiv:1901.01753v3, 2019. https://arxiv.org/abs/1901.01753
- Roberto Capobianco, "More off-policy & DQN.pdf", Lecture Notes, 2024.
- RL Practical, GitHub repository for the Reinforcement Learning course, 2024. https://github.com/KRLGroup/RL_2024.git
For questions, feedback, or contributions, feel free to get in touch with us:
| Name | Email Address |
|---|---|
| Gianmarco Donnesi | donnesi.2152311@studenti.uniroma1.it |
| Michael Corelli | corelli.1938627@studenti.uniroma1.it |
This project is licensed under the GPL-3.0 License. See the LICENSE file for more details.