Inverse Reinforcement Learning on Minigrid

This project provides a tool for training an agent on Minigrid. A human player records game demonstrations, and the agent is then trained from those demonstrations using Inverse Reinforcement Learning (IRL) techniques.

The IRL algorithms are based on the following paper: Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations [1].

MiniGrid environment

Gym-minigrid [2] is a minimalistic gridworld package for OpenAI Gym.

Many different environments are available; some examples are shown below.

The red triangle represents the agent, which can move within the environment, while the green square (usually) represents the goal. There may also be other objects that the agent can interact with (doors, keys, etc.), each with a different color.

Graphical Application

The graphical interface allows the user to create, order and manage a set of games in order to create an agent that shows a desired behavior. The application windows are shown below.

Initial window

Choose an environment to use

Agents management

Browse list of created agents

New agent

Add demonstrations and create a new agent

Agent details

Check trained agent

Neural Networks

Reward Neural Network

Architecture of the Reward Neural Network:

input: MiniGrid observation
output: reward

Trained with the T-REX loss [1].
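As a rough illustration of the ranking loss described in [1], the sketch below computes the loss value for a list of demonstrations sorted from worst to best. It is not this project's training code: the function names are hypothetical, it scores full trajectories (the paper trains on subsampled partial trajectories), and in practice the same expression would be written with an autodiff framework so the reward network can be optimized.

```python
import math
from itertools import combinations


def trajectory_return(step_rewards):
    """Sum the per-step rewards predicted for one trajectory."""
    return sum(step_rewards)


def trex_pair_loss(return_worse, return_better):
    """Cross-entropy for one ranked pair.

    Equals -log(exp(R_better) / (exp(R_worse) + exp(R_better))).
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(return_worse, return_better)
    log_sum = m + math.log(
        math.exp(return_worse - m) + math.exp(return_better - m)
    )
    return log_sum - return_better


def trex_loss(ranked_step_rewards):
    """Average pairwise loss over demonstrations sorted worst to best.

    ranked_step_rewards[k] holds the predicted per-step rewards of the
    k-th demonstration (hypothetical input layout).
    """
    if len(ranked_step_rewards) < 2:
        raise ValueError("need at least two ranked demonstrations")
    returns = [trajectory_return(r) for r in ranked_step_rewards]
    # combinations yields (i, j) with i < j, so j is always the better one.
    pairs = list(combinations(range(len(returns)), 2))
    total = sum(trex_pair_loss(returns[i], returns[j]) for i, j in pairs)
    return total / len(pairs)
```

The loss is small when the predicted returns agree with the user's ordering of the games and grows when they contradict it, which is what pushes the reward network toward the ranking.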

Policy Neural Network

Architecture of the Policy Neural Network:

input: MiniGrid observation
output: probability distribution of the actions

Trained with loss: -log(action_probability) * discounted_reward
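The loss above has the shape of a REINFORCE-style policy-gradient objective. The following minimal sketch shows how the discounted reward and the per-episode loss value could be computed. The function names and the discount factor `gamma=0.99` are assumptions made for illustration, and whether the project normalizes the returns is not stated here.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_loss(action_probs, rewards, gamma=0.99):
    """Sum of -log(p_t) * G_t over one episode.

    action_probs[t] is the probability the policy assigned to the action
    actually taken at step t; rewards[t] is the reward for that step.
    """
    returns = discounted_returns(rewards, gamma)
    return sum(-math.log(p) * g for p, g in zip(action_probs, returns))
```

Minimizing this quantity raises the probability of actions that were followed by high discounted reward and lowers it for actions followed by low or negative reward.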

Experiments & results

We recorded a set of demonstrations aimed at producing the desired behavior shown on the left in the image below.

Next, the heatmaps of the rewards assigned by the trained reward network are shown. Each heatmap corresponds to one orientation of the agent, in order: up, right, down, left.
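One way such per-direction heatmaps could be assembled is sketched below. This is an illustration only, not the project's `heatmap.py`: `reward_fn` is a hypothetical stand-in for evaluating the trained reward network with the agent placed at a given cell and orientation, and the direction codes assume MiniGrid's convention (0 = right, 1 = down, 2 = left, 3 = up).

```python
# MiniGrid direction codes (assumed): 0 = right, 1 = down, 2 = left, 3 = up.
# The list follows the display order used in the text above.
DIRECTION_ORDER = [("up", 3), ("right", 0), ("down", 1), ("left", 2)]


def reward_heatmaps(reward_fn, width, height):
    """Build one height x width grid of rewards per agent direction.

    reward_fn(x, y, direction) is a hypothetical callable returning the
    predicted reward for the agent at cell (x, y) facing `direction`.
    """
    heatmaps = {}
    for name, direction in DIRECTION_ORDER:
        heatmaps[name] = [
            [reward_fn(x, y, direction) for x in range(width)]
            for y in range(height)
        ]
    return heatmaps


def toy_reward(x, y, direction):
    """Toy stand-in for the reward network, used only to run the sketch."""
    return x + y + 0.1 * direction


maps = reward_heatmaps(toy_reward, width=3, height=2)
```

Each entry of `maps` is a list of rows that can be passed to a plotting function to draw one heatmap per orientation.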

Run the project

Go to the directory into which you downloaded the project.
Enter the Minigrid_HCI-project folder with the command: cd Minigrid_HCI-project
Run the application with the command: python agents_window.py

References

[1] Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. (Jul 2019) T-REX

[2] Maxime Chevalier-Boisvert, Lucas Willems, Suman Pal. Minimalistic Gridworld Environment for OpenAI Gym. (2018) GitHub repository: Gym-minigrid
