Merge branch 'main' into feat/add-sudoku-environment
clement-bonnet authored Jun 1, 2023
2 parents 83e087f + 701dc4e commit d6e98d6
Showing 45 changed files with 4,825 additions and 12 deletions.
18 changes: 8 additions & 10 deletions README.md
@@ -19,7 +19,6 @@
| [**Docs**](https://instadeepai.github.io/jumanji)
---


<p float="left" align="center">
<img src="docs/env_anim/connector.gif" alt="Connector" width="30%" />
<img src="docs/env_anim/snake.gif" alt="Snake" width="30%" />
@@ -28,12 +27,11 @@
<img src="docs/env_anim/bin_pack.gif" alt="BinPack" width="30%" />
<img src="docs/env_anim/cvrp.gif" alt="CVRP" width="30%" />
<img src="docs/env_anim/rubiks_cube.gif" alt="RubiksCube" width="30%" />
<img src="docs/env_anim/graph_coloring.gif" alt="GraphColoring" width="30%" />
<img src="docs/env_anim/game_2048.gif" alt="Game2048" width="30%" />
<img src="docs/env_anim/sudoku.gif" alt="Sudoku" width="30%" />
</p>



## Welcome to the Jungle! 🌴

Jumanji is a suite of diverse and challenging reinforcement learning (RL) environments written in
@@ -70,7 +68,6 @@ JAX-based environments.
- 🏎️ **Training:** example agents that can be used as inspiration for the agents one may implement
in their research.


## Environments 🌍

Jumanji provides a diverse set of environments, from simple games to NP-hard combinatorial
@@ -79,6 +76,7 @@ problems.
| Environment | Category | Registered Version(s) | Source | Description |
|------------------------------------------|----------|------------------------------------------------------|--------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
| 🔢 Game2048 | Logic | `Game2048-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/game_2048/) | [doc](https://instadeepai.github.io/jumanji/environments/game_2048/) |
| 🔵🔗🟡🔗🔴 GraphColoring | Logic | `GraphColoring-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/graph_coloring/) | [doc](https://instadeepai.github.io/jumanji/environments/graph_coloring/) |
| 💣 Minesweeper | Logic | `Minesweeper-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/minesweeper/) | [doc](https://instadeepai.github.io/jumanji/environments/minesweeper/) |
| 🎲 RubiksCube | Logic | `RubiksCube-v0`<br/>`RubiksCube-partly-scrambled-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/rubiks_cube/) | [doc](https://instadeepai.github.io/jumanji/environments/rubiks_cube/) |
| ✏️ Sudoku | Logic | `Sudoku-v0` <br/>`Sudoku-very-easy-v0`| [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/sudoku/) | [doc](https://instadeepai.github.io/jumanji/environments/sudoku/) |
@@ -89,20 +87,24 @@ problems.
| :link: Connector | Routing | `Connector-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/connector/) | [doc](https://instadeepai.github.io/jumanji/environments/connector/) |
| 🚚 CVRP (Capacitated Vehicle Routing Problem) | Routing | `CVRP-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/cvrp/) | [doc](https://instadeepai.github.io/jumanji/environments/cvrp/) |
| :mag: Maze | Routing | `Maze-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/maze/) | [doc](https://instadeepai.github.io/jumanji/environments/maze/) |
| :robot: RobotWarehouse | Routing | `RobotWarehouse-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/robot_warehouse/) | [doc](https://instadeepai.github.io/jumanji/environments/robot_warehouse/) |
| 🐍 Snake | Routing | `Snake-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/snake/) | [doc](https://instadeepai.github.io/jumanji/environments/snake/) |
| 📬 TSP (Travelling Salesman Problem) | Routing | `TSP-v1` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/tsp/) | [doc](https://instadeepai.github.io/jumanji/environments/tsp/) |


## Installation 🎬

You can install the latest release of Jumanji from PyPI:

```bash
pip install jumanji
```

Alternatively, you can install the latest development version directly from GitHub:

```bash
pip install git+https://github.com/instadeepai/jumanji.git
```

Jumanji has been tested on Python 3.8 and 3.9.
Note that because the installation of JAX differs depending on your hardware accelerator,
we advise users to explicitly install the correct JAX version (see the
@@ -114,7 +116,6 @@ you will need a GUI backend. For example, on Linux, you can install Tk via:
[Matplotlib backends](https://matplotlib.org/stable/users/explain/backends.html) for a list of
backends you can use.


## Quickstart ⚡

RL practitioners will find Jumanji's interface familiar as it combines the widely adopted
@@ -171,7 +172,6 @@ the version number is incremented by one to prevent potential confusion.
For a full list of registered versions of each environment, check out
[the documentation](https://instadeepai.github.io/jumanji/environments/tsp/).


## Training 🏎️

To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla
@@ -192,18 +192,17 @@ actor-critic networks in
For more information on how to use the example agents, see the
[training guide](https://instadeepai.github.io/jumanji/guides/training/).


## Contributing 🤝

Contributions are welcome! See our issue tracker for
[good first issues](https://github.com/instadeepai/jumanji/labels/good%20first%20issue). Please read
our [contributing guidelines](https://github.com/instadeepai/jumanji/blob/main/CONTRIBUTING.md) for
details on how to submit pull requests, our Contributor License Agreement, and community guidelines.


## Citing Jumanji ✏️

If you use Jumanji in your work, please cite the library using:

```
@software{jumanji2023github,
author = {Clément Bonnet and Daniel Luo and Donal Byrne and Sasha Abramowitz
@@ -217,7 +216,6 @@ If you use Jumanji in your work, please cite the library using:
}
```


## See Also 🔎

Other works have embraced the approach of writing RL environments in JAX.
8 changes: 8 additions & 0 deletions docs/api/environments/graph_coloring.md
@@ -0,0 +1,8 @@
::: jumanji.environments.logic.graph_coloring.env.GraphColoring
selection:
members:
- __init__
- reset
- step
- observation_spec
- action_spec
8 changes: 8 additions & 0 deletions docs/api/environments/rware.md
@@ -0,0 +1,8 @@
::: jumanji.environments.routing.robot_warehouse.env.RobotWarehouse
selection:
members:
- __init__
- reset
- step
- observation_spec
- action_spec
Binary file added docs/env_anim/graph_coloring.gif
Binary file added docs/env_anim/rware.gif
Binary file added docs/env_img/graph_coloring.png
Binary file added docs/env_img/rware.png
56 changes: 56 additions & 0 deletions docs/environments/graph_coloring.md
@@ -0,0 +1,56 @@
# Graph Coloring Environment

<p align="center">
<img src="../env_img/graph_coloring.png" width="500"/>
</p>

We provide a JAX jit-able implementation of the Graph Coloring environment.

Graph coloring is a combinatorial optimization problem where the objective is to assign a color to each vertex of a graph in such a way that no two adjacent vertices share the same color. The problem is usually formulated as minimizing the number of colors used. The `GraphColoring` environment is an episodic, single-agent setting that allows for the exploration of graph coloring algorithms and reinforcement learning methods.

## Observation

The observation in the `GraphColoring` environment includes information about the graph, the colors assigned to the vertices, the action mask, and the current node index.

- `graph`: a JAX array (bool) of shape `(num_nodes, num_nodes)`, representing the adjacency matrix of the graph.
- For example, a random observation of the graph adjacency matrix:

```
[[False,  True, False,  True],
 [ True, False,  True, False],
 [False,  True, False,  True],
 [ True, False,  True, False]]
```

- `colors`: a JAX array (int32) of shape `(num_nodes,)`, representing the current color assignments for the vertices. Initially, all elements are set to -1, indicating that no colors have been assigned yet.
- For example, an initial color assignment:
```[-1, -1, -1, -1]```

- `action_mask`: a JAX array of boolean values, shaped `(num_colors,)`, which indicates the valid actions in the current state of the environment. Each position in the array corresponds to a color. True at a position signifies that the corresponding color can be used to color a node, while False indicates the opposite.
- For example, with 4 available colors:
```[True, False, True, False]```

- `current_node_index`: an integer representing the current node being colored.
- For example, the initial `current_node_index` might be 0.
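
As an illustration of how the fields above relate, the following is a minimal plain-Python sketch of deriving an `action_mask` from `graph`, `colors`, and `current_node_index`. It assumes a color is valid exactly when no already-colored neighbor of the current node uses it; the function name `compute_action_mask` is hypothetical and this is not the environment's JAX implementation.

```python
from typing import List


def compute_action_mask(
    graph: List[List[bool]],
    colors: List[int],
    current_node_index: int,
    num_colors: int,
) -> List[bool]:
    """Hypothetical helper: True at position c means color c is not used
    by any already-colored neighbor of the current node (assumed rule)."""
    neighbor_colors = {
        colors[j]
        for j, connected in enumerate(graph[current_node_index])
        if connected and colors[j] != -1  # -1 marks an uncolored node
    }
    return [c not in neighbor_colors for c in range(num_colors)]


graph = [
    [False, True, False, True],
    [True, False, True, False],
    [False, True, False, True],
    [True, False, True, False],
]
colors = [0, -1, -1, -1]  # node 0 already has color 0
mask = compute_action_mask(graph, colors, current_node_index=1, num_colors=4)
print(mask)  # [False, True, True, True]
```

Node 1 is adjacent to node 0, which already uses color 0, so only color 0 is masked out.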

## Action

The action space is a `DiscreteArray` of integer values in `[0, 1, ..., num_colors - 1]`. Each action corresponds to assigning a color to the current node.

## Reward

The reward in the `GraphColoring` environment is given as follows:

- `sparse reward`: a reward is provided at the end of the episode and equals the negative of the number of unique colors used to color all vertices in the graph.

The agent's goal is to find a valid coloring using as few colors as possible while avoiding conflicts with adjacent nodes.
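
The sparse reward described above can be sketched in plain Python as follows. The function name `sparse_reward` is hypothetical, and returning `0.0` before the episode ends is an assumption about what "sparse" means here, not a statement about the actual implementation.

```python
from typing import List


def sparse_reward(colors: List[int]) -> float:
    """Hypothetical helper: negative count of distinct colors once every
    node is colored, and 0.0 while any node is still uncolored (assumed)."""
    if any(c == -1 for c in colors):
        return 0.0
    return -float(len(set(colors)))


print(sparse_reward([0, 1, 0, 1]))    # -2.0
print(sparse_reward([0, -1, -1, -1]))  # 0.0
```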

## Episode Termination

An episode in the `GraphColoring` environment can terminate under two conditions:

1. All nodes have been assigned a color: the environment iteratively assigns colors to nodes. When all nodes have a color assigned (i.e., there are no nodes with a color value of -1), the episode ends. This is the natural termination condition and ideally the one we'd like the agent to achieve.

2. An invalid action is taken: an action is invalid if it assigns the current node a color that is not in the allowed color set for that node at that time. The allowed color set is updated after every action. If an invalid action is attempted, the episode terminates immediately and the agent receives a large negative reward, which discourages invalid actions.
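
The two termination conditions can be sketched as a single plain-Python transition function. This is only an illustration under stated assumptions: the name `step`, its signature, and the `invalid_penalty` value of `-100.0` are placeholders (the text above only says the penalty is "large"), and an action is treated as invalid when a neighbor already uses the chosen color.

```python
from typing import List, Tuple


def step(
    graph: List[List[bool]],
    colors: List[int],
    node: int,
    action: int,
    invalid_penalty: float = -100.0,  # placeholder value, not from the source
) -> Tuple[List[int], float, bool]:
    """Hypothetical transition: returns (new_colors, reward, done)."""
    conflict = any(
        connected and colors[j] == action
        for j, connected in enumerate(graph[node])
    )
    if conflict:
        # Condition 2: invalid action, the episode ends with a penalty.
        return colors, invalid_penalty, True
    new_colors = colors[:node] + [action] + colors[node + 1:]
    # Condition 1: the episode ends once no node is left uncolored (-1).
    done = all(c != -1 for c in new_colors)
    reward = -float(len(set(new_colors))) if done else 0.0
    return new_colors, reward, done


# Path graph 0 - 1 - 2, with node 0 already colored 0.
graph = [
    [False, True, False],
    [True, False, True],
    [False, True, False],
]
colors = [0, -1, -1]
print(step(graph, colors, node=1, action=0))  # ([0, -1, -1], -100.0, True)
print(step(graph, colors, node=1, action=1))  # ([0, 1, -1], 0.0, False)
```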

## Registered Versions 📖

- `GraphColoring-v0`: the default settings for the `GraphColoring` problem, with a configurable number of nodes and edge probability. By default, the graph has 20 nodes and an edge probability of 0.8.
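
To make the two default parameters concrete, here is a minimal plain-Python sketch of sampling a random graph with a given number of nodes and edge probability. It assumes each undirected edge is drawn independently and that there are no self-loops; the function `sample_graph` is hypothetical and is not the environment's generator.

```python
import random
from typing import List


def sample_graph(
    num_nodes: int = 20, edge_probability: float = 0.8, seed: int = 0
) -> List[List[bool]]:
    """Hypothetical sampler: symmetric boolean adjacency matrix where each
    pair of distinct nodes is connected with probability edge_probability."""
    rng = random.Random(seed)
    graph = [[False] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < edge_probability:
                graph[i][j] = graph[j][i] = True  # undirected edge
    return graph


graph = sample_graph()
num_edges = sum(sum(row) for row in graph) // 2
print(len(graph), num_edges)
```

With 20 nodes there are 190 possible edges, so an edge probability of 0.8 yields a dense graph with roughly 150 edges on average.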
46 changes: 46 additions & 0 deletions docs/environments/robot_warehouse.md
@@ -0,0 +1,46 @@
# RobotWarehouse Environment

<p align="center">
<img src="../env_anim/robot_warehouse.gif" width="600"/>
</p>

We provide a JAX jit-able implementation of the [Robotic Warehouse](https://github.com/semitable/robotic-warehouse/tree/master)
environment.

The Robot Warehouse (RWARE) environment simulates a warehouse in which robots move and deliver requested goods. The simulator is inspired by real-world applications in which robots pick up shelves and deliver them to a workstation. Humans access the contents of a shelf, and robots can then return the shelf to an empty shelf location.

The goal is to deliver as many requested shelves as possible within a given time budget.

Once a shelf has been delivered, a new shelf is requested at random. Agents start each episode at random locations within the warehouse.

## Observation

The **observation** seen by the agent is a `NamedTuple` containing the following:

- `agents_view`: a JAX array (int32) of shape `(num_agents, num_obs_features)`, representing each agent's view of other agents
and shelves.

- `action_mask`: a JAX array (bool) of shape `(num_agents, 5)`, specifying, for each agent,
which actions (noop, forward, left, right, toggle_load) are legal.

- `step_count`: a JAX array (int32) of shape `()`, the number of steps elapsed in the current episode.

## Action

The action space is a `MultiDiscreteArray` containing an integer value in `[0, 1, 2, 3, 4]` for each
agent. Each agent can take one of five actions: noop (`0`), forward (`1`), turn left (`2`), turn right (`3`), or toggle_load (`4`).
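
The integer encoding above can be written out as a small plain-Python enumeration, together with a helper that reads one agent's row of the `action_mask`. The names `Action` and `legal_actions` are hypothetical and only illustrate the encoding; they are not part of the environment's API.

```python
from enum import IntEnum
from typing import List


class Action(IntEnum):
    """Hypothetical enum mirroring the five action codes listed above."""
    NOOP = 0
    FORWARD = 1
    LEFT = 2
    RIGHT = 3
    TOGGLE_LOAD = 4


def legal_actions(action_mask_row: List[bool]) -> List[Action]:
    """Return the actions marked legal in one agent's mask row of length 5."""
    return [Action(i) for i, legal in enumerate(action_mask_row) if legal]


print(legal_actions([True, False, True, True, False]))
```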

## Episode Termination

The episode terminates under the following conditions:

- An invalid action is taken, or

- An agent collides with another agent.
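
A minimal plain-Python sketch of the two termination checks is given below. It assumes a collision means two agents occupying the same grid cell and that an action is invalid when its entry in the agent's mask row is `False`; the function `episode_terminates` and its arguments are hypothetical.

```python
from typing import List, Tuple


def episode_terminates(
    positions: List[Tuple[int, int]],
    actions: List[int],
    action_mask: List[List[bool]],
) -> bool:
    """Hypothetical check: True if any agent took a masked-out action or
    if two agents share the same (row, col) cell (assumed collision rule)."""
    invalid = any(not action_mask[i][a] for i, a in enumerate(actions))
    collided = len(set(positions)) < len(positions)
    return invalid or collided


all_legal = [[True] * 5 for _ in range(2)]
print(episode_terminates([(0, 0), (0, 1)], [1, 2], all_legal))  # False
print(episode_terminates([(0, 0), (0, 0)], [1, 2], all_legal))  # True
```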

## Reward

The reward is global and shared among the agents. It is equal to the number of shelves which were
delivered successfully during the time step (i.e., +1 for each shelf).

## Registered Versions 📖

- `RobotWarehouse-v0`: a warehouse with 4 agents, each with a sensor range of 1, on a floor with 2 shelf rows, 3 shelf columns, a column height of 8, and a shelf request queue of size 8.
8 changes: 8 additions & 0 deletions jumanji/__init__.py
@@ -32,6 +32,10 @@
# Game2048 - the game of 2048 with the default board size of 4x4.
register(id="Game2048-v1", entry_point="jumanji.environments:Game2048")

# GraphColoring - the graph coloring problem with the default graph of
# 20 nodes and an edge probability of 0.8.
register(id="GraphColoring-v0", entry_point="jumanji.environments:GraphColoring")

# Minesweeper on a board of size 10x10 with 10 mines.
register(id="Minesweeper-v0", entry_point="jumanji.environments:Minesweeper")

@@ -104,6 +108,10 @@
# Maze with 10 rows and 10 columns, a time limit of 100 and a random maze generator.
register(id="Maze-v0", entry_point="jumanji.environments:Maze")

# RobotWarehouse with a random generator with 2 shelf rows, 3 shelf columns, a column height of 8,
# 4 agents, a sensor range of 1, and a request queue of size 8.
register(id="RobotWarehouse-v0", entry_point="jumanji.environments:RobotWarehouse")

# Snake game on a board of size 12x12 with a time limit of 4000.
register(id="Snake-v1", entry_point="jumanji.environments:Snake")

12 changes: 11 additions & 1 deletion jumanji/environments/__init__.py
@@ -16,18 +16,28 @@

from jumanji.environments.logic import game_2048, minesweeper, rubiks_cube
from jumanji.environments.logic.game_2048.env import Game2048
from jumanji.environments.logic.graph_coloring.env import GraphColoring
from jumanji.environments.logic.minesweeper import Minesweeper
from jumanji.environments.logic.rubiks_cube import RubiksCube
from jumanji.environments.logic.sudoku import Sudoku
from jumanji.environments.packing import bin_pack, job_shop, knapsack
from jumanji.environments.packing.bin_pack.env import BinPack
from jumanji.environments.packing.job_shop.env import JobShop
from jumanji.environments.packing.knapsack.env import Knapsack
from jumanji.environments.routing import cleaner, connector, cvrp, maze, snake, tsp
from jumanji.environments.routing import (
cleaner,
connector,
cvrp,
maze,
robot_warehouse,
snake,
tsp,
)
from jumanji.environments.routing.cleaner.env import Cleaner
from jumanji.environments.routing.connector.env import Connector
from jumanji.environments.routing.cvrp.env import CVRP
from jumanji.environments.routing.maze.env import Maze
from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
from jumanji.environments.routing.snake.env import Snake
from jumanji.environments.routing.tsp.env import TSP

16 changes: 16 additions & 0 deletions jumanji/environments/logic/graph_coloring/__init__.py
@@ -0,0 +1,16 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from jumanji.environments.logic.graph_coloring.env import GraphColoring
from jumanji.environments.logic.graph_coloring.types import Observation, State
23 changes: 23 additions & 0 deletions jumanji/environments/logic/graph_coloring/conftest.py
@@ -0,0 +1,23 @@
# Copyright 2022 InstaDeep Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pytest

from jumanji.environments.logic.graph_coloring import GraphColoring


@pytest.fixture
def graph_coloring() -> GraphColoring:
"""Instantiates a default GraphColoring environment."""
return GraphColoring()