Merge branch 'main' into feat/add-sudoku-environment

instadeepai · Jun 1, 2023 · d6e98d6
2 parents 83e087f + 701dc4e
commit d6e98d6

Showing 45 changed files with 4,825 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -19,7 +19,6 @@
 | [**Docs**](https://instadeepai.github.io/jumanji)
 ---
 
-
 <p float="left" align="center">
   <img src="docs/env_anim/connector.gif" alt="Connector" width="30%" />
   <img src="docs/env_anim/snake.gif" alt="Snake" width="30%" />
@@ -28,12 +27,11 @@
   <img src="docs/env_anim/bin_pack.gif" alt="BinPack" width="30%" />
   <img src="docs/env_anim/cvrp.gif" alt="CVRP" width="30%" />
   <img src="docs/env_anim/rubiks_cube.gif" alt="RubiksCube" width="30%" />
+  <img src="docs/env_anim/graph_coloring.gif" alt="GraphColoring" width="30%" />
   <img src="docs/env_anim/game_2048.gif" alt="Game2048" width="30%" />
   <img src="docs/env_anim/sudoku.gif" alt="Sudoku" width="30%" />
 </p>
 
-
-
 ## Welcome to the Jungle! 🌴
 
 Jumanji is a suite of diverse and challenging reinforcement learning (RL) environments written in
@@ -70,7 +68,6 @@ JAX-based environments.
 - 🏎️ **Training:** example agents that can be used as inspiration for the agents one may implement
 in their research.
 
-
 ## Environments 🌍
 
 Jumanji provides a diverse range of environments ranging from simple games to NP-hard combinatorial
@@ -79,6 +76,7 @@ problems.
 | Environment                              | Category | Registered Version(s)                                | Source                                                                                           | Description                                                            |
 |------------------------------------------|----------|------------------------------------------------------|--------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
 | 🔢 Game2048                              | Logic  | `Game2048-v1`                                        | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/game_2048/)   | [doc](https://instadeepai.github.io/jumanji/environments/game_2048/)   |
+| 🔵🔗🟡🔗🔴 GraphColoring                              | Logic  | `GraphColoring-v0`                                        | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/graph_coloring/)   | [doc](https://instadeepai.github.io/jumanji/environments/graph_coloring/)   |
 | 💣 Minesweeper                           | Logic    | `Minesweeper-v0`                                     | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/minesweeper/) | [doc](https://instadeepai.github.io/jumanji/environments/minesweeper/) |
 | 🎲 RubiksCube                            | Logic    | `RubiksCube-v0`<br/>`RubiksCube-partly-scrambled-v0` | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/rubiks_cube/) | [doc](https://instadeepai.github.io/jumanji/environments/rubiks_cube/) |
 | ✏️ Sudoku                       | Logic    | `Sudoku-v0` <br/>`Sudoku-very-easy-v0`| [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/logic/sudoku/) | [doc](https://instadeepai.github.io/jumanji/environments/sudoku/) |
@@ -89,20 +87,24 @@ problems.
 | :link: Connector                         | Routing  | `Connector-v1`                                       | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/connector/) | [doc](https://instadeepai.github.io/jumanji/environments/connector/)   |
 | 🚚 CVRP (Capacitated Vehicle Routing Problem)  | Routing  | `CVRP-v1`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/cvrp/)      | [doc](https://instadeepai.github.io/jumanji/environments/cvrp/)        |
 | :mag: Maze   | Routing  | `Maze-v0`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/maze/)      | [doc](https://instadeepai.github.io/jumanji/environments/maze/)        |
+| :robot: RobotWarehouse  | Routing  | `RobotWarehouse-v0`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/robot_warehouse/)      | [doc](https://instadeepai.github.io/jumanji/environments/robot_warehouse/)        |
 | 🐍 Snake                                       | Routing  | `Snake-v1`                                           | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/snake/)     | [doc](https://instadeepai.github.io/jumanji/environments/snake/)       |
 | 📬 TSP (Travelling Salesman Problem)           | Routing  | `TSP-v1`                                             | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/tsp/)       | [doc](https://instadeepai.github.io/jumanji/environments/tsp/)         |
 
-
 ## Installation 🎬
 
 You can install the latest release of Jumanji from PyPI:
+
 ```bash
 pip install jumanji
 ```
+
 Alternatively, you can install the latest development version directly from GitHub:
+
 ```bash
 pip install git+https://github.com/instadeepai/jumanji.git
 ```
+
 Jumanji has been tested on Python 3.8 and 3.9.
 Note that because the installation of JAX differs depending on your hardware accelerator,
 we advise users to explicitly install the correct JAX version (see the
@@ -114,7 +116,6 @@ you will need a GUI backend. For example, on Linux, you can install Tk via:
 [Matplotlib backends](https://matplotlib.org/stable/users/explain/backends.html) for a list of
 backends you can use.
 
-
 ## Quickstart ⚡
 
 RL practitioners will find Jumanji's interface familiar as it combines the widely adopted
@@ -171,7 +172,6 @@ the version number is incremented by one to prevent potential confusion.
 For a full list of registered versions of each environment, check out
 [the documentation](https://instadeepai.github.io/jumanji/environments/tsp/).
 
-
 ## Training 🏎️
 
 To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla
@@ -192,18 +192,17 @@ actor-critic networks in
 For more information on how to use the example agents, see the
 [training guide](https://instadeepai.github.io/jumanji/guides/training/).
 
-
 ## Contributing 🤝
 
 Contributions are welcome! See our issue tracker for
 [good first issues](https://github.com/instadeepai/jumanji/labels/good%20first%20issue). Please read
 our [contributing guidelines](https://github.com/instadeepai/jumanji/blob/main/CONTRIBUTING.md) for
 details on how to submit pull requests, our Contributor License Agreement, and community guidelines.
 
-
 ## Citing Jumanji ✏️
 
 If you use Jumanji in your work, please cite the library using:
+
 ```
 @software{jumanji2023github,
   author = {Clément Bonnet and Daniel Luo and Donal Byrne and Sasha Abramowitz
@@ -217,7 +216,6 @@ If you use Jumanji in your work, please cite the library using:
 }
 ```
 
-
 ## See Also 🔎
 
 Other works have embraced the approach of writing RL environments in JAX.

diff --git a/docs/api/environments/graph_coloring.md b/docs/api/environments/graph_coloring.md
@@ -0,0 +1,8 @@
+::: jumanji.environments.logic.graph_coloring.env.GraphColoring
+    selection:
+      members:
+        - __init__
+        - reset
+        - step
+        - observation_spec
+        - action_spec
diff --git a/docs/api/environments/rware.md b/docs/api/environments/rware.md
@@ -0,0 +1,8 @@
+::: jumanji.environments.routing.robot_warehouse.env.RobotWarehouse
+    selection:
+      members:
+        - __init__
+        - reset
+        - step
+        - observation_spec
+        - action_spec
diff --git a/docs/env_anim/graph_coloring.gif b/docs/env_anim/graph_coloring.gif
diff --git a/docs/env_anim/rware.gif b/docs/env_anim/rware.gif
diff --git a/docs/env_img/graph_coloring.png b/docs/env_img/graph_coloring.png
diff --git a/docs/env_img/rware.png b/docs/env_img/rware.png
diff --git a/docs/environments/graph_coloring.md b/docs/environments/graph_coloring.md
@@ -0,0 +1,56 @@
+# Graph Coloring Environment
+
+<p align="center">
+    <img src="../env_img/graph_coloring.png" width="500"/>
+</p>
+
+We provide a JAX jit-able implementation of the Graph Coloring environment.
+
+Graph coloring is a combinatorial optimization problem where the objective is to assign a color to each vertex of a graph in such a way that no two adjacent vertices share the same color. The problem is usually formulated as minimizing the number of colors used. The `GraphColoring` environment is an episodic, single-agent setting that allows for the exploration of graph coloring algorithms and reinforcement learning methods.
+
+## Observation
+
+The observation in the `GraphColoring` environment includes information about the graph, the colors assigned to the vertices, the action mask, and the current node index.
+
+- `graph`: a JAX array (bool) of shape `(num_nodes, num_nodes)`, representing the adjacency matrix of the graph.
+  - For example, a random observation of the graph adjacency matrix:
+
+        ```[[False,  True, False,  True],
+        [ True, False,  True, False],
+        [False,  True, False,  True],
+        [ True, False,  True, False]]```
+
+- `colors`: a JAX array (int32) of shape `(num_nodes,)`, representing the current color assignments for the vertices. Initially, all elements are set to -1, indicating that no colors have been assigned yet.
+  - For example, an initial color assignment:
+    ```[-1, -1, -1, -1]```
+
+- `action_mask`: a JAX array of boolean values, shaped `(num_colors,)`, which indicates the valid actions in the current state of the environment. Each position in the array corresponds to a color. True at a position signifies that the corresponding color can be used to color a node, while False indicates the opposite.
+  - For example, with 4 colors available:
+    ```[True, False, True, False]```
+
+- `current_node_index`: an integer representing the current node being colored.
+  - For example, an initial current_node_index might be 0.
+
+## Action
+
+The action space is a `DiscreteArray` of integer values in `[0, 1, ..., num_colors - 1]`. Each action corresponds to assigning a color to the current node.
+
+## Reward
+
+The reward in the `GraphColoring` environment is given as follows:
+
+- `sparse reward`: a reward is provided at the end of the episode and equals the negative of the number of unique colors used to color all vertices in the graph.
+
+The agent's goal is to find a valid coloring using as few colors as possible while avoiding conflicts with adjacent nodes.
+
+## Episode Termination
+
+An episode in the graph coloring environment can terminate under two conditions:
+
+1. All nodes have been assigned a color: the environment iteratively assigns colors to nodes. When all nodes have a color assigned (i.e., there are no nodes with a color value of -1), the episode ends. This is the natural termination condition and ideally the one we'd like the agent to achieve.
+
+2. An invalid action is taken: an action is invalid if the color it selects is not in the allowed color set for the current node at that time. The allowed color set for each node is updated after every action. If an invalid action is attempted, the episode terminates immediately and the agent receives a large negative reward, which discourages the agent from taking invalid actions.
+
+## Registered Versions 📖
+
+- `GraphColoring-v0`: the default settings for the `GraphColoring` problem, with a configurable number of nodes and edge probability. The default number of nodes is 20, and the default edge probability is 0.8.
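
The observation and reward rules described in `graph_coloring.md` can be illustrated with a small plain-Python sketch. It is not the JAX implementation added in this commit: the helper names `action_mask` and `sparse_reward` are hypothetical, and it assumes that a color is allowed exactly when no already-colored neighbor uses it and that the reward is 0 until every node is colored.

```python
from typing import List, Sequence

UNCOLORED = -1  # sentinel the docs use for "no color assigned yet"


def action_mask(
    adjacency: Sequence[Sequence[bool]],
    colors: Sequence[int],
    node: int,
    num_colors: int,
) -> List[bool]:
    """True at index c if no colored neighbor of `node` already uses color c (assumed rule)."""
    used = {
        colors[j]
        for j, is_adjacent in enumerate(adjacency[node])
        if is_adjacent and colors[j] != UNCOLORED
    }
    return [c not in used for c in range(num_colors)]


def sparse_reward(colors: Sequence[int]) -> int:
    """Negative count of distinct colors once all nodes are colored, else 0 (assumed)."""
    if UNCOLORED in colors:
        return 0
    return -len(set(colors))


# The 4-node cycle from the adjacency-matrix example in the docs.
adjacency = [
    [False, True, False, True],
    [True, False, True, False],
    [False, True, False, True],
    [True, False, True, False],
]

# Node 0 already has color 0, so node 1 (a neighbor of node 0) cannot reuse it.
mask_for_node_1 = action_mask(adjacency, [0, UNCOLORED, UNCOLORED, UNCOLORED], 1, 4)
final_reward = sparse_reward([0, 1, 0, 1])
```

With this coloring the cycle uses two distinct colors, so the terminal reward under the assumed rule is the negative of that count.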
diff --git a/docs/environments/robot_warehouse.md b/docs/environments/robot_warehouse.md
@@ -0,0 +1,46 @@
+# RobotWarehouse Environment
+
+<p align="center">
+        <img src="../env_anim/robot_warehouse.gif" width="600"/>
+</p>
+
+We provide a JAX jit-able implementation of the [Robotic Warehouse](https://github.com/semitable/robotic-warehouse/tree/master)
+environment.
+
+The Robot Warehouse (RWARE) environment simulates a warehouse in which robots move and deliver requested goods. The simulator is inspired by real-world applications, in which robots pick up shelves and deliver them to a workstation. Humans access the content of a shelf, and the robots can then return it to an empty shelf location.
+
+The goal is to deliver as many requested shelves as possible within a given time budget.
+
+Once a shelf has been delivered, a new shelf is requested at random. Agents start each episode at random locations within the warehouse.
+
+## Observation
+
+The **observation** seen by the agent is a `NamedTuple` containing the following:
+
+- `agents_view`: JAX array (int32) of shape `(num_agents, num_obs_features)`, representing each agent's view of other agents
+    and shelves.
+
+- `action_mask`: JAX array (bool) of shape `(num_agents, 5)`, specifying, for each agent,
+    which actions (noop, forward, left, right, toggle_load) are legal.
+
+- `step_count`: JAX array (int32) of shape `()`, number of steps elapsed in the current episode.
+
+## Action
+
+The action space is a `MultiDiscreteArray` containing an integer value in `[0, 1, 2, 3, 4]` for each
+agent. Each agent can take one of five actions: noop (`0`), forward (`1`), turn left (`2`), turn right (`3`), or toggle_load (`4`).
+
+The episode terminates under the following conditions:
+
+- An invalid action is taken, or
+
+- An agent collides with another agent.
+
+## Reward
+
+The reward is global and shared among the agents. It is equal to the number of shelves which were
+delivered successfully during the time step (i.e., +1 for each shelf).
+
+## Registered Versions 📖
+
+- `RobotWarehouse-v0`: a warehouse with 4 agents, each with a sensor range of 1; a warehouse floor with 2 shelf rows, 3 shelf columns, and a column height of 8; and a shelf request queue of size 8.
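
To make the action encoding and shared reward in `robot_warehouse.md` concrete, here is a minimal plain-Python sketch. It is an illustration only, not the environment's JAX code: `Action`, `all_actions_legal`, and `shared_reward` are hypothetical names, and the mask layout follows the `(num_agents, 5)` shape given in the docs.

```python
from enum import IntEnum
from typing import Sequence


class Action(IntEnum):
    """Integer codes for the five per-agent actions listed in the docs."""

    NOOP = 0
    FORWARD = 1
    LEFT = 2
    RIGHT = 3
    TOGGLE_LOAD = 4


def all_actions_legal(
    actions: Sequence[int], action_mask: Sequence[Sequence[bool]]
) -> bool:
    """Check one action per agent against that agent's row of the action mask."""
    return all(action_mask[agent][int(a)] for agent, a in enumerate(actions))


def shared_reward(delivered: Sequence[bool]) -> float:
    """Global reward for a step: +1 for each shelf delivered during that step."""
    return float(sum(delivered))


# Two agents; each row marks which of the five actions is legal for that agent.
mask = [
    [True, True, False, False, True],
    [True, False, True, True, False],
]

legal = all_actions_legal([Action.FORWARD, Action.LEFT], mask)
illegal = all_actions_legal([Action.LEFT, Action.NOOP], mask)
step_reward = shared_reward([True, False, True])
```

Because the docs state that an invalid action ends the episode, a check like `all_actions_legal` is the kind of guard an agent loop might apply before stepping.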
diff --git a/jumanji/__init__.py b/jumanji/__init__.py
@@ -32,6 +32,10 @@
 # Game2048 - the game of 2048 with the default board size of 4x4.
 register(id="Game2048-v1", entry_point="jumanji.environments:Game2048")
 
+# GraphColoring - the graph coloring problem with a default graph of
+# 20 nodes and an edge probability of 0.8.
+register(id="GraphColoring-v0", entry_point="jumanji.environments:GraphColoring")
+
 # Minesweeper on a board of size 10x10 with 10 mines.
 register(id="Minesweeper-v0", entry_point="jumanji.environments:Minesweeper")
 
@@ -104,6 +108,10 @@
 # Maze with 10 rows and 10 columns, a time limit of 100 and a random maze generator.
 register(id="Maze-v0", entry_point="jumanji.environments:Maze")
 
+# RobotWarehouse with a random generator with 2 shelf rows, 3 shelf columns, a column height of 8,
+# 4 agents, a sensor range of 1, and a request queue of size 8.
+register(id="RobotWarehouse-v0", entry_point="jumanji.environments:RobotWarehouse")
+
 # Snake game on a board of size 12x12 with a time limit of 4000.
 register(id="Snake-v1", entry_point="jumanji.environments:Snake")
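
The `register(id=..., entry_point="module:Class")` calls above follow a common registry pattern. The sketch below shows one way such an entry-point string can be resolved lazily. It is an assumption-laden illustration, not Jumanji's registration code: `_REGISTRY` and this `make` are hypothetical, and a standard-library class stands in for an environment so the example runs on its own.

```python
import importlib
from typing import Any, Dict

# Maps an environment ID to its "module:attribute" entry-point string.
_REGISTRY: Dict[str, str] = {}


def register(id: str, entry_point: str) -> None:
    """Record an entry point under `id`, refusing duplicate IDs."""
    if id in _REGISTRY:
        raise ValueError(f"ID {id!r} is already registered")
    _REGISTRY[id] = entry_point


def make(id: str, *args: Any, **kwargs: Any) -> Any:
    """Import the module named in the entry point and instantiate the class."""
    module_name, attr_name = _REGISTRY[id].split(":")
    cls = getattr(importlib.import_module(module_name), attr_name)
    return cls(*args, **kwargs)


# Stand-in registration: a standard-library class instead of an environment.
register(id="Counter-v0", entry_point="collections:Counter")
counter = make("Counter-v0", "abca")
```

Deferring the import until `make` is called means registering many IDs stays cheap, since no module is loaded until it is requested.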
 

diff --git a/jumanji/environments/__init__.py b/jumanji/environments/__init__.py
@@ -16,18 +16,28 @@
 
 from jumanji.environments.logic import game_2048, minesweeper, rubiks_cube
 from jumanji.environments.logic.game_2048.env import Game2048
+from jumanji.environments.logic.graph_coloring.env import GraphColoring
 from jumanji.environments.logic.minesweeper import Minesweeper
 from jumanji.environments.logic.rubiks_cube import RubiksCube
 from jumanji.environments.logic.sudoku import Sudoku
 from jumanji.environments.packing import bin_pack, job_shop, knapsack
 from jumanji.environments.packing.bin_pack.env import BinPack
 from jumanji.environments.packing.job_shop.env import JobShop
 from jumanji.environments.packing.knapsack.env import Knapsack
-from jumanji.environments.routing import cleaner, connector, cvrp, maze, snake, tsp
+from jumanji.environments.routing import (
+    cleaner,
+    connector,
+    cvrp,
+    maze,
+    robot_warehouse,
+    snake,
+    tsp,
+)
 from jumanji.environments.routing.cleaner.env import Cleaner
 from jumanji.environments.routing.connector.env import Connector
 from jumanji.environments.routing.cvrp.env import CVRP
 from jumanji.environments.routing.maze.env import Maze
+from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
 from jumanji.environments.routing.snake.env import Snake
 from jumanji.environments.routing.tsp.env import TSP
 

diff --git a/jumanji/environments/logic/graph_coloring/__init__.py b/jumanji/environments/logic/graph_coloring/__init__.py
@@ -0,0 +1,16 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from jumanji.environments.logic.graph_coloring.env import GraphColoring
+from jumanji.environments.logic.graph_coloring.types import Observation, State
diff --git a/jumanji/environments/logic/graph_coloring/conftest.py b/jumanji/environments/logic/graph_coloring/conftest.py
@@ -0,0 +1,23 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import pytest
+
+from jumanji.environments.logic.graph_coloring import GraphColoring
+
+
+@pytest.fixture
+def graph_coloring() -> GraphColoring:
+    """Instantiates a default GraphColoring environment."""
+    return GraphColoring()