instadeepai · clement-bonnet · Jun 1, 2023 · May 16, 2023 · May 18, 2023 · May 19, 2023
diff --git a/README.md b/README.md
@@ -19,7 +19,6 @@
 | [**Docs**](https://instadeepai.github.io/jumanji)
 ---
 
-
 <p float="left" align="center">
   <img src="docs/env_anim/connector.gif" alt="Connector" width="30%" />
   <img src="docs/env_anim/snake.gif" alt="Snake" width="30%" />
@@ -32,8 +31,6 @@
   <img src="docs/env_anim/minesweeper.gif" alt="Minesweeper" width="30%" />
 </p>
 
-
-
 ## Welcome to the Jungle! 🌴
 
 Jumanji is a suite of diverse and challenging reinforcement learning (RL) environments written in
@@ -70,7 +67,6 @@ JAX-based environments.
 - 🏎️ **Training:** example agents that can be used as inspiration for the agents one may implement
 in their research.
 
-
 ## Environments 🌍
 
 Jumanji provides a diverse range of environments ranging from simple games to NP-hard combinatorial
@@ -88,20 +84,24 @@ problems.
 | :link: Connector                         | Routing  | `Connector-v1`                                       | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/connector/) | [doc](https://instadeepai.github.io/jumanji/environments/connector/)   |
 | 🚚 CVRP (Capacitated Vehicle Routing Problem)  | Routing  | `CVRP-v1`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/cvrp/)      | [doc](https://instadeepai.github.io/jumanji/environments/cvrp/)        |
 | :mag: Maze   | Routing  | `Maze-v0`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/maze/)      | [doc](https://instadeepai.github.io/jumanji/environments/maze/)        |
+| :robot: RobotWarehouse  | Routing  | `RobotWarehouse-v0`                                            | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/robot_warehouse/)      | [doc](https://instadeepai.github.io/jumanji/environments/robot_warehouse/)        |
 | 🐍 Snake                                       | Routing  | `Snake-v1`                                           | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/snake/)     | [doc](https://instadeepai.github.io/jumanji/environments/snake/)       |
 | 📬 TSP (Travelling Salesman Problem)           | Routing  | `TSP-v1`                                             | [code](https://github.com/instadeepai/jumanji/tree/main/jumanji/environments/routing/tsp/)       | [doc](https://instadeepai.github.io/jumanji/environments/tsp/)         |
 
-
 ## Installation 🎬
 
 You can install the latest release of Jumanji from PyPI:
+
 ```bash
 pip install jumanji
 ```
+
 Alternatively, you can install the latest development version directly from GitHub:
+
 ```bash
 pip install git+https://github.com/instadeepai/jumanji.git
 ```
+
 Jumanji has been tested on Python 3.8 and 3.9.
 Note that because the installation of JAX differs depending on your hardware accelerator,
 we advise users to explicitly install the correct JAX version (see the
@@ -113,7 +113,6 @@ you will need a GUI backend. For example, on Linux, you can install Tk via:
 [Matplotlib backends](https://matplotlib.org/stable/users/explain/backends.html) for a list of
 backends you can use.
 
-
 ## Quickstart ⚡
 
 RL practitioners will find Jumanji's interface familiar as it combines the widely adopted
@@ -170,7 +169,6 @@ the version number is incremented by one to prevent potential confusion.
 For a full list of registered versions of each environment, check out
 [the documentation](https://instadeepai.github.io/jumanji/environments/tsp/).
 
-
 ## Training 🏎️
 
 To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla
@@ -191,18 +189,17 @@ actor-critic networks in
 For more information on how to use the example agents, see the
 [training guide](https://instadeepai.github.io/jumanji/guides/training/).
 
-
 ## Contributing 🤝
 
 Contributions are welcome! See our issue tracker for
 [good first issues](https://github.com/instadeepai/jumanji/labels/good%20first%20issue). Please read
 our [contributing guidelines](https://github.com/instadeepai/jumanji/blob/main/CONTRIBUTING.md) for
 details on how to submit pull requests, our Contributor License Agreement, and community guidelines.
 
-
 ## Citing Jumanji ✏️
 
 If you use Jumanji in your work, please cite the library using:
+
 ```
 @software{jumanji2023github,
   author = {Clément Bonnet and Daniel Luo and Donal Byrne and Sasha Abramowitz
@@ -216,7 +213,6 @@ If you use Jumanji in your work, please cite the library using:
 }
 ```
 
-
 ## See Also 🔎
 
 Other works have embraced the approach of writing RL environments in JAX.

diff --git a/docs/api/environments/rware.md b/docs/api/environments/rware.md
@@ -0,0 +1,8 @@
+::: jumanji.environments.routing.robot_warehouse.env.RobotWarehouse
+    selection:
+      members:
+        - __init__
+        - reset
+        - step
+        - observation_spec
+        - action_spec
diff --git a/docs/env_anim/rware.gif b/docs/env_anim/rware.gif
diff --git a/docs/env_img/rware.png b/docs/env_img/rware.png
diff --git a/docs/environments/robot_warehouse.md b/docs/environments/robot_warehouse.md
@@ -0,0 +1,46 @@
+# RobotWarehouse Environment
+
+<p align="center">
+        <img src="../env_anim/rware.gif" width="600"/>
+</p>
+
+We provide a JAX jit-able implementation of the [Robotic Warehouse](https://github.com/semitable/robotic-warehouse/tree/master)
+environment.
+
+The Robot Warehouse (RWARE) environment simulates a warehouse in which robots move around and deliver requested goods. The simulator is inspired by real-world applications in which robots pick up shelves and deliver them to a workstation. Humans access the contents of a shelf, after which robots can return it to an empty shelf location.
+
+The goal is to deliver as many requested shelves as possible within a given time budget.
+
+Once a shelf has been delivered, a new shelf is requested at random. Agents start each episode at random locations within the warehouse.
+
+## Observation
+
+The **observation** seen by the agent is a `NamedTuple` containing the following:
+
+- `agents_view`: jax array (int32) of shape `(num_agents, num_obs_features)`, array representing each agent's view of the other agents
+    and shelves.
+
+- `action_mask`: jax array (bool) of shape `(num_agents, 5)`, array specifying, for each agent,
+    which action (noop, forward, left, right, toggle_load) is legal.
+
+- `step_count`: jax array (int32) of shape `()`, number of steps elapsed in the current episode.
+
+## Action
+
+The action space is a `MultiDiscreteArray` containing an integer value in `[0, 1, 2, 3, 4]` for each
+agent. Each agent can take one of five actions: noop (`0`), forward (`1`), turn left (`2`), turn right (`3`), or toggle_load (`4`).
+
+The episode terminates under the following conditions:
+
+- An invalid action is taken, or
+
+- An agent collides with another agent.
+
+## Reward
+
+The reward is global and shared among the agents. It is equal to the number of shelves which were
+delivered successfully during the time step (i.e., +1 for each shelf).
+
+## Registered Versions 📖
+
+- `RobotWarehouse-v0`, a warehouse with 4 agents, each with a sensor range of 1, on a floor with 2 shelf rows, 3 shelf columns, and a column height of 8, with a shelf request queue of size 8.
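
Reviewer note on the doc page above: the action encoding, the per-agent row of the `(num_agents, 5)` action mask, and the shared reward can be sketched in plain Python. This is only an illustration of the semantics described in the doc, not the environment's implementation; `Action`, `legal_actions`, and `shared_reward` are hypothetical names introduced for this sketch.

```python
from enum import IntEnum
from typing import List, Sequence


class Action(IntEnum):
    """Action encoding as listed in the Action section of the doc page."""

    NOOP = 0
    FORWARD = 1
    LEFT = 2
    RIGHT = 3
    TOGGLE_LOAD = 4


def legal_actions(mask_row: Sequence[bool]) -> List[Action]:
    """Decode one agent's row of the (num_agents, 5) action mask."""
    if len(mask_row) != len(Action):
        raise ValueError("expected one mask entry per action")
    # Enum members iterate in definition order, which matches the mask layout.
    return [action for action, is_legal in zip(Action, mask_row) if is_legal]


def shared_reward(delivered: Sequence[bool]) -> float:
    """Hypothetical helper: +1 per shelf delivered this step, shared by all agents."""
    return float(sum(bool(flag) for flag in delivered))


if __name__ == "__main__":
    print([a.name for a in legal_actions([True, False, True, True, False])])
    print(shared_reward([True, False, True]))
```

A named enum of this kind keeps the integer encoding in one place, so tests can refer to `Action.TOGGLE_LOAD` rather than a bare `4`.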
diff --git a/jumanji/__init__.py b/jumanji/__init__.py
@@ -77,6 +77,10 @@
 # Maze with 10 rows and 10 columns, a time limit of 100 and a random maze generator.
 register(id="Maze-v0", entry_point="jumanji.environments:Maze")
 
+# RobotWarehouse with a random generator with 2 shelf rows, 3 shelf columns, a column height of 8,
+# 4 agents, a sensor range of 1, and a request queue of size 8.
+register(id="RobotWarehouse-v0", entry_point="jumanji.environments:RobotWarehouse")
+
 # Snake game on a board of size 12x12 with a time limit of 4000.
 register(id="Snake-v1", entry_point="jumanji.environments:Snake")
 

diff --git a/jumanji/environments/__init__.py b/jumanji/environments/__init__.py
@@ -22,11 +22,20 @@
 from jumanji.environments.packing.bin_pack.env import BinPack
 from jumanji.environments.packing.job_shop.env import JobShop
 from jumanji.environments.packing.knapsack.env import Knapsack
-from jumanji.environments.routing import cleaner, connector, cvrp, maze, snake, tsp
+from jumanji.environments.routing import (
+    cleaner,
+    connector,
+    cvrp,
+    maze,
+    robot_warehouse,
+    snake,
+    tsp,
+)
 from jumanji.environments.routing.cleaner.env import Cleaner
 from jumanji.environments.routing.connector.env import Connector
 from jumanji.environments.routing.cvrp.env import CVRP
 from jumanji.environments.routing.maze.env import Maze
+from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
 from jumanji.environments.routing.snake.env import Snake
 from jumanji.environments.routing.tsp.env import TSP
 

diff --git a/jumanji/environments/routing/robot_warehouse/__init__.py b/jumanji/environments/routing/robot_warehouse/__init__.py
@@ -0,0 +1,16 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from jumanji.environments.routing.robot_warehouse.env import RobotWarehouse
+from jumanji.environments.routing.robot_warehouse.types import Observation, State
diff --git a/jumanji/environments/routing/robot_warehouse/conftest.py b/jumanji/environments/routing/robot_warehouse/conftest.py
@@ -0,0 +1,99 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Tuple
+
+import jax
+import jax.numpy as jnp
+import pytest
+
+from jumanji.environments.routing.robot_warehouse import RobotWarehouse
+from jumanji.environments.routing.robot_warehouse.generator import RandomGenerator
+from jumanji.environments.routing.robot_warehouse.types import (
+    Agent,
+    Position,
+    Shelf,
+    State,
+)
+from jumanji.types import TimeStep
+
+
+@pytest.fixture(scope="module")
+def robot_warehouse_env() -> RobotWarehouse:
+    """Instantiates a small RobotWarehouse environment with 2 agents, 1 shelf row, 3 shelf columns,
+    a column height of 2, a sensor range of 1, a request queue size of 4 and a time limit of 5."""
+    generator = RandomGenerator(
+        shelf_rows=1,
+        shelf_columns=3,
+        column_height=2,
+        num_agents=2,
+        sensor_range=1,
+        request_queue_size=4,
+    )
+
+    env = RobotWarehouse(
+        generator=generator,
+        time_limit=5,
+    )
+    return env
+
+
+@pytest.fixture
+def deterministic_robot_warehouse_env(
+    robot_warehouse_env: RobotWarehouse,
+) -> Tuple[RobotWarehouse, State, TimeStep]:
+    """Returns the module-scoped RobotWarehouse environment (time limit of 5) together with a
+    hand-built state containing 2 agents and 8 shelves, and the timestep from `reset`."""
+    state, timestep = robot_warehouse_env.reset(jax.random.PRNGKey(42))
+
+    # create agents, shelves and grid
+    def make_agent(x: int, y: int, direction: int, is_carrying: int) -> Agent:
+        return Agent(Position(x=x, y=y), direction=direction, is_carrying=is_carrying)
+
+    def make_shelf(x: int, y: int, is_requested: int) -> Shelf:
+        return Shelf(Position(x=x, y=y), is_requested=is_requested)
+
+    # agent information
+    xs = jnp.array([3, 1])
+    ys = jnp.array([4, 7])
+    dirs = jnp.array([2, 3])
+    carries = jnp.array([0, 0])
+    state.agents = jax.vmap(make_agent)(xs, ys, dirs, carries)
+
+    # shelf information
+    xs = jnp.array([1, 1, 1, 1, 2, 2, 2, 2])
+    ys = jnp.array([1, 2, 7, 8, 1, 2, 7, 8])
+    requested = jnp.array([0, 1, 1, 0, 0, 0, 1, 1])
+    state.shelves = jax.vmap(make_shelf)(xs, ys, requested)
+
+    # create grid
+    state.grid = jnp.array(
+        [
+            [
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                [0, 1, 2, 0, 0, 0, 0, 3, 4, 0],
+                [0, 5, 6, 0, 0, 0, 0, 7, 8, 0],
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+            ],
+            [
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                [0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+                [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
+                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
+            ],
+        ]
+    )
+    return robot_warehouse_env, state, timestep
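
Reviewer note on the fixture above: the hand-written coordinates and the two grid channels need to agree with each other. The sketch below checks this using plain Python lists so that it runs without JAX. It assumes that `x` indexes grid rows and `y` indexes grid columns, and that entity IDs are 1-based in list order; both are assumptions read off the fixture values, not confirmed by the diff. `ids_at` and `count_occupied` are hypothetical helper names.

```python
from typing import List, Sequence

# Values copied from the fixture; channel 0 holds shelves and channel 1 holds
# agents, matching _SHELVES = 0 and _AGENTS = 1 in constants.py.
SHELF_XS = [1, 1, 1, 1, 2, 2, 2, 2]
SHELF_YS = [1, 2, 7, 8, 1, 2, 7, 8]
AGENT_XS = [3, 1]
AGENT_YS = [4, 7]
GRID = [
    [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 2, 0, 0, 0, 0, 3, 4, 0],
        [0, 5, 6, 0, 0, 0, 0, 7, 8, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ],
    [
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    ],
]


def ids_at(channel: Sequence[Sequence[int]], xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Read the ID stored at each (row=x, column=y) coordinate of one channel."""
    return [channel[x][y] for x, y in zip(xs, ys)]


def count_occupied(channel: Sequence[Sequence[int]]) -> int:
    """Count the non-zero cells of one channel."""
    return sum(1 for row in channel for cell in row if cell != 0)


if __name__ == "__main__":
    print(ids_at(GRID[0], SHELF_XS, SHELF_YS))
    print(ids_at(GRID[1], AGENT_XS, AGENT_YS))
```

Under these assumptions the shelf channel yields IDs 1 through 8 in order and the agent channel yields 1 and 2, so the coordinates and the grid in the fixture are mutually consistent.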
diff --git a/jumanji/environments/routing/robot_warehouse/constants.py b/jumanji/environments/routing/robot_warehouse/constants.py
@@ -0,0 +1,37 @@
+# Copyright 2022 InstaDeep Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import jax.numpy as jnp
+
+from jumanji.environments.routing.robot_warehouse.types import Direction
+
+# grid channels
+_SHELVES = 0
+_AGENTS = 1
+
+# agent directions
+_POSSIBLE_DIRECTIONS = jnp.array([d.value for d in Direction])
+
+# viewer constants
+_FIGURE_SIZE = (5, 5)
+_SHELF_PADDING = 2
+
+# colors
+_GRID_COLOR = (0, 0, 0)  # black
+_SHELF_COLOR = (72 / 255.0, 61 / 255.0, 139 / 255.0)  # dark slate blue
+_SHELF_REQ_COLOR = (0, 128 / 255.0, 128 / 255.0)  # teal
+_AGENT_COLOR = (1, 140 / 255.0, 0)  # dark orange
+_AGENT_LOADED_COLOR = (1, 0, 0)  # red
+_AGENT_DIR_COLOR = (0, 0, 0)  # black
+_GOAL_COLOR = (60 / 255.0, 60 / 255.0, 60 / 255.0)