[Proposal] No-death wrapper / argument #372

Closed
1 task done
sparisi opened this issue Jul 1, 2023 · 6 comments
Comments

@sparisi
Contributor

sparisi commented Jul 1, 2023

Proposal

Add a wrapper (or an additional argument) so that death states such as obstacles or lava are ignored, optionally replaced with a reward penalty (currently, lava cells give no penalty at all).

Motivation

This would make it possible to investigate exploration and safety. Having lava cells as terminal states makes exploration challenging, but without any penalty they are not really "dangerous".
Making it possible to walk on lava (at a cost) would make exploration more interesting, because the agent can discover "aggressive" exploration strategies, such as walking over lava to cover as much of the grid as possible. This is exactly the kind of behavior that should be avoided in safe RL.

Checklist

  • I have checked that there is no similar issue in the repo (required)
@pseudo-rnd-thoughts
Member

Yes, this is an interesting idea. Would you be able to work on this?

@sparisi
Contributor Author

sparisi commented Jul 1, 2023

Yes, what would the preferred method be?
Right now I use this simple wrapper in my own code, and it works fine:

from gymnasium.core import Wrapper


class LavaNoDeath(Wrapper):
    """Lava cells do not kill the agent; instead, they yield a penalty of -10."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)

        # If the agent ends up on a lava cell, cancel termination and penalize.
        current_cell = self.grid.get(*self.agent_pos)
        if current_cell is not None and current_cell.type == "lava":
            terminated = False
            reward = -10

        return obs, reward, terminated, truncated, info

I think the check for obstacles is similar, but I need to look at the DynamicObstacle env.
I could make a generic NoDeathWrapper, or would an optional argument to gymnasium.make() be better?

Are there other environments / scenarios where the agent can die besides lava and obstacles?

@pseudo-rnd-thoughts
Member

I was thinking of a similar solution. My only worry is that on terminated=True the environment might run some special reset code, but I can't see any of that currently.

Yes, a wrapper would be best, and we can probably make it generic. I think we can have a parameter, no_death_types: tuple[str, ...], which contains the valid no-death tile types.

@sparisi
Contributor Author

sparisi commented Jul 3, 2023

I made this wrapper; I think it is general enough.
I can make a PR adding it to wrappers.py and to docs/api/wrappers.md. I see there is also a testing function for wrappers, but I am not sure what the best way to test this wrapper would be.

from __future__ import annotations

from gymnasium.core import Wrapper


class NoDeath(Wrapper):
    """
    Wrapper to prevent death in specific cells (e.g., lava cells).
    Instead of dying, the agent will receive a negative reward.

    Example:
        >>> import gymnasium as gym
        >>> from minigrid.wrappers import NoDeath
        >>>
        >>> env = gym.make("MiniGrid-LavaCrossingS9N1-v0")
        >>> obs, _ = env.reset(seed=2)
        >>> obs, *_ = env.step(1)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (0, True)
        >>>
        >>> env = NoDeath(env, ["lava"], -1)
        >>> obs, _ = env.reset(seed=2)
        >>> obs, *_ = env.step(1)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-1, False)
        >>>
        >>>
        >>> env = gym.make("MiniGrid-Dynamic-Obstacles-5x5-v0")
        >>> obs, _ = env.reset(seed=2)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-1, True)
        >>>
        >>> env = NoDeath(env, ["ball"], -1)
        >>> obs, _ = env.reset(seed=2)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-2, False)
    """

    def __init__(self, env, no_death_types: tuple[str, ...], death_cost: float = -1.):
        """A wrapper to prevent death in specific cells.

        Args:
            env: The environment to apply the wrapper
            no_death_types: List of strings to identify death cells
            death_cost: The negative reward received in death cells

        """
        super().__init__(env)
        self.death_cost = death_cost
        self.no_death_types = no_death_types

    def step(self, action):
        # In Dynamic-Obstacles, obstacles move after the agent moves,
        # so we need to check for collision before self.env.step()
        front_cell = self.grid.get(*self.front_pos)
        going_to_death = (
            action == self.actions.forward and
            front_cell is not None and
            front_cell.type in self.no_death_types
        )

        obs, reward, terminated, truncated, info = self.env.step(action)

        # We also check if the agent stays in death cells (e.g., lava)
        # without moving
        current_cell = self.grid.get(*self.agent_pos)
        in_death = (
            current_cell is not None and
            current_cell.type in self.no_death_types
        )

        if terminated and (going_to_death or in_death):
            terminated = False
            reward += self.death_cost

        return obs, reward, terminated, truncated, info

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 4, 2023

The code looks good to me, and those are the correct places; could you make a PR?
I think we might want to prevent the goal type from being included in the no_death_types list.

Could you also add a test for lava and any other obvious death type: run a known sequence of actions that should kill the agent, and check that terminated=False and the reward is as expected.
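A hypothetical shape for such a test (the real PR test would run a known action sequence on an actual MiniGrid env via gym.make; here, invented stub objects — StubEnv, StubGrid, StubCell — stand in for MiniGrid so the sketch is self-contained, and no_death_step mirrors the step logic of the NoDeath wrapper above):

```python
class StubCell:
    """Stands in for a MiniGrid world object; only .type is needed."""
    def __init__(self, type_):
        self.type = type_


class StubGrid:
    """Stands in for the MiniGrid grid; maps (x, y) -> cell or None."""
    def __init__(self, cells):
        self.cells = cells

    def get(self, x, y):
        return self.cells.get((x, y))


class StubActions:
    forward = 2  # matches MiniGrid's Actions.forward


class StubEnv:
    """Agent at (1, 1) facing a lava cell at (2, 1)."""
    def __init__(self):
        self.grid = StubGrid({(2, 1): StubCell("lava")})
        self.agent_pos = (1, 1)
        self.front_pos = (2, 1)
        self.actions = StubActions()

    def step(self, action):
        if action == self.actions.forward:
            self.agent_pos = self.front_pos
            return None, 0, True, False, {}  # walked into lava: dies
        return None, 0, False, False, {}


def no_death_step(env, action, no_death_types=("lava",), death_cost=-1.0):
    """Mirrors NoDeath.step: cancel death-related termination, apply penalty."""
    front_cell = env.grid.get(*env.front_pos)
    going_to_death = (
        action == env.actions.forward
        and front_cell is not None
        and front_cell.type in no_death_types
    )

    obs, reward, terminated, truncated, info = env.step(action)

    current_cell = env.grid.get(*env.agent_pos)
    in_death = current_cell is not None and current_cell.type in no_death_types

    if terminated and (going_to_death or in_death):
        terminated = False
        reward += death_cost
    return obs, reward, terminated, truncated, info


def test_no_death_on_lava():
    env = StubEnv()
    _, reward, terminated, *_ = no_death_step(env, StubActions.forward)
    assert terminated is False
    assert reward == -1.0
```

The same structure works for the obstacle case by giving the front cell type "ball" and passing no_death_types=("ball",).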

sparisi mentioned this issue Jul 4, 2023
sparisi closed this as completed Jul 5, 2023
@pseudo-rnd-thoughts
Member

@sparisi FYI, we will cut a release when #371 is merged as well
