[Proposal] No-death wrapper / argument #372
Yes, this is an interesting idea. Would you be able to work on this?
Yes, what would the preferred method be?
I think the check for obstacles is similar, but I need to look at it. Are there other environments / scenarios where the agent can die besides lava and obstacles?
I was thinking of a similar solution. Yes, a wrapper would be best, and we can probably make it a generic wrapper with a parameter specifying the death cell types.
I made this wrapper, I think it is general enough.

```python
from gymnasium.core import Wrapper


class NoDeath(Wrapper):
    """
    Wrapper to prevent death in specific cells (e.g., lava cells).
    Instead of dying, the agent will receive a negative reward.

    Example:
        >>> import gymnasium as gym
        >>> from minigrid.wrappers import NoDeath
        >>>
        >>> env = gym.make("MiniGrid-LavaCrossingS9N1-v0")
        >>> obs, _ = env.reset(seed=2)
        >>> obs, *_ = env.step(1)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (0, True)
        >>>
        >>> env = NoDeath(env, no_death_types=("lava",), death_cost=-1.0)
        >>> obs, _ = env.reset(seed=2)
        >>> obs, *_ = env.step(1)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-1.0, False)
        >>>
        >>> env = gym.make("MiniGrid-Dynamic-Obstacles-5x5-v0")
        >>> obs, _ = env.reset(seed=2)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-1, True)
        >>>
        >>> env = NoDeath(env, no_death_types=("ball",), death_cost=-1.0)
        >>> obs, _ = env.reset(seed=2)
        >>> _, reward, term, *_ = env.step(2)
        >>> reward, term
        (-2.0, False)
    """

    def __init__(self, env, no_death_types: tuple[str, ...], death_cost: float = -1.0):
        """A wrapper to prevent death in specific cells.

        Args:
            env: The environment to apply the wrapper
            no_death_types: Cell types (e.g., "lava") that no longer kill the agent
            death_cost: The negative reward received in death cells
        """
        super().__init__(env)
        self.death_cost = death_cost
        self.no_death_types = no_death_types

    def step(self, action):
        # In Dynamic-Obstacles, obstacles move after the agent moves,
        # so we need to check for collision before self.env.step()
        front_cell = self.grid.get(*self.front_pos)
        going_to_death = (
            action == self.actions.forward
            and front_cell is not None
            and front_cell.type in self.no_death_types
        )

        obs, reward, terminated, truncated, info = self.env.step(action)

        # We also check if the agent stays in death cells (e.g., lava)
        # without moving
        current_cell = self.grid.get(*self.agent_pos)
        in_death = (
            current_cell is not None
            and current_cell.type in self.no_death_types
        )

        if terminated and (going_to_death or in_death):
            terminated = False
            reward += self.death_cost

        return obs, reward, terminated, truncated, info
```
The code looks good to me and I think those are the correct places. Could you make a PR? Could you also add a test for lava and any other obvious death type, where we make a known sequence of actions that should kill the agent and check that the episode does not terminate?
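The actual test would run the known action sequences against the real MiniGrid environments, as in the docstring above. As a self-contained illustration of the check being requested (no MiniGrid dependency; `StubLavaEnv` and `NoDeathLogic` are hypothetical names for this sketch, not part of the library):

```python
class StubLavaEnv:
    """Toy env: stepping forward from cell 0 lands on lava (cell 1)."""

    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += 1
        on_lava = self.pos == 1
        # Base env behavior: reward 0 and terminate when entering lava
        return self.pos, 0.0, on_lava, False, {}

    def cell_type(self, pos):
        return "lava" if pos == 1 else "floor"


class NoDeathLogic:
    """Minimal version of the termination-override logic in the wrapper above."""

    def __init__(self, env, no_death_types=("lava",), death_cost=-1.0):
        self.env = env
        self.no_death_types = no_death_types
        self.death_cost = death_cost

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        in_death = self.env.cell_type(self.env.pos) in self.no_death_types
        if terminated and in_death:
            # Suppress the death: keep the episode alive, apply the penalty
            terminated = False
            reward += self.death_cost
        return obs, reward, terminated, truncated, info


env = NoDeathLogic(StubLavaEnv())
obs, reward, terminated, *_ = env.step("forward")
print(reward, terminated)  # -1.0 False
```

A real test would do the same with `gym.make("MiniGrid-LavaCrossingS9N1-v0")` wrapped in `NoDeath`, asserting that the reward equals `death_cost` and `terminated` is `False` after the killing action sequence.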
Proposal
Add a wrapper, or an additional environment argument, so that death states like obstacles or lava no longer terminate the episode, possibly giving a penalty instead (lava cells currently give none).
Motivation
This would make it possible to investigate exploration and safety. Having lava cells as terminal states makes exploration challenging, but without any penalty it's not really "dangerous".
Making it possible to walk on lava (at a cost) would make exploration more interesting because the agent can find some "aggressive" exploration strategies, like walking over lava to explore as much as possible. This would be a scenario that should be avoided in safe RL.