Added Action masking for Space.sample() #2906

Merged: 26 commits, Jun 26, 2022

@pseudo-rnd-thoughts pseudo-rnd-thoughts commented Jun 17, 2022

Adds action masking, as requested in #2823, so that spaces can mask certain actions. Masks use the positive convention: 1 means the action can be taken and 0 means it cannot. For all of the gym spaces, this PR adds a mask parameter to sample(mask=...), whose required type depends on the space. Box is a special case: masking is not implemented because a neural network cannot provide mask values for continuous distributions. If a good reason is found, this could be added.
For the gym Taxi environment, we add a new info key "action_mask"; exposing the mask through info is the recommended way to use masking in custom environments.

Example masks

>>> import numpy as np
>>> from gym import spaces

# Box space doesn't have masks

>>> space = spaces.Discrete(4)
>>> space.sample(mask=np.array([0, 1, 1, 1], dtype=np.int8)) 
2
>>> space.sample(mask=np.array([0, 0, 0, 0], dtype=np.int8)) 
0

>>> space = spaces.MultiDiscrete([4, 2])
>>> space.sample(mask=(np.array([0, 1, 0, 1], dtype=np.int8), np.array([0, 0], dtype=np.int8))) 
[1 0]

>>> space = spaces.MultiDiscrete(np.array([[4, 2], [3, 4]]))
>>> space.sample(mask=((np.array([1, 1, 1, 1], dtype=np.int8), np.array([0, 1], dtype=np.int8)), (np.array([0, 0, 0], dtype=np.int8), np.array([1, 1, 0, 0], dtype=np.int8))))  
[[2 1]
 [0 1]]

>>> space = spaces.MultiBinary([2, 3])
>>> space.sample(mask=np.array([[0, 0, 1], [1, 1, 0]], dtype=np.int8))
[[0 0 0]
 [1 1 0]]

# Composite spaces (Dict, Tuple and Graph)
>>> space = spaces.Dict(a=spaces.Discrete(3), b=spaces.Box(0, 1, (1,)))
>>> space.sample(mask={"a": np.array([0, 1, 1], dtype=np.int8), "b": None})
OrderedDict([('a', 1), ('b', array([0.6812336], dtype=float32))])

>>> space = spaces.Tuple((spaces.Box(0, 1, (1,)), spaces.Discrete(3)))
>>> space.sample(mask=(None, np.array([0, 0, 0], dtype=np.int8)))  
(array([0.74909943], dtype=float32), 0)

>>> space = spaces.Graph(node_space=spaces.Box(0, 1, (1,)), edge_space=spaces.Discrete(3))
>>> space.sample(mask=(None, np.array([0, 1, 1], dtype=np.int8)), num_nodes=4)
GraphInstance(nodes=array([[0.5791068 ], [0.43347424], [0.6848027 ], [0.23124644]], dtype=float32), edges=array([2, 2]), edge_links=array([[1, 0], [2, 1]]))
>>> space.sample(mask=(None, (np.array([1, 1, 1], dtype=np.int8), np.array([1, 0, 0], dtype=np.int8), np.array([0, 1, 0], dtype=np.int8), np.array([0, 1, 1], dtype=np.int8))), num_nodes=4, num_edges=4)
GraphInstance(nodes=array([[0.21213211], [0.4872798 ], [0.69442934], [0.92085034]], dtype=float32), edges=array([2, 0, 1, 2]), edge_links=array([[0, 2], [0, 0],  [3, 2], [2, 0]]))
  • Add tests for sample mask
  • Add tests for sample with discrete distributions (Discrete, MultiDiscrete, MultiBinary)
  • Add test for Box sample
  • Add docstrings
  • Add taxi docstrings
  • Fix taxi action mask
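
The Discrete behaviour shown in the examples above (1 = allowed, 0 = not allowed) can be sketched in plain NumPy. This is an illustrative sketch, not the code merged in this PR: `sample_discrete` and its `rng` argument are hypothetical names, and the fallback to 0 for an all-zero mask is inferred from the example output above rather than taken from the implementation.

```python
import numpy as np


def sample_discrete(n, mask=None, rng=None):
    """Sample an integer in [0, n), restricted to entries where mask == 1.

    Hypothetical helper for illustration only; not gym's Discrete.sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    if mask is None:
        # No mask: plain uniform sample over all n actions.
        return int(rng.integers(n))

    mask = np.asarray(mask)
    if mask.shape != (n,):
        raise ValueError(f"expected mask of shape ({n},), got {mask.shape}")

    # Indices of the actions that the mask marks as allowed.
    valid = np.flatnonzero(mask == 1)
    if valid.size == 0:
        # Assumption: with no valid action, fall back to the first action (0),
        # matching the all-zero example above.
        return 0
    return int(rng.choice(valid))


rng = np.random.default_rng(0)
action = sample_discrete(4, np.array([0, 1, 1, 1], dtype=np.int8), rng)
fallback = sample_discrete(4, np.zeros(4, dtype=np.int8), rng)
```

Here `action` is always one of 1, 2 or 3, and `fallback` is 0.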

@vwxyzjn vwxyzjn left a comment

A great first pass! Thanks for adding this feature!

@Markus28 Markus28 left a comment

I left some comments.

I have some additional questions regarding the new Graph space, not directly related to this PR:

  • Why do we only allow Box and Discrete spaces for node- and edge-features? I guess it's because of _generate_sample_space but we could easily do without that method.
  • I think we should change the sample method to allow more flexibility (e.g. sampling from a G(n,p) or G(n, M) model, etc.)
  • In _generate_sample_space, why would we expect the base_space to be None?
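
As an illustration of the second point, sampling edge links from a G(n, p) model could look roughly like the following. This is a hypothetical sketch, not part of this PR or of the Graph space: the name `sample_gnp_edge_links` is invented here, and it assumes undirected edges with no self-loops, whereas the Graph examples above include both directed pairs and self-loops.

```python
import numpy as np


def sample_gnp_edge_links(num_nodes, p, rng=None):
    """Return an (num_edges, 2) array of edge links drawn from G(n, p).

    Each unordered pair (i, j) with i < j is kept independently with
    probability p. Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # All unordered node pairs (i, j) with i < j.
    rows, cols = np.triu_indices(num_nodes, k=1)
    # Keep each candidate edge independently with probability p.
    keep = rng.random(rows.shape[0]) < p
    return np.stack([rows[keep], cols[keep]], axis=1)


edge_links = sample_gnp_edge_links(num_nodes=4, p=0.5, rng=np.random.default_rng(0))
```

The number of edges is random here; a G(n, M) variant would instead choose exactly M of the candidate pairs without replacement.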

vwxyzjn commented Jun 26, 2022

Hi @PseudoRnd#6426, thanks for this PR. LGTM to be merged as preliminary support for action masking.

That said, making it work with gym-microrts' action space is trickier than I thought. Here is roughly what the gym-microrts action space looks like:

import gym
import numpy as np

height = 16
width = 16
action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]
action_plane_space = gym.spaces.MultiDiscrete(action_space_dims)
action_space = np.ones((height, width, len(action_space_dims)))
action_space[:,:,:] = action_space_dims
action_space = gym.spaces.MultiDiscrete(action_space)
print("each unit's action_space has shape", action_plane_space.shape)
print("player's action_space (controlling 256 units at the same time) has shape", action_space.shape)
mask = np.ones((16, 16, sum(action_space_dims)))
print("player's action_space mask has shape", mask.shape)

# each unit's action_space has shape (7,)
# player's action_space (controlling 256 units at the same time) has shape (16, 16, 7)
# player's action_space mask has shape (16, 16, 78)

Notice that the (16, 16) here is just a batch dimension, which is why I could implement the mask that way to make things more efficient. However, this does mean that integrating with gym's action masking implementation is more difficult. To use the current API, I think I would need to stack a lot of numpy objects, such as

mask = np.array(
    np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
    np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
    np.ones(6), np.ones(4), np.ones(4), np.ones(4), np.ones(4), np.ones(7), np.ones(49),
...
)

The gym-microrts use case is definitely quite specialized, so I don't think it's necessary to support it, but it's good to keep this in mind when considering additional support.
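
One possible way to bridge the two representations is to split the flat (16, 16, 78) mask into the nested per-component form used in the MultiDiscrete examples earlier in this PR. The following is a minimal sketch under that assumption: `flat_to_nested_mask` is a hypothetical helper, not part of gym or gym-microrts, and it only builds the nested structure without calling `sample`.

```python
import numpy as np


def flat_to_nested_mask(flat_mask, dims):
    """Split a (..., sum(dims)) mask into nested tuples of per-component arrays.

    Leading axes (the batch dimensions) become nested tuples; the last axis is
    cut into one array per entry of dims. Hypothetical helper for illustration.
    """
    flat_mask = np.asarray(flat_mask, dtype=np.int8)
    if flat_mask.ndim == 1:
        # Cut positions between consecutive components, e.g. [6, 10, 14, ...].
        split_points = np.cumsum(dims)[:-1]
        return tuple(np.split(flat_mask, split_points))
    # Recurse over the leading axis to mirror the nested structure.
    return tuple(flat_to_nested_mask(sub, dims) for sub in flat_mask)


action_space_dims = [6, 4, 4, 4, 4, 7, 7 * 7]
flat = np.ones((16, 16, sum(action_space_dims)), dtype=np.int8)
nested = flat_to_nested_mask(flat, action_space_dims)
```

For the mask above, `nested` is a 16 x 16 nesting of tuples, each holding 7 arrays of lengths 6, 4, 4, 4, 4, 7 and 49. Building 256 x 7 small arrays per step is the overhead the comment above is concerned about.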

@jkterry1 jkterry1 merged commit 024b0f5 into openai:master Jun 26, 2022