
[BugFix] [Feature] Multi-agent collectors #828

Closed
wants to merge 56 commits into from

Conversation

matteobettini
Contributor

Description

This PR introduces the mask described in #807, which can be used in multi-agent settings to decide which parts of the env batch size to consider for collection.

Example

env = ParallelEnv(n_env_workers, VmasEnv("flocking", n_envs, n_agents))
env.batch_size # (n_env_workers, n_agents, n_envs)

# Case 1
collector = MultiSyncDataCollector([env] * n_collector_workers, policy, env_batch_size_mask=(True,True,True), frames_per_batch=frames_per_batch)
for i, data in enumerate(collector):
    data.batch_size  # (n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_agents * n_envs))

# Case 2
collector = MultiSyncDataCollector([env] * n_collector_workers, policy, env_batch_size_mask=(True,False,True), frames_per_batch=frames_per_batch)
for i, data in enumerate(collector):
    data.batch_size  # (n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_envs))
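The arithmetic behind these two cases can be sketched in plain Python. This is a hypothetical helper, not part of the torchrl API; the function name `collected_batch_size` and its arguments are illustrative only. Dimensions where the mask is `True` count as environments contributing frames, so masked-out dims (e.g. the agent dim) drop out of the divisor:

```python
from math import prod

def collected_batch_size(env_batch_size, mask, n_collector_workers, frames_per_batch):
    # Only the dims where the mask is True count as environments that
    # contribute frames; masked-out dims (e.g. agents) do not.
    n_envs_effective = prod(d for d, m in zip(env_batch_size, mask) if m)
    time_dim = frames_per_batch // (n_collector_workers * n_envs_effective)
    return (*env_batch_size, time_dim)

# Case 1: all dims counted (mask all True)
print(collected_batch_size((2, 3, 4), (True, True, True), 2, 96))   # (2, 3, 4, 2)
# Case 2: agent dim (size 3) masked out
print(collected_batch_size((2, 3, 4), (True, False, True), 2, 96))  # (2, 3, 4, 6)
```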

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 13, 2023
@matteobettini matteobettini changed the title [BugFix] [Feature] Multi-agent collectors [WIP] [BugFix] [Feature] Multi-agent collectors Jan 13, 2023
@matteobettini
Contributor Author

matteobettini commented Jan 13, 2023

TODO:

  • Multi-collectors and their docs
  • split_traj
  • Tests

@matteobettini
Contributor Author

matteobettini commented Jan 15, 2023

@vmoens

In the current state, this PR works when split_traj=False.

Now, we should discuss how we want to change the behavior of split_traj. Currently that function is written to consider only envs with 1 dimension (parallel envs) and increases that dimension in order to pad the last one.

Let's take an example to discuss new possibilities, here is the setup:

my_env.batch_size = (n_env_workers, n_agents, n_envs)
env_batch_size_mask = (True, False, True)

frames_per_batch = frames_per_batch
n_collector_workers = n_collector_workers
max_frames_per_traj = max_frames_per_traj

Now, once we start collecting, what should the batch_size of the output be? We are interested in the cases split_traj=True and split_traj=False.

Here are the options for me.

Option 1

For the split_traj=False case, we do as we discussed and how is already done in this PR: the size is

(n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_envs))

Then, for the case of split_traj=True we add an additional dimension (let's say at the beginning) and the shape becomes

(frames_per_batch / (n_collector_workers * n_env_workers * n_envs * max_frames_per_traj), n_env_workers, n_agents, n_envs, max_frames_per_traj)

Option 2

Alternatively, since this adds yet another dimension in an arbitrary place, we could make it so that after collection, collectors squash all the batch dims into one single dim according to the mask. This would mean that collectors with split_traj=False return shapes of

(frames_per_batch, n_agents) # which would be just (frames_per_batch) without a mask

Then, for the case of split_traj=True

(frames_per_batch / max_frames_per_traj, n_agents, max_frames_per_traj)
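Option 2's squashing rule can be sketched in plain Python. This is an illustrative sketch, not the PR's implementation; the function name `squashed_shape` and its arguments are assumptions. Masked-in dims are folded into the leading frame dimension, while masked-out dims (e.g. agents) survive as separate dims:

```python
from math import prod

def squashed_shape(env_batch_size, mask, frames_per_batch, max_frames_per_traj=None):
    # Dims masked out (False) are kept as separate dims; dims masked in (True)
    # are squashed into the single leading frame dimension.
    kept = tuple(d for d, m in zip(env_batch_size, mask) if not m)
    if max_frames_per_traj is None:  # corresponds to split_traj=False
        return (frames_per_batch, *kept)
    # split_traj=True: trajectories of fixed length go in a trailing time dim
    return (frames_per_batch // max_frames_per_traj, *kept, max_frames_per_traj)

print(squashed_shape((2, 3, 4), (True, False, True), 96))     # (96, 3)
print(squashed_shape((2, 3, 4), (True, False, True), 96, 8))  # (12, 3, 8)
```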

My take

I really like option 2, but it might be necessary to keep a batch_size that contains env.batch_size for some reason. Let me know. It is worth noting that if we do 1, 2 should be easily derivable, but not the other way around.

@vmoens
Contributor

vmoens commented Jan 19, 2023

Related to #845
Do we actually care about split_trajs (as integrated in the collector) in the first place?
As stated in the aforementioned issue, the purpose of split_trajs was to make sure that advantage functions and the like would not overlap computation over consecutive but independent trajectories. I now believe that a properly coded advantage module should not suffer from that issue if the done state is used properly.
For the rest (e.g. custom loss functions that may be buggy in that regard) we could write a function split_traj with limited behaviour (i.e. it assumes a certain input shape, e.g. always a 2d tensordict or something along those lines) and do something like what you propose in Option 2. We would properly document this for those interested. What do you think?
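A limited split_traj along those lines might look like the pure-Python sketch below. The name `split_by_done` and the right-padding convention are assumptions, not the eventual torchrl API; it takes a flat sequence of frames with parallel done flags and returns padded trajectories plus a validity mask:

```python
def split_by_done(frames, dones, pad_value=0):
    # frames: flat list of frames; dones: parallel list of booleans marking
    # trajectory ends. Returns (padded, masks), where every trajectory is
    # right-padded to the longest one, mirroring what a limited split_traj
    # could do on a 2d [time, feature] input.
    trajs, current = [], []
    for frame, done in zip(frames, dones):
        current.append(frame)
        if done:
            trajs.append(current)
            current = []
    if current:  # trailing, unfinished trajectory
        trajs.append(current)
    max_len = max(len(t) for t in trajs)
    padded = [t + [pad_value] * (max_len - len(t)) for t in trajs]
    masks = [[True] * len(t) + [False] * (max_len - len(t)) for t in trajs]
    return padded, masks

padded, masks = split_by_done([1, 2, 3, 4, 5], [False, True, False, False, True])
print(padded)  # [[1, 2, 0], [3, 4, 5]]
print(masks)   # [[True, True, False], [True, True, True]]
```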

@matteobettini
Contributor Author

matteobettini commented Jan 19, 2023

@vmoens

I couldn't agree more. I think done is enough to compute advantages. This will require some work in refactoring advantages and losses but it will make everything look nicer. I am all in for what you said.

@smorad
Contributor

smorad commented Jan 19, 2023

@vmoens I would really like to have time-dimension padding for my recurrent models. Can we keep this as a helper feature somewhere?

@vmoens
Contributor

vmoens commented Jan 19, 2023

Definitely! What we think is to have it as an option, turned off by default

@matteobettini
Contributor Author

So @vmoens if we keep split_traj but just default it to false I think we should make it work in the context of this PR (aka agnostic to the env batch size using the mask). Should I try to work on option 2 for all collectors?

@vmoens
Contributor

vmoens commented Jan 19, 2023

So @vmoens if we keep split_traj but just default it to false I think we should make it work in the context of this PR (aka agnostic to the env batch size using the mask). Should I try to work on option 2 for all collectors?

That works for me.
FYI as of #848 all value functions work with contiguous trajectories 🥂 (they're vectorized, no for loop, which makes editing them extremely challenging)

@matteobettini
Contributor Author

Oh, I didn't see those; I was blindly using only the tests in test_collectors. I'll have a look and re-mark this as draft.

@matteobettini matteobettini marked this pull request as draft January 23, 2023 10:30
@matteobettini matteobettini marked this pull request as ready for review January 23, 2023 11:21
@matteobettini
Contributor Author

Now we should be good.

@matteobettini
Contributor Author

Also attempts to fix #846 with the warning solution

@vmoens
Contributor

vmoens commented Jan 23, 2023

Also attempts to fix #846 with the warning solution

Ok, then I would re-open the issue (or keep it open) to make this warning an error. WDYT?

@matteobettini
Contributor Author

Yes, I'll leave a comment on that issue to say that we want to implement the error solution

@matteobettini
Contributor Author

matteobettini commented Jan 25, 2023

This PR also impacts the way frames are counted after collection.

Currently, in the codebase frames are counted either as mask.sum() or batch.numel().

While these counting mechanisms will still work for environments with an unset mask_env_batch_size (aka all of the current examples in the code apart from the tests introduced here), they will not give the right number of frames when the mask is set (aka multi-agent settings and so on).

With this PR I have made it so that there are new ways to get the number of collected frames.

  1. collector.frames_per_batch will always return the exact number of frames collected (even when the warning is triggered) (when it becomes an error, the parameter passed to the collector will also be the true number of frames)
  2. batch.batch_size[0] will be equal to collector.frames_per_batch when split_traj=False
  3. when split_traj=True I still have to think of a simple way to do it without having access to the collector object (because the sum will also include the mask_env_batch_size masked-out dimensions and count the agents as part of the frames). I'm sure there is a simple way but I'm too tired now to see it :)
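To illustrate the counting issue, here is a pure-Python sketch (function names, shapes, and the assumption that the mask covers only the leading env dims are all illustrative, not torchrl API): a batch.numel()-style count over the full batch over-counts by the product of the masked-out dims, while a mask-aware count does not:

```python
from math import prod

def naive_frame_count(batch_size):
    # batch.numel()-style count: every element counts as a frame.
    return prod(batch_size)

def masked_frame_count(batch_size, mask):
    # Mask-aware count: the mask applies to the leading env dims; any
    # trailing (time) dims always count. Masked-out dims (e.g. agents)
    # are not frames.
    env_dims, time_dims = batch_size[:len(mask)], batch_size[len(mask):]
    counted = [d for d, m in zip(env_dims, mask) if m]
    return prod(counted) * prod(time_dims)

# env batch (2, 3, 4) with agent dim 3 masked out, plus a time dim of 6
print(naive_frame_count((2, 3, 4, 6)))                   # 144 (over-counts agents)
print(masked_frame_count((2, 3, 4, 6), (True, False, True)))  # 48
```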

@vmoens
Contributor

vmoens commented Jan 26, 2023

Hi @matteobettini
Sorry to keep you waiting on this.
I'm working on some improvements for the collectors, and to avoid duplicating work on both ends I was thinking of completing that before merging your branch. I hope that's ok!

@matteobettini
Contributor Author

yep no probs!

# Conflicts:
#	test/test_collector.py
#	torchrl/collectors/collectors.py
#	torchrl/collectors/utils.py
Comment on lines 301 to 305
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size,
with a value of True it indicates to consider the corresponding dimension of env.batch_size as part of the
batch of environments used to collect frames. A value of False it indicates NOT to consider that dimension
as part of the batch of environments used to collect frames (used for agent dimension in multi-agent settings).
Default is None (corresponding to all True).
Contributor

Suggested change
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size,
with a value of True it indicates to consider the corresponding dimension of env.batch_size as part of the
batch of environments used to collect frames. A value of False it indicates NOT to consider that dimension
as part of the batch of environments used to collect frames (used for agent dimension in multi-agent settings).
Default is None (corresponding to all True).
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size.
A value of ``True`` indicates that the corresponding dimension of ``env.batch_size`` is to be included in the computation of the number of frames collected. A value of ``False`` indicates NOT to consider this particular dimension
as part of the batch of environments used to collect frames (e.g. used for agent dimension in multi-agent settings).
Defaults to ``True`` for all dims.

@matteobettini
Two questions here:

  • What if we provide envs with a different batch size? (It should break when putting the tensordicts together in the sync case but it should work in the async case)...
  • What to do with non batch-locked envs (i.e. envs that are purely stateless and just read and write in a tensordict)? Should we consider this case too, and constrain mask_env_batch_size to have a length that matches the batch dimensions of the tensordicts?

Contributor Author

@matteobettini Two questions here:

  • What if we provide envs with a different batch size? (It should break when putting the tensordicts together in the sync case but it should work in the async case)...

The sync case will try to cat them, so I think it would break. The async case will just return the one that finishes first, I guess?

  • What to do with non batch-locked envs (i.e. envs that are purely stateless and just read and write in a tensordict)? Should we consider this case too, and constrain mask_env_batch_size to have a length that matches the batch dimensions of the tensordicts?

In this case the mask will be checked to match env.batch_size. If the batch size changes during execution, the mask will not work, I guess. We could also use a mask which is just a list of dimensions to exclude from being considered as part of the batch, something like

batch_excluded_dims=[0,2]

which would work as long as those dimensions are part of the batch_size of the non-batch-locked env.
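The two formulations are equivalent for batch-locked envs. A minimal sketch of the conversion (the helper name `excluded_dims_to_mask` is hypothetical, not proposed API):

```python
def excluded_dims_to_mask(batch_excluded_dims, n_batch_dims):
    # Hypothetical conversion: a list of excluded dims is equivalent to a
    # boolean mask over the env batch dims, as long as every excluded index
    # is a valid batch dimension.
    assert all(0 <= d < n_batch_dims for d in batch_excluded_dims)
    return tuple(d not in batch_excluded_dims for d in range(n_batch_dims))

print(excluded_dims_to_mask([0, 2], 3))  # (False, True, False)
```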

matteobettini and others added 5 commits February 28, 2023 10:07
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
# Conflicts:
#	test/test_collector.py
#	torchrl/collectors/collectors.py
@matteobettini matteobettini marked this pull request as draft March 1, 2023 13:58
@matteobettini matteobettini deleted the env_batch_size_mask branch May 11, 2023 08:56
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request