
[BugFix] [Feature] Multi-agent collectors #828

Closed
wants to merge 56 commits into from

Conversation

matteobettini
Contributor

Description

This PR introduces the mask described in #807, which can be used in multi-agent settings to decide which parts of the env batch size to consider for collection.

Example

env = ParallelEnv(n_env_workers, VmasEnv("flocking", n_envs, n_agents))
env.batch_size # (n_env_workers, n_agents, n_envs)

# Case 1
collector = MultiSyncDataCollector([env] * n_collector_workers, policy, env_batch_size_mask=(True,True,True), frames_per_batch=frames_per_batch)
for i, data in enumerate(collector):
    data.batch_size  # (n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_agents * n_envs))

# Case 2
collector = MultiSyncDataCollector([env] * n_collector_workers, policy, env_batch_size_mask=(True,False,True), frames_per_batch=frames_per_batch)
for i, data in enumerate(collector):
    data.batch_size  # (n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_envs))
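The arithmetic behind these two cases can be sketched in plain Python. This is a hypothetical helper, not part of the torchrl API; the function name `collected_batch_size` and its arguments are illustrative only. Dimensions where the mask is `True` count as environments contributing frames, so masked-out dims (e.g. the agent dim) drop out of the divisor:

```python
from math import prod

def collected_batch_size(env_batch_size, mask, n_collector_workers, frames_per_batch):
    # Only the dims where the mask is True count as environments that
    # contribute frames; masked-out dims (e.g. agents) do not.
    n_envs_effective = prod(d for d, m in zip(env_batch_size, mask) if m)
    time_dim = frames_per_batch // (n_collector_workers * n_envs_effective)
    return (*env_batch_size, time_dim)

# Case 1: all dims counted (mask all True)
print(collected_batch_size((2, 3, 4), (True, True, True), 2, 96))   # (2, 3, 4, 2)
# Case 2: agent dim (size 3) masked out
print(collected_batch_size((2, 3, 4), (True, False, True), 2, 96))  # (2, 3, 4, 6)
```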

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 13, 2023
@matteobettini matteobettini changed the title [BugFix] [Feature] Multi-agent collectors [WIP] [BugFix] [Feature] Multi-agent collectors Jan 13, 2023
@matteobettini
Contributor Author

matteobettini commented Jan 13, 2023

TODO:

  • Multi-collectors and their docs
  • split_traj
  • Tests

@matteobettini
Contributor Author

matteobettini commented Jan 15, 2023

@vmoens

In the current state, this PR works when split_traj=False.

Now, we should discuss how we want to change the behavior of split_traj. Currently that function is written to consider only envs with 1 dimension (parallel envs) and increases that dimension in order to pad the last one.

Let's take an example to discuss new possibilities, here is the setup:

my_env.batch_size = (n_env_workers, n_agents, n_envs)
env_batch_size_mask = (True, False, True)

frames_per_batch = frames_per_batch
n_collector_workers = n_collector_workers
max_frames_per_traj = max_frames_per_traj

Now, once we start collecting, what should the batch_size of the output be? We are interested in the cases split_traj=True and split_traj=False.

Here are the options for me.

Option 1

For the split_traj=False case, we do as we discussed and how is already done in this PR: the size is

(n_env_workers, n_agents, n_envs, frames_per_batch / (n_collector_workers * n_env_workers * n_envs))

Then, for the case of split_traj=True we add an additional dimension (let's say at the beginning) and the shape becomes

(frames_per_batch / (n_collector_workers * n_env_workers * n_envs * max_frames_per_traj), n_env_workers, n_agents, n_envs, max_frames_per_traj)

Option 2

Alternatively, since this adds yet another dimension in an arbitrary place, we could make it so that after collection, collectors squash all the batch dims into one single dim according to the mask. This would mean that collectors with split_traj=False return shapes of

(frames_per_batch, n_agents) # which would be just (frames_per_batch) without a mask

Then, for the case of split_traj=True

(frames_per_batch / max_frames_per_traj, n_agents, max_frames_per_traj)
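Option 2's squashing rule can be sketched in plain Python. This is an illustrative sketch, not the PR's implementation; the function name `squashed_shape` and its arguments are assumptions. Masked-in dims are folded into the leading frame dimension, while masked-out dims (e.g. agents) survive as separate dims:

```python
from math import prod

def squashed_shape(env_batch_size, mask, frames_per_batch, max_frames_per_traj=None):
    # Dims masked out (False) are kept as separate dims; dims masked in (True)
    # are squashed into the single leading frame dimension.
    kept = tuple(d for d, m in zip(env_batch_size, mask) if not m)
    if max_frames_per_traj is None:  # corresponds to split_traj=False
        return (frames_per_batch, *kept)
    # split_traj=True: trajectories of fixed length go in a trailing time dim
    return (frames_per_batch // max_frames_per_traj, *kept, max_frames_per_traj)

print(squashed_shape((2, 3, 4), (True, False, True), 96))     # (96, 3)
print(squashed_shape((2, 3, 4), (True, False, True), 96, 8))  # (12, 3, 8)
```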

My take

I really like option 2, but it might be necessary to keep a batch_size that contains env.batch_size for some reason. Let me know. It is worth noting that if we do 1, 2 should be easily derivable, but not the other way around.

@vmoens
Contributor

vmoens commented Jan 19, 2023

Related to #845
Do we actually care about split_trajs (as integrated in the collector) in the first place?
As stated in the aforementioned issue, the purpose of split_trajs was to make sure that advantage functions and the like would not overlap computation over consecutive but independent trajectories. I now believe that a properly coded advantage module should not suffer from that issue if the done state is used properly.
For the rest (e.g. custom loss functions that may be buggy in that regard) we could write a function split_traj with limited behaviour (i.e. it assumes a certain input shape, e.g. always a 2d tensordict or something along those lines) and do something like what you propose in Option 2. We would properly document this for those interested. What do you think?
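A limited split_traj along those lines might look like the pure-Python sketch below. The name `split_by_done` and the right-padding convention are assumptions, not the eventual torchrl API; it takes a flat sequence of frames with parallel done flags and returns padded trajectories plus a validity mask:

```python
def split_by_done(frames, dones, pad_value=0):
    # frames: flat list of frames; dones: parallel list of booleans marking
    # trajectory ends. Returns (padded, masks), where every trajectory is
    # right-padded to the longest one, mirroring what a limited split_traj
    # could do on a 2d [time, feature] input.
    trajs, current = [], []
    for frame, done in zip(frames, dones):
        current.append(frame)
        if done:
            trajs.append(current)
            current = []
    if current:  # trailing, unfinished trajectory
        trajs.append(current)
    max_len = max(len(t) for t in trajs)
    padded = [t + [pad_value] * (max_len - len(t)) for t in trajs]
    masks = [[True] * len(t) + [False] * (max_len - len(t)) for t in trajs]
    return padded, masks

padded, masks = split_by_done([1, 2, 3, 4, 5], [False, True, False, False, True])
print(padded)  # [[1, 2, 0], [3, 4, 5]]
print(masks)   # [[True, True, False], [True, True, True]]
```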

@matteobettini
Contributor Author

matteobettini commented Jan 19, 2023

@vmoens

I couldn't agree more. I think done is enough to compute advantages. This will require some work in refactoring advantages and losses but it will make everything look nicer. I am all in for what you said.

@smorad
Contributor

smorad commented Jan 19, 2023

@vmoens I would really like to have time-dimension padding for my recurrent models. Can we keep this as a helper feature somewhere?

@vmoens
Contributor

vmoens commented Jan 19, 2023

Definitely! What we think is to have it as an option, turned off by default

@matteobettini
Contributor Author

So @vmoens if we keep split_traj but just default it to false I think we should make it work in the context of this PR (aka agnostic to the env batch size using the mask). Should I try to work on option 2 for all collectors?

@vmoens
Contributor

vmoens commented Jan 19, 2023

So @vmoens if we keep split_traj but just default it to false I think we should make it work in the context of this PR (aka agnostic to the env batch size using the mask). Should I try to work on option 2 for all collectors?

That works for me.
FYI as of #848 all value functions work with contiguous trajectories 🥂 (they're vectorized, no for loop, which makes editing them extremely challenging)

@matteobettini
Contributor Author

Oh, I didn't see those; I was blindly using only the tests in test_collectors. I'll have a look and re-mark this as draft.

@matteobettini matteobettini marked this pull request as draft January 23, 2023 10:30
@matteobettini matteobettini marked this pull request as ready for review January 23, 2023 11:21
@matteobettini
Contributor Author

Now we should be good.

@matteobettini
Contributor Author

Also attempts to fix #846 with the warning solution

@vmoens
Contributor

vmoens commented Jan 23, 2023

Also attempts to fix #846 with the warning solution

Ok, then I would re-open the issue (or keep it open) to make this warning an error. WDYT?

@matteobettini
Contributor Author

Yes, I'll leave a comment on that issue to say that we want to implement the error solution

@matteobettini
Contributor Author

matteobettini commented Jan 25, 2023

This PR also impacts the way frames are counted after collection.

Currently, in the codebase frames are counted either as mask.sum() or batch.numel().

While these counting mechanisms will still work for environments with an unset mask_env_batch_size (aka all of the current examples in the code apart from the tests introduced here), they will not give the right number of frames when the mask is set (aka multi-agent settings and so on).

With this PR I have made it so that there are new ways to get the number of collected frames.

  1. collector.frames_per_batch will always return the exact number of frames collected (even when the warning is triggered) (when it becomes an error, the parameter passed to the collector will also be the true number of frames)
  2. batch.batch_size[0] will be equal to collector.frames_per_batch when split_traj=False
  3. when split_traj=True I still have to think of a simple way to do it without having access to the collector object (because the sum will also include the mask_env_batch_size masked-out dimensions and count the agents as part of the frames). I'm sure there is a simple way but I'm too tired now to see it :)
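To illustrate the counting issue, here is a pure-Python sketch (function names, shapes, and the assumption that the mask covers only the leading env dims are all illustrative, not torchrl API): a batch.numel()-style count over the full batch over-counts by the product of the masked-out dims, while a mask-aware count does not:

```python
from math import prod

def naive_frame_count(batch_size):
    # batch.numel()-style count: every element counts as a frame.
    return prod(batch_size)

def masked_frame_count(batch_size, mask):
    # Mask-aware count: the mask applies to the leading env dims; any
    # trailing (time) dims always count. Masked-out dims (e.g. agents)
    # are not frames.
    env_dims, time_dims = batch_size[:len(mask)], batch_size[len(mask):]
    counted = [d for d, m in zip(env_dims, mask) if m]
    return prod(counted) * prod(time_dims)

# env batch (2, 3, 4) with agent dim 3 masked out, plus a time dim of 6
print(naive_frame_count((2, 3, 4, 6)))                   # 144 (over-counts agents)
print(masked_frame_count((2, 3, 4, 6), (True, False, True)))  # 48
```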

@vmoens
Contributor

vmoens commented Jan 26, 2023

Hi @matteobettini
Sorry to keep you waiting on this.
I'm working on some improvements for the collectors, and to avoid duplicating work on both ends I was thinking of completing that before merging your branch. I hope that's ok!

@matteobettini
Contributor Author

yep no probs!

# Conflicts:
#	test/test_collector.py
#	torchrl/collectors/collectors.py
#	torchrl/collectors/utils.py
Comment on lines 301 to 305
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size,
with a value of True it indicates to consider the corresponding dimension of env.batch_size as part of the
batch of environments used to collect frames. A value of False it indicates NOT to consider that dimension
as part of the batch of environments used to collect frames (used for agent dimension in multi-agent settings).
Default is None (corresponding to all True).
Contributor

Suggested change
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size,
with a value of True it indicates to consider the corresponding dimension of env.batch_size as part of the
batch of environments used to collect frames. A value of False it indicates NOT to consider that dimension
as part of the batch of environments used to collect frames (used for agent dimension in multi-agent settings).
Default is None (corresponding to all True).
mask_env_batch_size (Sequence[bool], optional): a sequence of bools of the same length as env.batch_size.
A value of ``True`` indicates that the corresponding dimension of ``env.batch_size`` is to be included in the computation of the number of frames collected. A value of ``False`` indicates NOT to consider this particular dimension
as part of the batch of environments used to collect frames (e.g. used for agent dimension in multi-agent settings).
Defaults to ``True`` for all dims.

@matteobettini
Two questions here:

  • What if we provide envs with a different batch size? (It should break when putting the tensordicts together in the sync case but it should work in the async case)...
  • What to do with non batch-locked envs (i.e. envs that are purely stateless and just read and write in a tensordict)? Should we consider this case too, and constrain mask_env_batch_size to have a length that matches the batch dimensions of the tensordicts?

Contributor Author

@matteobettini Two questions here:

  • What if we provide envs with a different batch size? (It should break when putting the tensordicts together in the sync case but it should work in the async case)...

The sync case will try to cat them, so I think it would break. The async case will just return the one that finishes first, I guess?

  • What to do with non batch-locked envs (i.e. envs that are purely stateless and just read and write in a tensordict)? Should we consider this case too, and constrain mask_env_batch_size to have a length that matches the batch dimensions of the tensordicts?

In this case the mask will be checked to match env.batch_size. If the batch size changes during execution, the mask will not work, I guess. We could also use a mask which is just a list of dimensions to exclude from being considered as part of the batch, something like

batch_excluded_dims=[0,2]

which would work as long as those dimensions are part of the batch_size of the non-batch-locked env.
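The two formulations are equivalent for batch-locked envs. A minimal sketch of the conversion (the helper name `excluded_dims_to_mask` is hypothetical, not proposed API):

```python
def excluded_dims_to_mask(batch_excluded_dims, n_batch_dims):
    # Hypothetical conversion: a list of excluded dims is equivalent to a
    # boolean mask over the env batch dims, as long as every excluded index
    # is a valid batch dimension.
    assert all(0 <= d < n_batch_dims for d in batch_excluded_dims)
    return tuple(d not in batch_excluded_dims for d in range(n_batch_dims))

print(excluded_dims_to_mask([0, 2], 3))  # (False, True, False)
```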

matteobettini and others added 5 commits February 28, 2023 10:07
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
# Conflicts:
#	test/test_collector.py
#	torchrl/collectors/collectors.py
@matteobettini matteobettini marked this pull request as draft March 1, 2023 13:58
@matteobettini matteobettini deleted the env_batch_size_mask branch May 11, 2023 08:56
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request