Remove data from state in Collector, and remove preprocess_fn there #1058

MischaPanch · 2024-02-20T11:43:03Z

The Collector tracks the self.data field that is a RolloutBatch. However, rather confusingly, this value of this field is recreated and updated at every call to collect. This leads to a lot of methods mutating the state, and a less comprehensible and debuggable code.

There is no need to bind this to the lifetime of the Collector, instead it should be created and used as a var in collect.

Moreover, there is the possibility to pass preprocess_fn, which

doesn't have a clear interfaces
doesn't have a clear purpose
Is only used in a test and nowhere else
It seems like the original purpose of that was to enable specific logging needs, but these needs can be met by different means.
Significally complicates the Collector

The preprocess_fn in Collector should be removed

This issue blocks #1046 since we need to reduce the complexity of the Collector before doing anything further with it

@bordeauxred is working on this issue

The text was updated successfully, but these errors were encountered:

bordeauxred · 2024-02-21T10:17:41Z

@MischaPanch, concerning the purpose, the preprocess_fn was introduced in #42

MischaPanch · 2024-02-21T10:29:04Z

@Trinkle23897 what do you think, can we remove it? It does appear entirely unused, and whatever the purpose of #42, it can likely be solved in other ways (if it is ever needed)

MischaPanch · 2024-02-21T14:39:36Z

I now had a closer look at the discussion in #42 and came to believe more that the current architecture of preprocess_fn is not the best way to reach the goals stated there. Apart from that, the goals don't seem to be in demand by use cases.

I think we can go ahead and remove it

Trinkle23897 · 2024-02-25T18:04:53Z

My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

MischaPanch · 2024-02-26T09:15:48Z

My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

We could do that as next step, could you write an issue briefly explaining why it would improve throughput? Such a method would probably also help a lot in the n_episode version of Collector.collect (see #1042 )

Trinkle23897 · 2024-03-03T17:23:42Z

The main assumption Tianshou holds is that batch-style data transfer can reduce a lot of overhead, so we can improve GPU utilization by sending batch data and the overall system throughput. That's why the initial version of the collector is in batch style.

There are some constraints in front of this assumption:

We cannot sequentially send data to GPU to achieve the same throughput as batch-style easily
The model is relatively small, and it's not memory-bound
The Environment step function takes a small amount of time (including reward calculation), at least shorter than policy forward

These are very strong constraints. If either is not true, we can switch to full async rollout implementation to get better throughput, i.e., achieving shorter wall-clock collector.collect time. For example, in RLHF case:

LLM's completion function can be implemented in a fully-async style to achieve the same throughput as batch completion, as long as you provide enough thread/process to handle per request. That invalids (1) (2);
The environment needs a reward model to calculate rewards. If we do things in batch-style, we have to do all policy sampling first, sync, and do reward calculation. The system might be environment throughput bound by not investigating enough compute for reward. But if you can do policy/reward calculation in a fully async way, you can remove all bubbles. That invalids (3).

MischaPanch · 2024-03-04T14:42:12Z

@Trinkle23897 I agree with you, there's probably a bunch of performance related things that can be optimized in the collection. I will open a separate issue from your comment.

This issue however is just about refactoring the current collector to make it amenable to such improvements in the first place. The collect method is very convoluted, I don't feel comfortable reviewing any function improvements on it (like #1042, which started this whole story) before the code is more structured and readable. Removing unnecessary state from the collectors goes a long way in that direction, which is what this issue is about.

PS: for pypi, I sent you an email

Closes: #1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063 - `Collector`s can now be closed, and their reset is more granular. #1063 - Trainers can control whether collectors should be reset prior to training. #1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. #1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). #1063 - Improved typing for `exploration_noise` and within Collector. #1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. #1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . #1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. #1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>

MischaPanch added the refactoring No change to functionality label Feb 20, 2024

MischaPanch assigned bordeauxred Feb 21, 2024

MischaPanch mentioned this issue Feb 21, 2024

Collector sampling with multiple environment does not seem to be unbiased with n_episodes #1042

Open

9 tasks

bordeauxred mentioned this issue Feb 23, 2024

Feat/refactor collector #1063

Merged

MischaPanch mentioned this issue Mar 4, 2024

Revisit and maybe optimize Collectors #1069

Open

MischaPanch self-assigned this Mar 26, 2024

MischaPanch closed this as completed in #1063 Mar 28, 2024

github-project-automation bot added this to Overall Tianshou Status Aug 28, 2024

github-project-automation bot moved this to Done in Overall Tianshou Status Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove data from state in Collector, and remove preprocess_fn there #1058

Remove data from state in Collector, and remove preprocess_fn there #1058

MischaPanch commented Feb 20, 2024 •

edited

Loading

bordeauxred commented Feb 21, 2024

MischaPanch commented Feb 21, 2024

MischaPanch commented Feb 21, 2024 •

edited

Loading

Trinkle23897 commented Feb 25, 2024 •

edited

Loading

MischaPanch commented Feb 26, 2024

Trinkle23897 commented Mar 3, 2024

MischaPanch commented Mar 4, 2024 •

edited

Loading

Remove data from state in Collector, and remove preprocess_fn there #1058

Remove data from state in Collector, and remove preprocess_fn there #1058

Comments

MischaPanch commented Feb 20, 2024 • edited Loading

bordeauxred commented Feb 21, 2024

MischaPanch commented Feb 21, 2024

MischaPanch commented Feb 21, 2024 • edited Loading

Trinkle23897 commented Feb 25, 2024 • edited Loading

MischaPanch commented Feb 26, 2024

Trinkle23897 commented Mar 3, 2024

MischaPanch commented Mar 4, 2024 • edited Loading

MischaPanch commented Feb 20, 2024 •

edited

Loading

MischaPanch commented Feb 21, 2024 •

edited

Loading

Trinkle23897 commented Feb 25, 2024 •

edited

Loading

MischaPanch commented Mar 4, 2024 •

edited

Loading