Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove data from state in Collector, and remove preprocess_fn there #1058

Closed
MischaPanch opened this issue Feb 20, 2024 · 7 comments · Fixed by #1063
Closed

Remove data from state in Collector, and remove preprocess_fn there #1058

MischaPanch opened this issue Feb 20, 2024 · 7 comments · Fixed by #1063
Assignees
Labels
refactoring No change to functionality

Comments

@MischaPanch
Copy link
Collaborator

MischaPanch commented Feb 20, 2024

The Collector tracks the self.data field that is a RolloutBatch. However, rather confusingly, this value of this field is recreated and updated at every call to collect. This leads to a lot of methods mutating the state, and a less comprehensible and debuggable code.

There is no need to bind this to the lifetime of the Collector, instead it should be created and used as a var in collect.

Moreover, there is the possibility to pass preprocess_fn, which

  1. doesn't have a clear interfaces
  2. doesn't have a clear purpose
  3. Is only used in a test and nowhere else
    It seems like the original purpose of that was to enable specific logging needs, but these needs can be met by different means.
  4. Significally complicates the Collector

The preprocess_fn in Collector should be removed

This issue blocks #1046 since we need to reduce the complexity of the Collector before doing anything further with it

@bordeauxred is working on this issue

@MischaPanch MischaPanch added the refactoring No change to functionality label Feb 20, 2024
@bordeauxred
Copy link
Contributor

@MischaPanch, concerning the purpose, the preprocess_fn was introduced in #42

@MischaPanch
Copy link
Collaborator Author

@Trinkle23897 what do you think, can we remove it? It does appear entirely unused, and whatever the purpose of #42, it can likely be solved in other ways (if it is ever needed)

@MischaPanch
Copy link
Collaborator Author

MischaPanch commented Feb 21, 2024

I now had a closer look at the discussion in #42 and came to believe more that the current architecture of preprocess_fn is not the best way to reach the goals stated there. Apart from that, the goals don't seem to be in demand by use cases.

I think we can go ahead and remove it

@Trinkle23897
Copy link
Collaborator

Trinkle23897 commented Feb 25, 2024

My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

@MischaPanch
Copy link
Collaborator Author

My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal.

We could do that as next step, could you write an issue briefly explaining why it would improve throughput? Such a method would probably also help a lot in the n_episode version of Collector.collect (see #1042 )

@Trinkle23897
Copy link
Collaborator

The main assumption Tianshou holds is that batch-style data transfer can reduce a lot of overhead, so we can improve GPU utilization by sending batch data and the overall system throughput. That's why the initial version of the collector is in batch style.

There are some constraints in front of this assumption:

  1. We cannot sequentially send data to GPU to achieve the same throughput as batch-style easily
  2. The model is relatively small, and it's not memory-bound
  3. The Environment step function takes a small amount of time (including reward calculation), at least shorter than policy forward

These are very strong constraints. If either is not true, we can switch to full async rollout implementation to get better throughput, i.e., achieving shorter wall-clock collector.collect time. For example, in RLHF case:

  • LLM's completion function can be implemented in a fully-async style to achieve the same throughput as batch completion, as long as you provide enough thread/process to handle per request. That invalids (1) (2);
  • The environment needs a reward model to calculate rewards. If we do things in batch-style, we have to do all policy sampling first, sync, and do reward calculation. The system might be environment throughput bound by not investigating enough compute for reward. But if you can do policy/reward calculation in a fully async way, you can remove all bubbles. That invalids (3).

@MischaPanch
Copy link
Collaborator Author

MischaPanch commented Mar 4, 2024

@Trinkle23897 I agree with you, there's probably a bunch of performance related things that can be optimized in the collection. I will open a separate issue from your comment.

This issue however is just about refactoring the current collector to make it amenable to such improvements in the first place. The collect method is very convoluted, I don't feel comfortable reviewing any function improvements on it (like #1042, which started this whole story) before the code is more structured and readable. Removing unnecessary state from the collectors goes a long way in that direction, which is what this issue is about.

PS: for pypi, I sent you an email

@MischaPanch MischaPanch self-assigned this Mar 26, 2024
MischaPanch pushed a commit that referenced this issue Mar 28, 2024
Closes: #1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
#1063
- `Collector`s can now be closed, and their reset is more granular.
#1063
- Trainers can control whether collectors should be reset prior to
training. #1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. #1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. #1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. #1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). #1063
- Improved typing for `exploration_noise` and within Collector. #1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . #1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. #1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. #1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this issue Apr 15, 2024
Closes: thu-ml#1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this issue Apr 15, 2024
Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this issue Apr 15, 2024
Closes: thu-ml#1058

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
ZhengLi1314 pushed a commit to ZhengLi1314/tianshou_0.5.1 that referenced this issue Apr 15, 2024
Closes: thu-ml#1058 

### Api Extensions
- Batch received two new methods: `to_dict` and `to_list_of_dicts`.
thu-ml#1063
- `Collector`s can now be closed, and their reset is more granular.
thu-ml#1063
- Trainers can control whether collectors should be reset prior to
training. thu-ml#1063
- Convenience constructor for `CollectStats` called
`with_autogenerated_stats`. thu-ml#1063

### Internal Improvements
- `Collector`s rely less on state, the few stateful things are stored
explicitly instead of through a `.data` attribute. thu-ml#1063
- Introduced a first iteration of a naming convention for vars in
`Collector`s. thu-ml#1063
- Generally improved readability of Collector code and associated tests
(still quite some way to go). thu-ml#1063
- Improved typing for `exploration_noise` and within Collector. thu-ml#1063

### Breaking Changes

- Removed `.data` attribute from `Collector` and its child classes.
thu-ml#1063
- Collectors no longer reset the environment on initialization. Instead,
the user might have to call `reset`
expicitly or pass `reset_before_collect=True` . thu-ml#1063
- VectorEnvs now return an array of info-dicts on reset instead of a
list. thu-ml#1063
- Fixed `iter(Batch(...)` which now behaves the same way as
`Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063

---------

Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring No change to functionality
Projects
Development

Successfully merging a pull request may close this issue.

3 participants