-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove data from state in Collector, and remove preprocess_fn there #1058
Comments
@MischaPanch, concerning the purpose, the preprocess_fn was introduced in #42 |
@Trinkle23897 what do you think, can we remove it? It does appear entirely unused, and whatever the purpose of #42, it can likely be solved in other ways (if it is ever needed) |
I now had a closer look at the discussion in #42 and came to believe more that the current architecture of preprocess_fn is not the best way to reach the goals stated there. Apart from that, the goals don't seem to be in demand by use cases. I think we can go ahead and remove it |
My current thought is we should move roller to the lowest level (i.e., env has a roll() method that can create Rollout by itself, and can send to the buffer) to simplify implementation and have better throughput, but that's specific to RLHF experiment. I'm okay with your proposal. |
We could do that as next step, could you write an issue briefly explaining why it would improve throughput? Such a method would probably also help a lot in the |
The main assumption Tianshou holds is that batch-style data transfer can reduce a lot of overhead, so we can improve GPU utilization by sending batch data and the overall system throughput. That's why the initial version of the collector is in batch style. There are some constraints in front of this assumption:
These are very strong constraints. If either is not true, we can switch to full async rollout implementation to get better throughput, i.e., achieving shorter wall-clock
|
@Trinkle23897 I agree with you, there's probably a bunch of performance related things that can be optimized in the collection. I will open a separate issue from your comment. This issue however is just about refactoring the current collector to make it amenable to such improvements in the first place. The PS: for pypi, I sent you an email |
Closes: #1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. #1063 - `Collector`s can now be closed, and their reset is more granular. #1063 - Trainers can control whether collectors should be reset prior to training. #1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. #1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. #1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. #1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). #1063 - Improved typing for `exploration_noise` and within Collector. #1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. #1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . #1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. #1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. #1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
Closes: thu-ml#1058 ### Api Extensions - Batch received two new methods: `to_dict` and `to_list_of_dicts`. thu-ml#1063 - `Collector`s can now be closed, and their reset is more granular. thu-ml#1063 - Trainers can control whether collectors should be reset prior to training. thu-ml#1063 - Convenience constructor for `CollectStats` called `with_autogenerated_stats`. thu-ml#1063 ### Internal Improvements - `Collector`s rely less on state, the few stateful things are stored explicitly instead of through a `.data` attribute. thu-ml#1063 - Introduced a first iteration of a naming convention for vars in `Collector`s. thu-ml#1063 - Generally improved readability of Collector code and associated tests (still quite some way to go). thu-ml#1063 - Improved typing for `exploration_noise` and within Collector. thu-ml#1063 ### Breaking Changes - Removed `.data` attribute from `Collector` and its child classes. thu-ml#1063 - Collectors no longer reset the environment on initialization. Instead, the user might have to call `reset` expicitly or pass `reset_before_collect=True` . thu-ml#1063 - VectorEnvs now return an array of info-dicts on reset instead of a list. thu-ml#1063 - Fixed `iter(Batch(...)` which now behaves the same way as `Batch(...).__iter__()`. Can be considered a bugfix. thu-ml#1063 --------- Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
The Collector tracks the
self.data
field that is aRolloutBatch
. However, rather confusingly, this value of this field is recreated and updated at every call tocollect
. This leads to a lot of methods mutating the state, and a less comprehensible and debuggable code.There is no need to bind this to the lifetime of the Collector, instead it should be created and used as a var in
collect
.Moreover, there is the possibility to pass
preprocess_fn
, whichIt seems like the original purpose of that was to enable specific logging needs, but these needs can be met by different means.
The
preprocess_fn
inCollector
should be removedThis issue blocks #1046 since we need to reduce the complexity of the Collector before doing anything further with it
@bordeauxred is working on this issue
The text was updated successfully, but these errors were encountered: