High-Level API #970
Conversation
So far only for one script (mujoco_ppo_cfg); extension will follow.
Conflicts: examples/mujoco/mujoco_env.py, examples/mujoco/mujoco_ppo.py, setup.py
of control flow paths for brevity (regarding return statements)
@Trinkle23897 this is a major extension of tianshou, so it will require a thorough review from your side before being merged. As you see, @opcode81 was very thorough in documenting the many aspects and perspectives on these new features. We already did some internal rounds of reviews and discussions on this, but will continue doing so next week, when I have more time. As promised, the changes are a pure addition (so @spacegoing @nuance1979 it's not creating additional engineering complexity for power users - quite the contrary). Together with this PR, the wiki mentioned above should be migrated to this repository and the documentation (both readme and on readthedocs) should be extended. We believe that the new high-level interfaces will provide the main way for the majority of users to interact with tianshou. I will ping you once we are done with internal reviews, but of course, if you already want to start a discussion - feel free :)
One thing to consider is whether the sensAI project (also mainly done by @opcode81, and actively developed and used at appliedAI) should eventually become a dependency of tianshou. In terms of transitive dependencies, it would only add scikit-learn, if I'm not mistaken. sensAI is a high-level library for general non-RL AI things, and over the years we developed several utilities that are useful in other contexts - in particular in tianshou. For now, however, these utilities were simply copied over. So it's a question for the future (though perhaps the near future). sensAI was never marketed, and we haven't written sufficient documentation for it, but this is about to change, as appliedAI is interested in making it available to a broader audience. TL;DR: we borrowed enough utilities from another OSS project of ours that it might be worth thinking about including it as a dependency instead of copy-pasting stuff.
Policy objects are now parametrised by converting the parameter dataclass instances to kwargs, using some injectable conversions along the way
* Created mixins for agent factories to reduce code duplication
* Further factorised params & mixins for experiment factories
* Additional parameter abstractions
* Implement high-level MuJoCo TD3 example
because it disallows, in particular, class docstrings that consist only of a summary line
achieving greater separation of concerns and improved maintainability
where a persistable configuration object is passed as an argument, as this can help to ensure persistability (making the requirement explicit)
* Use prefix convention (subclasses have superclass names as prefix) to facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the precise OO semantics and retains only the essential part of the name (which can be more pleasing to users not accustomed to convoluted OO naming)
composing the list dynamically instead
* Add common base class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
Force-pushed from ffc04ae to 86cca8f.
which stores the entire policy (new default), supporting applications where it is desired to be able to load the policy without having to instantiate an environment or recreate a corresponding policy object
@Trinkle23897 For me this looks good, there are some unresolved discussions about minor things above. Pls have a look when you find time
I changed the merge strategy to no longer squash but instead to add all commits and a merge commit
Sorry, I just came back from vacation, will take a look this weekend.
I don't think that's a good idea. A PR usually goes through many rounds of revisions and there is no point keeping that history in the main repo.
I only checked some conversations and did a quick skim, will do a full review later
I still recommend using squash instead of merge, to avoid screwing up the master branch and keep a linear history
@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments
yeah sgtm, maybe in env/atari.py or utils/whatever.py?
@nuance1979, actually, there are some very good reasons to keep the original history. I mention some of them here. There is no inherent value in a linear history, yet there is value in retaining the original history.
Squashing is more reasonable for smaller PRs, but this one adds thousands of lines of code.
FYI I will be on holiday for the next 10 days. I will address your further comments when I return.
I understand your points but would still argue for squashing because:
Anyway, just my personal opinions.
@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments
could you create an issue?
@@ -37,3 +43,46 @@ def make_mujoco_env(task, seed, training_num, test_num, obs_norm):
    test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
    test_envs.set_obs_rms(train_envs.get_obs_rms())
    return env, train_envs, test_envs


class MujocoEnvObsRmsPersistence(Persistence):
could it be a parameter for ContinuousEnvironments?
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
I'll give @MischaPanch the final call
This reverts commit fdb0eba.
Force-pushed from 4fb3e09 to dae4000.
I would merge this now, thanks to everybody for the lively discussion and contributions! I feel like we're on a great path forward here :) I'm merging without squash, as discussed above.
If no objections come within 30 mins, I'm merging. @Trinkle23897 would you be ok with adding @opcode81 as a maintainer? We will continue working on tianshou together, and from this PR I think you can see that he's a very responsible contributor. Neither of us plans to take away control from you, but it would be more convenient if Dominik could help me with reviewing issues, PRs, making wiki entries, and so on.
yep, sg
make format (required)
make commit-checks (required)
This PR closes #938. It introduces all the fundamental concepts and abstractions, and it already covers the majority of the algorithms. It is not a complete and finalised product, however, and we recommend that the high-level API remain in an alpha state for some time, as already suggested in the issue.
The changes in this PR are described on a wiki page, a copy of which is provided below. (The original page is perhaps more readable, because it does not render line breaks verbatim.)
Introducing the Tianshou High-Level API
The new high-level library was created based on object-oriented design
principles with two primary design goals:
* ease of use for the end user (without sacrificing generality)

  This is achieved through:

  * a single, well-defined point of interaction (ExperimentBuilder) which uses declarative semantics, allowing the user to focus on what to do rather than how to do it
  * easily injectable parametrisation.
    For complex parametrisation involving objects, the respective library classes are easily discoverable, keeping the need to browse reference documentation - or, even worse, inspect code or class hierarchies - to an absolute minimum.
  * reduced points of failure.
    Because the high-level API is at a higher level of abstraction, where more knowledge is available, we can centrally define reasonable defaults and apply consistency checks in order to ensure that illegal configurations result in meaningful errors (and are completely avoided as long as the user does not modify default behaviour).
    For example, we can consider interactions between the nature of the action space and the neural networks being used.

* maintainability for developers

  This is achieved through:

  * reduced code duplication, achieved partly through the use of mixins and multiple inheritance.
    This invariably makes the code slightly more complex, yet it greatly reduces the lines of code to be written/updated, so it is a reasonable compromise in this case.
Changeset
The entire high-level library is in its own subpackage tianshou.highlevel, and almost no changes were made to the original library in order to support the new APIs.
For the most part, only typing-related changes were made, which have
aligned type annotations with existing example applications or have made
explicit interfaces that were previously implicit.
Furthermore, some helper modules were added to the tianshou.utils package (all of which were copied from the sensAI library).
Many example applications were added, based on the existing MuJoCo and Atari
examples (see below).
User-Facing Interface
User Experience Example
To illustrate the UX, consider this video recording (IntelliJ IDEA):
Observe how conveniently relevant classes can be discovered via the IDE's
auto-completion function.
Discoverability is markedly enhanced by using a prefix-based naming convention,
where classes that can be used as parameters use the base class name as a prefix,
allowing all potentially relevant subclasses to be straightforwardly
auto-completed.
Declarative Semantics
A key design principle for the user-facing interface was to achieve
declarative semantics, where the user
is no longer concerned with generating a lengthy procedure that sequentially
constructs components that build upon each other.
Instead, the user focuses purely on declaring the properties of the learning task they would like to run:

* the only code the user needs to write is code defining essential, experiment-specific configuration
* the library can centrally apply reasonable defaults and detect/avoid misspecification
In order to enable the configuration of interdependent objects without
requiring the user to instantiate the respective objects sequentially, we
heavily employ the factory pattern.
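To make the idea concrete, here is a rough, hypothetical sketch: a factory bundles configuration and defers construction until its dependencies (here, the environment's observation dimension) are known. The class name MyCriticFactory and its create signature are made up for illustration and are not part of the actual API.

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class MyCriticFactory:
    """Hypothetical user-defined factory; not part of the tianshou API."""

    hidden_sizes: tuple[int, ...] = (64, 64)

    def create(self, obs_dim: int) -> nn.Module:
        # Construction is deferred until the environment's observation dimension
        # is known; the user only declares the configuration up front.
        layers: list[nn.Module] = []
        last_dim = obs_dim
        for size in self.hidden_sizes:
            layers += [nn.Linear(last_dim, size), nn.Tanh()]
            last_dim = size
        layers.append(nn.Linear(last_dim, 1))  # the critic outputs a single value
        return nn.Sequential(*layers)
```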
Experiment Builders
The end user's primary entry point is an ExperimentBuilder, which is specialised for each algorithm.
As the name suggests, it uses the builder pattern in order to create an Experiment object, which is then used to run the learning task.

* The builder's constructor receives the fundamental configuration, particularly the environment factory.
* Algorithm-specific parameters are set via an algorithm-specific parameter object.
  For instance, PPOExperimentBuilder has the method with_ppo_params, which expects an object of type PPOParams.
* More complex parametrisations (e.g. where multiple specification variants exist) are handled via dedicated builder methods.
For example, for the specification of the critic component in an
actor-critic algorithm, the following group of functions is provided:
* with_critic_factory (where the user can provide any (user-defined) factory for the critic component)
* with_critic_factory_default (with which the user specifies that the default, Net-based critic architecture shall be used and has the option to parametrise it)
* with_critic_factory_use_actor (with which the user indicates that the critic component shall reuse the preprocessing network from the actor component)
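A rough usage sketch follows, assuming builder is an actor-critic experiment builder such as PPOExperimentBuilder. The method names are taken from the list above, but the exact signatures may differ, and a real custom factory would have to conform to the library's CriticFactory interface.

```python
# Only one of the following options would be used for a given experiment.

# Use the default Net-based critic architecture, optionally parametrising it:
builder.with_critic_factory_default(hidden_sizes=(256, 256))

# Reuse the actor's preprocessing network for the critic:
builder.with_critic_factory_use_actor()

# Provide a custom critic factory (must implement the library's critic factory interface):
builder.with_critic_factory(my_custom_critic_factory)
```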
Examples
Minimal Example
In the simplest of cases, where the user wants to use the default
parametrisation for everything, a user could run a PPO learning task
as follows, where MyEnvFactory is a factory for the agent's environment.
The default behaviour will adapt depending on whether the factory
creates environments with discrete or continuous action spaces.
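A minimal sketch of such a script might look as follows; the import path and constructor arguments are assumptions based on this description and may differ from the actual API.

```python
from tianshou.highlevel.experiment import PPOExperimentBuilder  # assumed module path

# MyEnvFactory is the user-defined environment factory mentioned above.
experiment = PPOExperimentBuilder(MyEnvFactory()).build()

# The resulting Experiment object is then used to run the learning task,
# e.g. via a run method (exact invocation not shown in this description).
```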
Fully Parametrised MuJoCo Example
Importantly, the user still has the option to configure all the details.
Consider the high-level version of the mujoco_ppo example; it is functionally equivalent to the procedural, low-level example.
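As a hedged sketch of what such a fully parametrised builder invocation might look like: the class and method names are taken from this description, the import paths and parameter names are assumptions, and the values shown are illustrative rather than those of the actual example script.

```python
from tianshou.highlevel.experiment import PPOExperimentBuilder  # assumed module path
from tianshou.highlevel.params.policy_params import PPOParams   # assumed module path

experiment = (
    PPOExperimentBuilder(MyEnvFactory())
    .with_ppo_params(
        PPOParams(  # illustrative values only
            discount_factor=0.99,
            gae_lambda=0.95,
            ent_coef=0.0,
            vf_coef=0.25,
            eps_clip=0.2,
        )
    )
    .with_critic_factory_default(hidden_sizes=(64, 64))
    .build()
)
```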
Compare the low-level and high-level scripts to see the difference.
In general, find example applications of the high-level API in the examples/ folder, in scripts using the _hl.py suffix.

Experiments
The Experiment representation contains the complete definition of the learning task.
An experiment may be run several times, assigning a name (and corresponding storage location) to each run.
Persistence and Logging
Experiments can be serialized and later reloaded.
Because the experiment representation is composed purely of configuration
and factories, which themselves are composed purely of configuration and
factories, persisted objects are compact and do not contain state.
Every experiment run produces the following artifacts:
handlers are modular
Running a reloaded experiment can optionally resume training of the serialized
policy.
All relevant objects have meaningful string representations that can appear
in logs, which is conveniently achieved through the use of ToStringMixin (from sensAI).
Its use furthermore prevents string representations of recurring objects
from being printed more than once.
For example, consider this string representation, which was generated for
the fully parametrised PPO experiment from the example above:
Library Developer Perspective
The presentation thus far has focussed on the user's perspective.
From the perspective of a Tianshou developer, it is important that the
high-level API be clearly structured and maintainable.
Here are the most relevant representations:
Policy parameters are represented as dataclasses (base class Params).
The goal is for the parameters to be ultimately passed to the corresponding policy class (e.g. PPOParams contains parameters for PPOPolicy).

Parameter transformation:
In part, the parameter dataclass attributes already correspond directly to
policy class parameters.
However, because the high-level interface must, in many cases, abstract away
from the low-level interface,
we establish the notion of a ParamTransformer, which transforms one or more parameters into the form that is required by the policy class.
The idea is that the dictionary representation of the dataclass is successively transformed via ParamTransformers such that the resulting dictionary can ultimately be used as keyword arguments for the policy.
To achieve maintainability, the declaration of parameter transformations
is colocated with the parameters they affect.
Tests ensure that naming issues are detected.
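As a rough, self-contained mock-up of the mechanism (simplified names; this is not the library's actual ParamTransformer interface):

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any


class ParamTransformerSketch(ABC):
    """Simplified stand-in for the ParamTransformer concept."""

    @abstractmethod
    def transform(self, kwargs: dict[str, Any]) -> None:
        """Modify the kwargs dictionary in place."""


class RenameDiscountFactor(ParamTransformerSketch):
    """Illustrative only: adapt a high-level parameter name to the name expected by the policy."""

    def transform(self, kwargs: dict[str, Any]) -> None:
        kwargs["gamma"] = kwargs.pop("discount_factor")


@dataclass
class MyAlgorithmParams:
    discount_factor: float = 0.99
    lr: float = 3e-4


def params_to_policy_kwargs(
    params: MyAlgorithmParams, transformers: list[ParamTransformerSketch]
) -> dict[str, Any]:
    kwargs = asdict(params)  # start from the dataclass's dict representation
    for transformer in transformers:
        transformer.transform(kwargs)  # apply transformations successively
    return kwargs  # ultimately usable as keyword arguments for the policy
```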
Composition and inheritance:
We use inheritance and mixins to reduce duplication.
Factories are an essential principle of the library.
Because the creation of objects may depend on objects that are not
yet created, a declarative approach necessitates that we transition from
the objects themselves to factories.
EnvFactory was already mentioned above, as it is a user-facing abstraction.
Its purpose is to create the (vectorized) Environments that will be used in the experiments.

AgentFactory is the central component that creates the policy, the trainer as well as the necessary collectors.
To support a new type of policy, a subclass that handles the policy
creation is required.
In turn, the main task when implementing a new algorithm-specific ExperimentBuilder is the creation of the corresponding AgentFactory.

Further factory abstractions are used for various object creation processes, e.g.

* OptimizerFactory for the creation of torch optimizers
* ActorFactory for the creation of actor models
* CriticFactory for the creation of critic models
* IntermediateModuleFactory for the creation of models that produce intermediate/latent representations
* EnvParamFactory for the creation of parameters based on properties of the environment
* NoiseFactory for the creation of BaseNoise instances
* DistributionFunctionFactory for the creation of functions that create torch distributions from tensors
* LRSchedulerFactory for learning rate schedulers
* PolicyWrapperFactory for policy wrappers that extend the functionality of the regular policy (e.g. intrinsic curiosity)
* AutoAlphaFactory for automatically tuned regularization coefficients (as supported by SAC or REDQ)

LoggerFactory handles the creation of the experiment logger, but the default implementation already handles the cases that were used in the examples.
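To give a feel for the pattern (a simplified mock-up rather than the actual interfaces), each such factory bundles configuration and exposes a creation method that receives the context it needs only at creation time:

```python
from dataclasses import dataclass

import torch


@dataclass
class AdamOptimizerFactorySketch:
    """Simplified stand-in for an OptimizerFactory-style class; not the actual interface."""

    lr: float = 3e-4
    weight_decay: float = 0.0

    def create_optimizer(self, module: torch.nn.Module) -> torch.optim.Optimizer:
        # The module to be optimized only exists at creation time, which is precisely
        # why a factory (rather than a ready-made optimizer instance) is configured.
        return torch.optim.Adam(
            module.parameters(), lr=self.lr, weight_decay=self.weight_decay
        )
```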
The ExperimentBuilder implementations make use of mixins to add common functionality.
As mentioned above, the main task in an algorithm-specific specialization is to create the AgentFactory.