
High-Level API #970

Merged: 100 commits merged into thu-ml:master from feat/high-level-api, Nov 8, 2023
Conversation

@opcode81 (Collaborator) commented Oct 17, 2023

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed all items in this Pull Request below

This PR closes #938. It introduces all the fundamental concepts and abstractions, and it already covers the majority of the algorithms. It is not a complete and finalised product, however, and we recommend that the high-level API remain in alpha stage for some time, as already suggested in the issue.

The changes in this PR are described on a wiki page, a copy of which is provided below. (The original page is perhaps more readable, because it does not render line breaks verbatim.)

Introducing the Tianshou High-Level API

The new high-level library was created based on object-oriented design
principles with two primary design goals:

  • ease of use for the end user (without sacrificing generality)

    This is achieved through:

    • a single, well-defined point of interaction (ExperimentBuilder)
      which uses declarative semantics, allowing the user to focus on
      what to do rather than how to do it.

    • easily injectable parametrisation.

      For complex parametrisation involving objects, the respective
      library classes are easily discoverable, keeping the need to
      browse reference documentation - or, even worse, to inspect code or
      class hierarchies - to an absolute minimum.

    • reduced points of failure.

      Because the high-level API is at a higher level of abstraction, where
      more knowledge is available, we can centrally define reasonable
      defaults and apply consistency checks in order to ensure that
      illegal configurations result in meaningful errors (and are completely
      avoided as long as the user does not modify default behaviour).
      For example, we can consider interactions between the nature of the
      action space and the neural networks being used.

  • maintainability for developers

    This is achieved through:

    • a modular design with strong separation of concerns
    • a high level of factorisation, which largely avoids duplication,
      partly through the use of mixins and multiple inheritance.
      This invariably makes the code slightly more complex, yet it greatly
      reduces the lines of code to be written/updated, so it is a reasonable
      compromise in this case.

Changeset

The entire high-level library is in its own subpackage tianshou.highlevel
and almost no changes were made to the original library in order to
support the new APIs.
For the most part, only typing-related changes were made, which have
aligned type annotations with existing example applications or have made
explicit interfaces that were previously implicit.

Furthermore, some helper modules were added to the tianshou.utils package
(all of which were copied from the sensAI library).

Many example applications were added, based on the existing MuJoCo and Atari
examples (see below).

User-Facing Interface

User Experience Example

To illustrate the UX, consider the video recording (IntelliJ IDEA) that is
embedded in the original wiki page.

Observe how conveniently relevant classes can be discovered via the IDE's
auto-completion function.
Discoverability is markedly enhanced by using a prefix-based naming convention,
where classes that can be used as parameters use the base class name as a prefix,
allowing all potentially relevant subclasses to be straightforwardly
auto-completed.

Declarative Semantics

A key design principle for the user-facing interface was to achieve
declarative semantics: the user is no longer concerned with writing a
lengthy procedure that sequentially constructs components that build
upon each other.
Instead, the user focuses purely on declaring the properties of the
learning task they would like to run.

  • This essentially reduces boilerplate code to zero, as every part of the
    code is defining essential, experiment-specific configuration.
  • This makes it possible to centrally handle interdependent configuration
    and detect/avoid misspecification.

In order to enable the configuration of interdependent objects without
requiring the user to instantiate the respective objects sequentially, we
heavily employ the factory pattern.
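
To illustrate the pattern with a self-contained sketch (all names below are
hypothetical, not the library's actual classes): a factory captures
configuration immediately while deferring the actual object creation until
its dependencies exist.

from dataclasses import dataclass

import torch


@dataclass
class OptimParams:
    lr: float = 1e-3


class OptimFactory:
    """Creates an optimizer once the model to be optimized exists."""

    def __init__(self, params: OptimParams):
        self.params = params

    def create_optimizer(self, model: torch.nn.Module) -> torch.optim.Optimizer:
        return torch.optim.Adam(model.parameters(), lr=self.params.lr)


# The user declares *which* optimizer to use without needing the model yet;
# the library invokes the factory later, once the model has been created.
optim_factory = OptimFactory(OptimParams(lr=3e-4))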

Experiment Builders

The end user's primary entry point is an ExperimentBuilder, which is
specialised for each algorithm.
As the name suggests, it uses the builder pattern in order to create
an Experiment object, which is then used to run the learning task.

  • At builder construction, the user is required to provide only essential
    configuration, particularly the environment factory.
  • The bulk of the algorithm-specific parameters can be provided
    via an algorithm-specific parameter object.
    For instance, PPOExperimentBuilder has the method with_ppo_params,
    which expects an object of type PPOParams.
  • Parametrisation that requires the provision of more complex interfaces
    (e.g. where multiple specification variants exist) is handled via
    dedicated builder methods.
    For example, for the specification of the critic component in an
    actor-critic algorithm, the following group of functions is provided:
    • with_critic_factory (where the user can provide any (user-defined)
      factory for the critic component)
    • with_critic_factory_default (with which the user specifies that
      the default, Net-based critic architecture shall be used and has the
      option to parametrise it)
    • with_critic_factory_use_actor (with which the user indicates that the
      critic component shall reuse the preprocessing network from the actor
      component)
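
For illustration, the critic-related builder methods listed above might be
used as follows; CustomCriticFactory is a hypothetical user-defined class,
and only one of the three variants would be used in practice:

import torch

builder = PPOExperimentBuilder(MyEnvFactory())
# Variant 1: a fully user-defined factory (CustomCriticFactory is hypothetical)
builder.with_critic_factory(CustomCriticFactory())
# Variant 2: the default Net-based architecture, optionally parametrised
builder.with_critic_factory_default((64, 64), torch.nn.Tanh)
# Variant 3: reuse the actor's preprocessing network for the critic
builder.with_critic_factory_use_actor()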

Examples

Minimal Example

In the simplest of cases, where the default parametrisation is to be used
for everything, a PPO learning task can be run as follows,

experiment = PPOExperimentBuilder(MyEnvFactory()).build()
experiment.run()

where MyEnvFactory is a factory for the agent's environment.
The default behaviour will adapt depending on whether the factory
creates environments with discrete or continuous action spaces.
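
A sketch of what such a factory might look like is given below. The import
path and the abstract method shown here are assumptions made purely for
illustration; consult the actual EnvFactory base class in the changeset for
the exact interface.

import gymnasium as gym

from tianshou.highlevel.env import EnvFactory


class MyEnvFactory(EnvFactory):
    # ASSUMPTION: a single environment-creation hook; the real EnvFactory
    # may instead require methods creating (vectorized) train/test envs.
    def create_env(self) -> gym.Env:
        return gym.make("Pendulum-v1")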

Fully Parametrised MuJoCo Example

Importantly, the user still has the option to configure all the details.
Consider this example, which is from the high-level version of the
mujoco_ppo example:

log_name = os.path.join(task, "ppo", str(experiment_config.seed), datetime_tag())

sampling_config = SamplingConfig(
    num_epochs=epoch,
    step_per_epoch=step_per_epoch,
    batch_size=batch_size,
    num_train_envs=training_num,
    num_test_envs=test_num,
    buffer_size=buffer_size,
    step_per_collect=step_per_collect,
    repeat_per_collect=repeat_per_collect,
)

env_factory = MujocoEnvFactory(task, experiment_config.seed, obs_norm=True)

experiment = (
    PPOExperimentBuilder(env_factory, experiment_config, sampling_config)
    .with_ppo_params(
        PPOParams(
            discount_factor=gamma,
            gae_lambda=gae_lambda,
            action_bound_method=bound_action_method,
            reward_normalization=rew_norm,
            ent_coef=ent_coef,
            vf_coef=vf_coef,
            max_grad_norm=max_grad_norm,
            value_clip=value_clip,
            advantage_normalization=norm_adv,
            eps_clip=eps_clip,
            dual_clip=dual_clip,
            recompute_advantage=recompute_adv,
            lr=lr,
            lr_scheduler_factory=LRSchedulerFactoryLinear(sampling_config)
            if lr_decay
            else None,
            dist_fn=DistributionFunctionFactoryIndependentGaussians(),
        ),
    )
    .with_actor_factory_default(hidden_sizes, torch.nn.Tanh, continuous_unbounded=True)
    .with_critic_factory_default(hidden_sizes, torch.nn.Tanh)
    .build()
)
experiment.run(log_name)

This is functionally equivalent to the procedural, low-level example;
the corresponding scripts can be compared directly in the examples/mujoco
folder.

In general, example applications of the high-level API can be found in the
examples/ folder, in scripts with the _hl.py suffix.

Experiments

The Experiment representation contains

  • the agent factory,
  • the environment factory,
  • further definitions pertaining to storage & logging.

An experiment may be run several times, assigning a name (and corresponding
storage location) to each run.

Persistence and Logging

Experiments can be serialized and later reloaded:

    experiment = Experiment.from_directory("log/my_experiment")

Because the experiment representation is composed purely of configuration
and factories, which themselves are composed purely of configuration and
factories, persisted objects are compact and do not contain state.

Every experiment run produces the following artifacts:

  • the serialized experiment
  • the serialized best policy found during training
  • a log file
  • (optionally) user-defined data, as the persistence
    handlers are modular

Running a reloaded experiment can optionally resume training of the serialized
policy.
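
Putting these pieces together, a reload-and-rerun workflow could look as
follows; Experiment.from_directory and run appear in the surrounding text,
while the run name and the comment on resumption are assumptions:

experiment = Experiment.from_directory("log/my_experiment")
# A new run name yields a fresh storage location for this run's artifacts.
# Whether training resumes from the serialized policy is an option of the
# experiment (note, e.g., policy_restore_directory in the configuration
# printed below); the exact mechanism is not shown here.
experiment.run("my_experiment_continued")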

All relevant objects have meaningful string representations that can appear
in logs, which is conveniently achieved through the use of
ToStringMixin (from sensAI).
Its use furthermore prevents string representations of recurring objects
from being printed more than once.
For example, consider this string representation, which was generated for
the fully parametrised PPO experiment from the example above:

Experiment[
    config=ExperimentConfig(
        seed=42, 
        device='cuda', 
        policy_restore_directory=None, 
        train=True, 
        watch=True, 
        watch_render=0.0, 
        persistence_base_dir='log', 
        persistence_enabled=True), 
    sampling_config=SamplingConfig[
        num_epochs=100, 
        step_per_epoch=30000, 
        batch_size=64, 
        num_train_envs=64, 
        num_test_envs=10, 
        buffer_size=4096, 
        step_per_collect=2048, 
        repeat_per_collect=10, 
        update_per_step=1.0, 
        start_timesteps=0, 
        start_timesteps_random=False, 
        replay_buffer_ignore_obs_next=False, 
        replay_buffer_save_only_last_obs=False, 
        replay_buffer_stack_num=1], 
    env_factory=MujocoEnvFactory[
        task=Ant-v4, 
        seed=42, 
        obs_norm=True], 
    agent_factory=PPOAgentFactory[
        sampling_config=SamplingConfig[<<], 
        optim_factory=OptimizerFactoryAdam[
            weight_decay=0, 
            eps=1e-08, 
            betas=(0.9, 0.999)], 
        policy_wrapper_factory=None, 
        trainer_callbacks=TrainerCallbacks(
            epoch_callback_train=None, 
            epoch_callback_test=None, 
            stop_callback=None), 
        params=PPOParams[
            gae_lambda=0.95, 
            max_batchsize=256, 
            lr=0.0003, 
            lr_scheduler_factory=LRSchedulerFactoryLinear[sampling_config=SamplingConfig[<<]], 
            action_scaling=default, 
            action_bound_method=clip, 
            discount_factor=0.99, 
            reward_normalization=True, 
            deterministic_eval=False, 
            dist_fn=DistributionFunctionFactoryIndependentGaussians[], 
            vf_coef=0.25, 
            ent_coef=0.0, 
            max_grad_norm=0.5, 
            eps_clip=0.2, 
            dual_clip=None, 
            value_clip=False, 
            advantage_normalization=False, 
            recompute_advantage=True], 
        actor_factory=ActorFactoryTransientStorageDecorator[
            actor_factory=ActorFactoryDefault[
                continuous_actor_type=ContinuousActorType.GAUSSIAN, 
                continuous_unbounded=True, 
                continuous_conditioned_sigma=False, 
                hidden_sizes=[64, 64], 
                hidden_activation=<class 'torch.nn.modules.activation.Tanh'>, 
                discrete_softmax=True]], 
        critic_factory=CriticFactoryDefault[
            hidden_sizes=[64, 64], 
            hidden_activation=<class 'torch.nn.modules.activation.Tanh'>], 
        critic_use_action=False], 
    logger_factory=LoggerFactoryDefault[
        logger_type=tensorboard, 
        wandb_project=None], 
    env_config=None]

Library Developer Perspective

The presentation thus far has focussed on the user's perspective.
From the perspective of a Tianshou developer, it is important that the
high-level API be clearly structured and maintainable.
Here are the most relevant representations:

  • Policy parameters are represented as dataclasses (base class Params).

    The goal is for the parameters to be ultimately passed to the corresponding
    policy class (e.g. PPOParams contains parameters for PPOPolicy).

    • Parameter transformation:
      In part, the parameter dataclass attributes already correspond directly to
      policy class parameters.
      However, because the high-level interface must, in many cases, abstract away
      from the low-level interface,
      we establish the notion of a ParamTransformer, which transforms
      one or more parameters into the form that is required by the policy
      class: the dictionary representation of the dataclass is successively
      transformed via ParamTransformers such that the resulting dictionary
      can ultimately be used as keyword arguments for the policy
      (see the sketch after this list).
      To achieve maintainability, the declaration of parameter transformations
      is colocated with the parameters they affect.
      Tests ensure that naming issues are detected.

    • Composition and inheritance:
      We use inheritance and mixins to reduce duplication.

  • Factories are an essential principle of the library.
    Because the creation of objects may depend on objects that are not
    yet created, a declarative approach necessitates that we transition from
    the objects themselves to factories.

    • The EnvFactory was already mentioned above, as it is a user-facing
      abstraction.
      Its purpose is to create the (vectorized) Environments that will be
      used in the experiments.
    • An AgentFactory is the central component that creates the policy
      and the trainer, as well as the necessary collectors.
      To support a new type of policy, a subclass that handles the policy
      creation is required.
      In turn, the main task when implementing a new algorithm-specific
      ExperimentBuilder is the creation of the corresponding AgentFactory.
    • Several types of factories serve to parametrize policies and training
      processes, e.g.
      • OptimizerFactory for the creation of torch optimizers
      • ActorFactory for the creation of actor models
      • CriticFactory for the creation of critic models
      • IntermediateModuleFactory for the creation of models that produce
        intermediate/latent representations
      • EnvParamFactory for the creation of parameters based on properties
        of the environment
      • NoiseFactory for the creation of BaseNoise instances
      • DistributionFunctionFactory for the creation of functions that
        create torch distributions from tensors
      • LRSchedulerFactory for learning rate schedulers
      • PolicyWrapperFactory for policy wrappers that extend the
        functionality of the regular policy (e.g. intrinsic curiosity)
      • AutoAlphaFactory for automatically tuned regularization
        coefficients (as supported by SAC or REDQ)
    • A LoggerFactory handles the creation of the experiment logger;
      the default implementation already handles the cases used
      in the examples.
  • The ExperimentBuilder implementations make use of mixins to add common
    functionality. As mentioned above, the main task in an algorithm-specific
    specialization is to create the AgentFactory.
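
To make the parameter transformation mechanism concrete, here is a minimal,
self-contained sketch; the single-method interface and the class names are
simplifications for illustration, not the library's actual definitions:

from dataclasses import asdict, dataclass
from typing import Any


class ParamTransformer:
    """Transforms a parameter dictionary in place (simplified interface)."""

    def transform(self, params: dict[str, Any]) -> None:
        raise NotImplementedError


class TransformRename(ParamTransformer):
    """Example: renames a high-level parameter to the policy's kwarg name."""

    def __init__(self, old_name: str, new_name: str):
        self.old_name, self.new_name = old_name, new_name

    def transform(self, params: dict[str, Any]) -> None:
        params[self.new_name] = params.pop(self.old_name)


@dataclass
class MyParams:
    discount_factor: float = 0.99


# The dataclass is converted to a dictionary and successively transformed
# until it can be passed as keyword arguments to the policy class.
params = asdict(MyParams())
TransformRename("discount_factor", "gamma").transform(params)
assert params == {"gamma": 0.99}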

@MischaPanch self-requested a review October 17, 2023 17:18
@MischaPanch (Collaborator) commented Oct 17, 2023

@Trinkle23897 this is a major extension of tianshou, so it will require a thorough review from your side before being merged. As you can see, @opcode81 was very thorough in documenting the many aspects and perspectives on these new features.

We already did some internal rounds of reviews and discussions on this, but will continue doing so next week, when I have more time. As promised, the changes are a pure addition (so @spacegoing @nuance1979 it's not creating additional engineering complexity for power users - quite the contrary).

Together with this PR, the wiki mentioned above should be migrated to this repository and the documentation (both readme and on readthedocs) should be extended. We believe that the new high-level interfaces will provide the main way for the majority of users to interact with tianshou.

I will ping you once we are done with internal reviews, but of course, if you already want to start a discussion - feel free :)

@MischaPanch (Collaborator) commented:

One thing to consider is whether the sensAI project (also mainly done by @opcode81, and actively developed and used at appliedAI) should eventually become a dependency of tianshou. In terms of transitive dependencies, it would only add scikit-learn, if I'm not mistaken.

sensAI is a high-level library for general non-RL AI things, and over the years we developed several utilities that are useful in other contexts - in particular in tianshou. For now, however, these utilities were simply copied over. So it's a question for the future (though for the near future). sensAI was never marketed, and we haven't written sufficient documentation for it, but this is about to change, as appliedAI is interested in making it available to a broader audience.

TL;DR: we borrowed sufficiently many utilities from another OSS project of ours that it might be worth thinking about including it as a dependency, instead of copy-pasting stuff.

opcode81 and others added 14 commits October 18, 2023 20:44
Policy objects are now parametrised by converting the parameter
dataclass instances to kwargs, using some injectable conversions
along the way
 * Created mixins for agent factories to reduce code duplication
 * Further factorised params & mixins for experiment factories
 * Additional parameter abstractions
 * Implement high-level MuJoCo TD3 example
because it disallows, in particular, class docstrings that consist
only of a summary line
achieving greater separation of concerns and improved maintainability
where a persistable configuration object is passed as an
argument, as this can help to ensure persistability (making the
requirement explicit)
* Use prefix convention (subclasses have superclass names as prefix) to
  facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the
  precise OO semantics and retains only the essential part of the name
  (which can be more pleasing to users not accustomed to
  convoluted OO naming)
composing the list dynamically instead
* Add common base class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
@opcode81 force-pushed the feat/high-level-api branch from ffc04ae to 86cca8f (October 26, 2023 10:50)
which stores the entire policy (new default), supporting applications
where it is desired to be able to load the policy without having
to instantiate an environment or recreate a corresponding policy
object
@MischaPanch marked this pull request as ready for review October 26, 2023 20:54
MischaPanch previously approved these changes Oct 26, 2023

@MischaPanch (Collaborator) left a comment:

@Trinkle23897 For me this looks good; there are some unresolved discussions about minor things above. Please have a look when you find time.

I changed the merge strategy to no longer squash but instead to add all commits and a merge commit

@Trinkle23897 (Collaborator) commented:

Sorry, I just came back from vacation; I will take a look this weekend.

@nuance1979 (Collaborator) commented:

> I changed the merge strategy to no longer squash but instead to add all commits and a merge commit

I don't think that's a good idea. A PR usually goes through many rounds of revisions, and there is no point in keeping that history in the main repo.

@Trinkle23897 (Collaborator) left a comment:

I only checked some conversations and did a quick skim, will do a full review later

I still recommend using squash instead of merge, to avoid screwing up the master branch and to keep a linear history

Resolved review threads: tianshou/utils/string.py, pyproject.toml, examples/mujoco/mujoco_a2c_hl.py
@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments

Reviewer comment:

yeah sgtm, maybe in env/atari.py or utils/whatever.py?

@opcode81 (Collaborator, Author) commented Oct 27, 2023

> > I changed the merge strategy to no longer squash but instead to add all commits and a merge commit
>
> I don't think that's a good idea. A PR usually goes through many rounds of revisions, and there is no point in keeping that history in the main repo.

@nuance1979, actually, there are some very good reasons to keep the original history. I mention some of them here.

There is no inherent value in a linear history, yet there is value

  • in the information contained in the individual commit messages and
  • in the flexibility gained by
    • being able to revert individual commits
    • being able to build on a branch's history (which can't be done if it is rewritten)
    • being able to bisect the history at a finer level of granularity

Squashing is more reasonable for smaller PRs, but this one adds thousands of lines of code.

@opcode81 (Collaborator, Author) commented:

FYI I will be on holiday for the next 10 days. I will address your further comments when I return.

@nuance1979 (Collaborator) commented:

> > > I changed the merge strategy to no longer squash but instead to add all commits and a merge commit
> >
> > I don't think that's a good idea. A PR usually goes through many rounds of revisions, and there is no point in keeping that history in the main repo.
>
> @nuance1979, actually, there are some very good reasons to keep the original history. I mention some of them here.
>
> There is no inherent value in a linear history, yet there is value
>
>   • in the information contained in the individual commit messages and
>
>   • in the flexibility gained by
>
>     • being able to revert individual commits
>     • being able to build on a branch's history (which can't be done if it is rewritten)
>     • being able to bisect the history at a finer level of granularity
>
> Squashing is more reasonable for smaller PRs, but this one adds thousands of lines of code.

I understand your points but would still argue for squashing because:

  • The information contained in the individual commit messages is still preserved in the PR even after squashing.
  • Individual commits inside a PR are rarely used for reverting. In fact, I would argue that the PR is the proper unit for reversion, i.e., a PR should be a self-contained change that can be reverted without leaving a broken tip. We don't usually care whether a commit inside a PR passes the full CI/CD tests, so reverting to it could mean a broken main branch; for a PR, we do expect it to pass.
  • The granularity is controlled by having smaller PRs. Basically, as long as it is self-contained, a smaller PR is preferred to a larger one, because the time it takes to review a large PR is an impediment to development velocity.

Anyway, just my personal opinions.

@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments

Reviewer comment:

could you create an issue?

Resolved review thread: examples/mujoco/mujoco_a2c_hl.py
@@ -37,3 +43,46 @@ def make_mujoco_env(task, seed, training_num, test_num, obs_norm):
test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
test_envs.set_obs_rms(train_envs.get_obs_rms())
return env, train_envs, test_envs


class MujocoEnvObsRmsPersistence(Persistence):

Reviewer comment:

could it be a parameter for ContinuousEnvironments?

Resolved review threads: examples/mujoco/mujoco_env.py, examples/mujoco/mujoco_npg_hl.py, tianshou/utils/net/common.py, tianshou/trainer/base.py, tianshou/highlevel/params/noise.py, tianshou/highlevel/persistence.py, tianshou/highlevel/experiment.py (outdated)
@Trinkle23897 (Collaborator) left a comment:

I'll give @MischaPanch the final call

@MischaPanch (Collaborator) commented:

I would merge this now; thanks to everybody for the lively discussion and contributions! I feel like we're on a great path forward here :)

I'm merging without squash because:

  1. @opcode81 is strongly in favor of it
  2. I am rather in favor
  3. @Trinkle23897 and @nuance1979 are against it, but not very strongly, if I interpret it correctly.
  4. The commit history here is kept clean through @opcode81 continuously rebasing and force-pushing
  5. Importantly, we will have an option to revert to the sensai-dependent version without effort if we want that in the future

@MischaPanch (Collaborator) commented:

If no objections come within 30 mins, I'm merging.

@Trinkle23897 would you be OK with adding @opcode81 as a maintainer? We will continue working on tianshou together, and from this PR I think you can see that he's a very responsible contributor. Neither of us plans to take away control from you, but it would be more convenient if Dominik could help me with reviewing issues and PRs, making wiki entries, and so on.

@MischaPanch merged commit 962c6d1 into thu-ml:master Nov 8, 2023
5 checks passed
@MischaPanch deleted the feat/high-level-api branch November 8, 2023 23:16
@Trinkle23897 (Collaborator) commented:

yep, sg

Linked issue: Create high level interfaces for config and experiments (#938)