High-Level API #970
Conversation
So far only for one script (mujoco_ppo_cfg); extension will follow.
Conflicts: examples/mujoco/mujoco_env.py, examples/mujoco/mujoco_ppo.py, setup.py
of control flow paths for brevity (regarding return statements)
@Trinkle23897 this is a major extension of tianshou, so it will require a thorough review from your side before being merged. As you see, @opcode81 was very thorough in documenting the many aspects and perspectives on these new features. We already did some internal rounds of reviews and discussions on this, but will continue doing so next week, when I have more time. As promised, the changes are a pure addition (so @spacegoing @nuance1979 it's not creating additional engineering complexity for power users - quite the contrary). Together with this PR, the wiki mentioned above should be migrated to this repository and the documentation (both readme and on readthedocs) should be extended. We believe that the new high-level interfaces will provide the main way for the majority of users to interact with tianshou. I will ping you once we are done with internal reviews, but of course, if you already want to start a discussion - feel free :)
One thing to consider is whether the sensAI project (also mainly done by @opcode81, and actively developed and used at appliedAI) should eventually become a dependency of tianshou. In terms of transitive dependencies, it would only add scikit-learn, if I'm not mistaken. sensAI is a high-level library for general non-RL AI things, and over the years we developed several utilities that are useful in other contexts - in particular in tianshou. For now, however, these utilities were simply copied over. So it's a question for the future (though perhaps the near future). sensAI was never marketed, and we haven't written sufficient documentation for it, but this is about to change, as appliedAI is interested in making it available to a broader audience. TL;DR: we borrowed enough utilities from another OSS project of ours that it might be worth thinking about including it as a dependency instead of copy-pasting stuff.
Policy objects are now parametrised by converting the parameter dataclass instances to kwargs, using some injectable conversions along the way
* Created mixins for agent factories to reduce code duplication
* Further factorised params & mixins for experiment factories
* Additional parameter abstractions
* Implement high-level MuJoCo TD3 example
because it disallows, in particular, class docstrings that consist only of a summary line
achieving greater separation of concerns and improved maintainability
where a persistable configuration object is passed as an argument, as this can help to ensure persistability (making the requirement explicit)
* Use prefix convention (subclasses have superclass names as prefix) to facilitate discoverability of relevant classes via IDE autocompletion
* Use dual naming, adding an alternative concise name that omits the precise OO semantics and retains only the essential part of the name (which can be more pleasing to users not accustomed to convoluted OO naming)
composing the list dynamically instead
* Add common base class for A2C and PPO agent factories
* Add default for dist_fn parameter, adding corresponding factories
* Add example mujoco_a2c_hl
* Refactored module `module` (split into submodules)
* Basic support for discrete environments
* Implement Atari env. factory
* Implement DQN-based actor factory
* Implement notion of reusing agent preprocessing network for critic
* Add example atari_ppo_hl
Force-pushed from ffc04ae to 86cca8f.
which stores the entire policy (new default), supporting applications where it is desired to be able to load the policy without having to instantiate an environment or recreate a corresponding policy object
@Trinkle23897 For me this looks good, there are some unresolved discussions about minor things above. Pls have a look when you find time
I changed the merge strategy to no longer squash but instead to add all commits and a merge commit
Sorry, I just came back from vacation, will take a look this weekend.
I don't think that's a good idea. A PR usually goes through many rounds of revisions and there is no point keeping that history in the main repo.
I only checked some conversations and did a quick skim, will do a full review later
I still recommend using squash instead of merge, to avoid screwing up the master branch and keep a linear history
@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments
yeah sgtm, maybe in env/atari.py or utils/whatever.py?
@nuance1979, actually, there are some very good reasons to keep the original history. I mention some of them here. There is no inherent value in a linear history, yet there is value in retaining the original history.
Squashing is more reasonable for smaller PRs, but this one adds thousands of lines of code.
FYI I will be on holiday for the next 10 days. I will address your further comments when I return.
I understand your points but would still argue for squashing because:
Anyway, just my personal opinions.
@@ -5,7 +5,16 @@
import torch
from torch import nn

from tianshou.utils.net.discrete import NoisyLinear
from tianshou.highlevel.env import Environments
could you create an issue?
@@ -37,3 +43,46 @@ def make_mujoco_env(task, seed, training_num, test_num, obs_norm):
    test_envs = VectorEnvNormObs(test_envs, update_obs_rms=False)
    test_envs.set_obs_rms(train_envs.get_obs_rms())
    return env, train_envs, test_envs


class MujocoEnvObsRmsPersistence(Persistence):
could it be a parameter for ContinuousEnvironments?
Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
I'll give @MischaPanch the final call
This reverts commit fdb0eba.
Force-pushed from 4fb3e09 to dae4000.
I would merge this now, thanks to everybody for the lively discussion and contributions! I feel like we're on a great path forward here :) I'm merging without squash, as discussed above.
If no objections come within 30 mins, I'm merging. @Trinkle23897 would you be ok with adding @opcode81 as a maintainer? We will continue working on tianshou together, and from this PR I think you can see that he's a very responsible contributor. Neither of us plans to take away control from you, but it would be more convenient if Dominik could help me with reviewing issues, PRs, making wiki entries, and so on.
yep, sg
make format (required)
make commit-checks (required)
This PR closes #938. It introduces all the fundamental concepts and abstractions, and it already covers the majority of the algorithms. It is not a complete and finalised product, however, and we recommend that the high-level API remain in an alpha state for some time, as already suggested in the issue.
The changes in this PR are described on a wiki page, a copy of which is provided below. (The original page is perhaps more readable, because it does not render line breaks verbatim.)
Introducing the Tianshou High-Level API
The new high-level library was created based on object-oriented design
principles with two primary design goals:
* ease of use for the end user (without sacrificing generality)

  This is achieved through:

  * a single, well-defined point of interaction (ExperimentBuilder) which uses declarative semantics, allowing the user to focus on what to do rather than how to do it
  * easily injectable parametrisation.
    For complex parametrisation involving objects, the respective library classes are easily discoverable, keeping the need to browse reference documentation - or, even worse, inspect code or class hierarchies - to an absolute minimum.
  * reduced points of failure.
    Because the high-level API is at a higher level of abstraction, where more knowledge is available, we can centrally define reasonable defaults and apply consistency checks in order to ensure that illegal configurations result in meaningful errors (and are completely avoided as long as the user does not modify default behaviour).
    For example, we can consider interactions between the nature of the action space and the neural networks being used.

* maintainability for developers

  This is achieved through:

  * reduced code duplication, achieved partly through the use of mixins and multiple inheritance.
    This invariably makes the code slightly more complex, yet it greatly reduces the lines of code to be written/updated, so it is a reasonable compromise in this case.
Changeset
The entire high-level library is in its own subpackage tianshou.highlevel, and almost no changes were made to the original library in order to support the new APIs.
For the most part, only typing-related changes were made, which have
aligned type annotations with existing example applications or have made
explicit interfaces that were previously implicit.
Furthermore, some helper modules were added to the tianshou.utils package (all of which were copied from the sensAI library).
Many example applications were added, based on the existing MuJoCo and Atari
examples (see below).
User-Facing Interface
User Experience Example
To illustrate the UX, consider this video recording (IntelliJ IDEA):
Observe how conveniently relevant classes can be discovered via the IDE's
auto-completion function.
Discoverability is markedly enhanced by using a prefix-based naming convention,
where classes that can be used as parameters use the base class name as a prefix,
allowing all potentially relevant subclasses to be straightforwardly
auto-completed.
Declarative Semantics
A key design principle for the user-facing interface was to achieve
declarative semantics, where the user
is no longer concerned with generating a lengthy procedure that sequentially
constructs components that build upon each other.
Instead, the user focuses purely on declaring the properties of the learning task they would like to run:

* the only code the user needs to write is code defining essential, experiment-specific configuration
* the library can centrally apply reasonable defaults and detect/avoid misspecification
In order to enable the configuration of interdependent objects without
requiring the user to instantiate the respective objects sequentially, we
heavily employ the factory pattern.
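To make the idea concrete, here is a rough, hypothetical sketch: a factory bundles configuration and defers construction until its dependencies (here, the environment's observation dimension) are known. The class name MyCriticFactory and its create signature are made up for illustration and are not part of the actual API.

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class MyCriticFactory:
    """Hypothetical user-defined factory; not part of the tianshou API."""

    hidden_sizes: tuple[int, ...] = (64, 64)

    def create(self, obs_dim: int) -> nn.Module:
        # Construction is deferred until the environment's observation dimension
        # is known; the user only declares the configuration up front.
        layers: list[nn.Module] = []
        last_dim = obs_dim
        for size in self.hidden_sizes:
            layers += [nn.Linear(last_dim, size), nn.Tanh()]
            last_dim = size
        layers.append(nn.Linear(last_dim, 1))  # the critic outputs a single value
        return nn.Sequential(*layers)
```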
Experiment Builders
The end user's primary entry point is an ExperimentBuilder, which is specialised for each algorithm.
As the name suggests, it uses the builder pattern in order to create an Experiment object, which is then used to run the learning task.

* The builder's constructor receives the fundamental configuration, particularly the environment factory.
* Algorithm-specific parameters are set via an algorithm-specific parameter object.
  For instance, PPOExperimentBuilder has the method with_ppo_params, which expects an object of type PPOParams.
* More complex parametrisations (e.g. where multiple specification variants exist) are handled via dedicated builder methods.
For example, for the specification of the critic component in an
actor-critic algorithm, the following group of functions is provided:
* with_critic_factory (where the user can provide any (user-defined) factory for the critic component)
* with_critic_factory_default (with which the user specifies that the default, Net-based critic architecture shall be used and has the option to parametrise it)
* with_critic_factory_use_actor (with which the user indicates that the critic component shall reuse the preprocessing network from the actor component)
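A rough usage sketch follows, assuming builder is an actor-critic experiment builder such as PPOExperimentBuilder. The method names are taken from the list above, but the exact signatures may differ, and a real custom factory would have to conform to the library's CriticFactory interface.

```python
# Only one of the following options would be used for a given experiment.

# Use the default Net-based critic architecture, optionally parametrising it:
builder.with_critic_factory_default(hidden_sizes=(256, 256))

# Reuse the actor's preprocessing network for the critic:
builder.with_critic_factory_use_actor()

# Provide a custom critic factory (must implement the library's critic factory interface):
builder.with_critic_factory(my_custom_critic_factory)
```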
Examples
Minimal Example
In the simplest of cases, where the user wants to use the default
parametrisation for everything, a user could run a PPO learning task
as follows, where MyEnvFactory is a factory for the agent's environment.
The default behaviour will adapt depending on whether the factory
creates environments with discrete or continuous action spaces.
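A minimal sketch of such a script might look as follows; the import path and constructor arguments are assumptions based on this description and may differ from the actual API.

```python
from tianshou.highlevel.experiment import PPOExperimentBuilder  # assumed module path

# MyEnvFactory is the user-defined environment factory mentioned above.
experiment = PPOExperimentBuilder(MyEnvFactory()).build()

# The resulting Experiment object is then used to run the learning task,
# e.g. via a run method (exact invocation not shown in this description).
```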
Fully Parametrised MuJoCo Example
Importantly, the user still has the option to configure all the details.
Consider the high-level version of the mujoco_ppo example; it is functionally equivalent to the procedural, low-level example.
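As a hedged sketch of what such a fully parametrised builder invocation might look like: the class and method names are taken from this description, the import paths and parameter names are assumptions, and the values shown are illustrative rather than those of the actual example script.

```python
from tianshou.highlevel.experiment import PPOExperimentBuilder  # assumed module path
from tianshou.highlevel.params.policy_params import PPOParams   # assumed module path

experiment = (
    PPOExperimentBuilder(MyEnvFactory())
    .with_ppo_params(
        PPOParams(  # illustrative values only
            discount_factor=0.99,
            gae_lambda=0.95,
            ent_coef=0.0,
            vf_coef=0.25,
            eps_clip=0.2,
        )
    )
    .with_critic_factory_default(hidden_sizes=(64, 64))
    .build()
)
```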
Compare the low-level and high-level scripts to see the difference.
In general, find example applications of the high-level API in the examples/ folder, in scripts using the _hl.py suffix.

Experiments
The Experiment representation contains the complete definition of the learning task.
An experiment may be run several times, assigning a name (and corresponding storage location) to each run.
Persistence and Logging
Experiments can be serialized and later reloaded.
Because the experiment representation is composed purely of configuration
and factories, which themselves are composed purely of configuration and
factories, persisted objects are compact and do not contain state.
Every experiment run produces the following artifacts:
handlers are modular
Running a reloaded experiment can optionally resume training of the serialized
policy.
All relevant objects have meaningful string representations that can appear
in logs, which is conveniently achieved through the use of ToStringMixin (from sensAI).
Its use furthermore prevents string representations of recurring objects
from being printed more than once.
For example, consider this string representation, which was generated for
the fully parametrised PPO experiment from the example above:
Library Developer Perspective
The presentation thus far has focussed on the user's perspective.
From the perspective of a Tianshou developer, it is important that the
high-level API be clearly structured and maintainable.
Here are the most relevant representations:
Policy parameters are represented as dataclasses (base class Params).
The goal is for the parameters to be ultimately passed to the corresponding policy class (e.g. PPOParams contains parameters for PPOPolicy).

Parameter transformation:
In part, the parameter dataclass attributes already correspond directly to
policy class parameters.
However, because the high-level interface must, in many cases, abstract away
from the low-level interface,
we establish the notion of a ParamTransformer, which transforms one or more parameters into the form that is required by the policy class.
The idea is that the dictionary representation of the dataclass is successively transformed via ParamTransformers such that the resulting dictionary can ultimately be used as keyword arguments for the policy.
To achieve maintainability, the declaration of parameter transformations
is colocated with the parameters they affect.
Tests ensure that naming issues are detected.
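As a rough, self-contained mock-up of the mechanism (simplified names; this is not the library's actual ParamTransformer interface):

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Any


class ParamTransformerSketch(ABC):
    """Simplified stand-in for the ParamTransformer concept."""

    @abstractmethod
    def transform(self, kwargs: dict[str, Any]) -> None:
        """Modify the kwargs dictionary in place."""


class RenameDiscountFactor(ParamTransformerSketch):
    """Illustrative only: adapt a high-level parameter name to the name expected by the policy."""

    def transform(self, kwargs: dict[str, Any]) -> None:
        kwargs["gamma"] = kwargs.pop("discount_factor")


@dataclass
class MyAlgorithmParams:
    discount_factor: float = 0.99
    lr: float = 3e-4


def params_to_policy_kwargs(
    params: MyAlgorithmParams, transformers: list[ParamTransformerSketch]
) -> dict[str, Any]:
    kwargs = asdict(params)  # start from the dataclass's dict representation
    for transformer in transformers:
        transformer.transform(kwargs)  # apply transformations successively
    return kwargs  # ultimately usable as keyword arguments for the policy
```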
Composition and inheritance:
We use inheritance and mixins to reduce duplication.
Factories are an essential principle of the library.
Because the creation of objects may depend on objects that are not
yet created, a declarative approach necessitates that we transition from
the objects themselves to factories.
EnvFactory was already mentioned above, as it is a user-facing abstraction.
Its purpose is to create the (vectorized) Environments that will be used in the experiments.

AgentFactory is the central component that creates the policy, the trainer as well as the necessary collectors.
To support a new type of policy, a subclass that handles the policy
creation is required.
In turn, the main task when implementing a new algorithm-specific ExperimentBuilder is the creation of the corresponding AgentFactory.

Further factory abstractions are used for various object creation processes, e.g.

* OptimizerFactory for the creation of torch optimizers
* ActorFactory for the creation of actor models
* CriticFactory for the creation of critic models
* IntermediateModuleFactory for the creation of models that produce intermediate/latent representations
* EnvParamFactory for the creation of parameters based on properties of the environment
* NoiseFactory for the creation of BaseNoise instances
* DistributionFunctionFactory for the creation of functions that create torch distributions from tensors
* LRSchedulerFactory for learning rate schedulers
* PolicyWrapperFactory for policy wrappers that extend the functionality of the regular policy (e.g. intrinsic curiosity)
* AutoAlphaFactory for automatically tuned regularization coefficients (as supported by SAC or REDQ)

LoggerFactory handles the creation of the experiment logger, but the default implementation already handles the cases that were used in the examples.
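To give a feel for the pattern (a simplified mock-up rather than the actual interfaces), each such factory bundles configuration and exposes a creation method that receives the context it needs only at creation time:

```python
from dataclasses import dataclass

import torch


@dataclass
class AdamOptimizerFactorySketch:
    """Simplified stand-in for an OptimizerFactory-style class; not the actual interface."""

    lr: float = 3e-4
    weight_decay: float = 0.0

    def create_optimizer(self, module: torch.nn.Module) -> torch.optim.Optimizer:
        # The module to be optimized only exists at creation time, which is precisely
        # why a factory (rather than a ready-made optimizer instance) is configured.
        return torch.optim.Adam(
            module.parameters(), lr=self.lr, weight_decay=self.weight_decay
        )
```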
The ExperimentBuilder implementations make use of mixins to add common functionality.
As mentioned above, the main task in an algorithm-specific specialization is to create the AgentFactory.