Docs/overhaul #999

Merged
63 commits merged on Dec 6, 2023

Commits (63)
6b6ce0f
Add notebooks from ./docs/tutorials/get_started.rst to ./notebooks
carlocagnetta Oct 17, 2023
efaadec
Removed notebook outputs
carlocagnetta Oct 17, 2023
6a60396
Add jupyter pkg dependency to poetry env
carlocagnetta Oct 17, 2023
acc671b
Update gitignore
carlocagnetta Oct 20, 2023
7ed0703
Update gitignore
carlocagnetta Oct 20, 2023
de3a021
Setup jupyter-book
carlocagnetta Oct 26, 2023
148e11a
Update notebooks to current version
carlocagnetta Oct 26, 2023
96704a8
Update jupyter-book _config.yml to comply with template
carlocagnetta Oct 27, 2023
04c374c
add git-action to compile notebooks
carlocagnetta Nov 2, 2023
0e9837f
Fix workflow action compiling notebooks
carlocagnetta Nov 2, 2023
bbe5da4
Add compiled books for documentation
carlocagnetta Nov 2, 2023
c175f76
Fix notebooks.yml
carlocagnetta Nov 2, 2023
23eadb1
Change books publish dir
carlocagnetta Nov 2, 2023
1170ad3
Change gh-pages publish dir
carlocagnetta Nov 2, 2023
60f54fd
Change gh-pages publish dir
carlocagnetta Nov 2, 2023
0108726
fix build books action command
carlocagnetta Nov 2, 2023
30fc884
Publish notebooks only on master
carlocagnetta Nov 2, 2023
624998d
fix notebook action
carlocagnetta Nov 2, 2023
b74ea40
Add git-action to compile notebooks
carlocagnetta Nov 2, 2023
8489fde
Publish compiled Jupyter books to github-pages
carlocagnetta Nov 2, 2023
3a87b33
Publish notebooks only from master
carlocagnetta Nov 2, 2023
6df5616
Move notebooks to doc and resolve spellcheck
carlocagnetta Nov 9, 2023
8f0c62a
Documentation update: jupyter-book running on ReadTheDocs including t…
carlocagnetta Nov 10, 2023
102045c
Fix .readthedocs.yaml
carlocagnetta Nov 10, 2023
ca2e4a1
Fix docs index
carlocagnetta Nov 10, 2023
fb97091
Fix action gh-pages
carlocagnetta Nov 10, 2023
9aad2c9
Fix readthedocs to install poetry
carlocagnetta Nov 10, 2023
64af97b
Fix RTD and gh-pages docu auto-generation
carlocagnetta Nov 10, 2023
b1b7f24
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
08f1770
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
573d53d
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
6509a20
Add autogenerated api to gitignore
carlocagnetta Nov 10, 2023
396f20b
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
4693b0b
Remove autogenerated docs/api/highlevel
carlocagnetta Nov 10, 2023
06d2703
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
9ab5d35
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
42d9599
Fix docs/requirements.txt
carlocagnetta Nov 10, 2023
89d8cf3
Removed action for gh-pages
carlocagnetta Nov 10, 2023
6f739cc
update docs/.gitignore
carlocagnetta Nov 15, 2023
6fa536f
Update Documentation building
carlocagnetta Nov 15, 2023
a8bceff
Moved all docs images in docs/_static
carlocagnetta Nov 15, 2023
cf3e94a
Update .readthedocs.yaml
carlocagnetta Nov 15, 2023
f5041f4
Replaced .png images with .svg where possible
carlocagnetta Nov 17, 2023
a12b157
Add launch button for notebooks in colab
carlocagnetta Nov 17, 2023
fa55217
Remove get_started.rst page with links to outdated notebooks
carlocagnetta Nov 17, 2023
830969d
Update .readthe
carlocagnetta Nov 17, 2023
1515ff9
Compressed .png and .jpg images
carlocagnetta Nov 17, 2023
5d6abfa
revert .readthedocs.yaml
carlocagnetta Nov 19, 2023
d4b6d9b
WIP - restructure doc files
Nov 17, 2023
006577d
WIP - restructure doc files
Nov 23, 2023
a568561
Docs: generate all api docs automatically
Dec 4, 2023
4cfefcf
Docs: removed conflicting sphinx stuff from a docstring
Dec 4, 2023
5af2947
Docs: removed capitalization
Dec 4, 2023
b129836
Docs: added sorting order for autogenerated toc
Dec 4, 2023
28fda00
Docs: added links to source code, readded some ruff ignore rules
Dec 4, 2023
2e39a25
Docstring: minor changes to let ruff pass
Dec 4, 2023
a846b52
Typing: fixed multiple typing issues
Dec 5, 2023
0b67447
Docs: fixing spelling, re-adding spellcheck to pipeline
Dec 5, 2023
19e129d
Fix rtd build
Dec 5, 2023
c50e74f
Fix rtd build, improvements in task running
Dec 5, 2023
9d14407
Deal with .jupyter_cache
Dec 5, 2023
5f4a02c
Docs: improve API landing page
Dec 5, 2023
4c24dc6
Formatting
Dec 5, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -153,6 +153,9 @@ videos/
# might be needed for IDE plugins that can't read ruff config
.flake8

docs/notebooks/_build/
docs/conf.py

# temporary scripts (for ad-hoc testing), temp folder
/temp
/temp*.py
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -28,8 +28,8 @@ repos:
pass_filenames: false
- id: poetry-lock-check
name: poetry lock check
entry: poetry lock
args: [--check]
entry: poetry check
args: [--lock]
language: system
pass_filenames: false
- id: mypy
23 changes: 11 additions & 12 deletions .readthedocs.yaml
@@ -10,15 +10,14 @@ build:
os: ubuntu-22.04
tools:
python: "3.11"
jobs:
pre_build:
- pip install .

# Build documentation in the docs/ directory with Sphinx
sphinx:
configuration: docs/conf.py
# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
install:
- requirements: docs/requirements.txt
commands:
- mkdir -p $READTHEDOCS_OUTPUT/html
- curl -sSL https://install.python-poetry.org | python -
# - ~/.local/bin/poetry config virtualenvs.create false
- ~/.local/bin/poetry install --with dev
## Same as the poe tasks, but unfortunately poe doesn't work when poetry doesn't create virtualenvs
- ~/.local/bin/poetry run python docs/autogen_rst.py
- ~/.local/bin/poetry run which jupyter-book
- ~/.local/bin/poetry run python docs/create_toc.py
- ~/.local/bin/poetry run jupyter-book config sphinx docs/
- ~/.local/bin/poetry run sphinx-build -W -b html docs $READTHEDOCS_OUTPUT/html
6 changes: 4 additions & 2 deletions docs/.gitignore
@@ -1,2 +1,4 @@
# auto-generated content
/api/tianshou.highlevel
/03_api/*
jupyter_execute
_toc.yml
.jupyter_cache
2 changes: 1 addition & 1 deletion docs/tutorials/dqn.rst → docs/01_tutorials/00_dqn.rst
@@ -308,7 +308,7 @@ Tianshou supports user-defined training code. Here is the code snippet:
# train policy with a sampled batch data from buffer
losses = policy.update(64, train_collector.buffer)
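To give a fuller picture of such a user-defined loop, here is a minimal sketch (illustrative only, not taken from the tutorial; the step counts, evaluation interval and reward threshold are arbitrary assumptions), reusing the ``policy``, ``train_collector`` and ``test_collector`` objects built earlier in the tutorial:

    def custom_train(policy, train_collector, test_collector, reward_threshold=195.0):
        # Illustrative training loop; all hyperparameters are placeholders.
        losses = None
        for step in range(10_000):
            # gather fresh experience with the exploration policy
            train_collector.collect(n_step=10)
            # periodically evaluate and stop once the task is considered solved
            if step % 1_000 == 0:
                result = test_collector.collect(n_episode=10)
                if result["rews"].mean() >= reward_threshold:
                    break
            # train the policy with a batch sampled from the replay buffer
            losses = policy.update(64, train_collector.buffer)
        return losses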

For further usage, you can refer to the :doc:`/tutorials/cheatsheet`.
For further usage, you can refer to the :doc:`/01_tutorials/07_cheatsheet`.

.. rubric:: References

Changes to another documentation file (filename not shown in this view):
@@ -339,7 +339,7 @@ Thus, we need a time-related interface for calculating the 2-step return. :meth:

This code does not consider the done flag, so it may not work very well. It shows two ways to get :math:`s_{t + 2}` from the replay buffer easily in :meth:`~tianshou.policy.BasePolicy.process_fn`.

For other methods, you can check out :doc:`/api/tianshou.policy`. We give a high-level explanation of the usage of the policy class in :ref:`pseudocode`.
For other methods, you can check out :doc:`/03_api/policy/index`. We give a high-level explanation of the usage of the policy class in :ref:`pseudocode`.
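To make this concrete, here is a minimal sketch of a policy whose ``process_fn`` assembles such a 2-step return (illustrative only, not taken from the Tianshou source; it ignores the done flag exactly as cautioned above, and details such as the ``_gamma`` attribute and the wrap-around indexing are assumptions):

    from tianshou.policy import DQNPolicy

    class TwoStepDQNPolicy(DQNPolicy):
        def process_fn(self, batch, buffer, indices):
            n = len(buffer)
            batch_t1 = buffer[(indices + 1) % n]  # transition at t+1
            batch_t2 = buffer[(indices + 2) % n]  # transition at t+2, so batch_t2.obs is s_{t+2}
            # bootstrap with max_a Q(s_{t+2}, a) from the current network
            q_next = self(batch_t2).logits.max(dim=1)[0].detach().cpu().numpy()
            # 2-step target: r_t + gamma * r_{t+1} + gamma^2 * max_a Q(s_{t+2}, a)
            batch.returns = batch.rew + self._gamma * batch_t1.rew + self._gamma ** 2 * q_next
            return batch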


Collector
@@ -382,7 +382,7 @@ Trainer

Once you have a collector and a policy, you can start writing the training method for your RL agent. The Trainer is essentially a simple wrapper: it saves you the effort of writing the training loop yourself. You can also construct your own trainer: :ref:`customized_trainer`.

Tianshou has three types of trainer: :func:`~tianshou.trainer.onpolicy_trainer` for on-policy algorithms such as Policy Gradient, :func:`~tianshou.trainer.offpolicy_trainer` for off-policy algorithms such as DQN, and :func:`~tianshou.trainer.offline_trainer` for offline algorithms such as BCQ. Please check out :doc:`/api/tianshou.trainer` for the usage.
Tianshou has three types of trainer: :func:`~tianshou.trainer.onpolicy_trainer` for on-policy algorithms such as Policy Gradient, :func:`~tianshou.trainer.offpolicy_trainer` for off-policy algorithms such as DQN, and :func:`~tianshou.trainer.offline_trainer` for offline algorithms such as BCQ. Please check out :doc:`/03_api/trainer/index` for the usage.

We also provide the corresponding iterator-based trainer classes :class:`~tianshou.trainer.OnpolicyTrainer`, :class:`~tianshou.trainer.OffpolicyTrainer`, :class:`~tianshou.trainer.OfflineTrainer` to help users write more flexible training logic:
::
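The example that follows the ``::`` marker above is collapsed in this view. As a hedged illustration of the iterator-based usage it describes (constructor arguments copied from the notebook added in this PR; the exact values yielded per iteration are an assumption, not a confirmed API):

    trainer = OnpolicyTrainer(
        policy=policy,
        train_collector=train_collector,
        test_collector=test_collector,
        max_epoch=10,
        step_per_epoch=50000,
        repeat_per_collect=10,
        episode_per_test=10,
        batch_size=256,
        step_per_collect=2000,
    )
    for epoch, epoch_stat, info in trainer:  # one iteration per training epoch
        print(f"epoch {epoch}: {epoch_stat}")
        # custom logic between epochs goes here, e.g. checkpointing or early stopping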
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Changes to another documentation file (filename not shown in this view):
@@ -126,7 +126,7 @@ The figure on the right gives an intuitive comparison among synchronous/asynchro
.. note::

The async simulation collector would cause some exceptions when used as
``test_collector`` in :doc:`/api/tianshou.trainer` (related to
``test_collector`` in :doc:`/03_api/trainer/index` (related to
`Issue 700 <https://github.com/thu-ml/tianshou/issues/700>`_). Please use
the sync version for ``test_collector`` instead.

@@ -478,4 +478,4 @@ By constructing a new state ``state_ = (state, agent_id, mask)``, essentially we
act = policy(state_)
next_state_, reward = env.step(act)

Following this idea, we write a tiny example of playing `Tic Tac Toe <https://en.wikipedia.org/wiki/Tic-tac-toe>`_ against a random player by using a Q-learning algorithm. The tutorial is at :doc:`/tutorials/tictactoe`.
Following this idea, we write a tiny example of playing `Tic Tac Toe <https://en.wikipedia.org/wiki/Tic-tac-toe>`_ against a random player by using a Q-learning algorithm. The tutorial is at :doc:`/01_tutorials/04_tictactoe`.
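To illustrate what such a combined state can look like, here is a small self-contained sketch (the dict keys and the toy masked policy are assumptions for illustration, not Tianshou's actual multi-agent API):

    import numpy as np

    def random_masked_policy(obs_):
        # toy stand-in for a trained policy: pick uniformly among legal actions
        legal = np.flatnonzero(obs_["mask"])
        return np.random.choice(legal)

    # the combined observation from the text: state_ = (state, agent_id, mask)
    obs_ = {
        "agent_id": "player_1",                  # whose turn it is
        "obs": np.zeros((3, 3), dtype=np.int8),  # an empty Tic-Tac-Toe board
        "mask": np.ones(9, dtype=bool),          # all 9 cells are legal at the start
    }
    act = random_masked_policy(obs_)
    print(f"agent {obs_['agent_id']} plays cell {act}")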
2 changes: 2 additions & 0 deletions docs/01_tutorials/index.rst
@@ -0,0 +1,2 @@
Tutorials
=========
4 changes: 4 additions & 0 deletions docs/02_notebooks/0_intro.md
@@ -0,0 +1,4 @@
# Notebook Tutorials

Here is a collection of executable tutorials for Tianshou. You can run them
directly in Colab, or download them and run them locally.
236 changes: 236 additions & 0 deletions docs/02_notebooks/L0_overview.ipynb
@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"editable": true,
"id": "r7aE6Rq3cAEE",
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# Overview\n",
"In this tutorial, we use guide you step by step to show you how the most basic modules in Tianshou work and how they collaborate with each other to conduct a classic DRL experiment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1_mLTSEIcY2c"
},
"source": [
"## Run the code\n",
"Before we get started, we must first install Tianshou's library and Gym environment by running the commands below. Here I choose a specific version of Tianshou(0.4.8) which is the latest as of the time writing this tutorial. APIs in different versions may vary a little bit but most are the same. Feel free to use other versions in your own project."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IcFNmCjYeIIU"
},
"source": [
"Below is a short script that use a certain DRL algorithm (PPO) to solve the classic CartPole-v1\n",
"problem in Gym. Simply run it and **don't worry** if you can't understand the code very well. That is\n",
"exactly what this tutorial is for.\n",
"\n",
"If the script ends normally, you will see the evaluation result printed out before the first\n",
"epoch is done."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"editable": true,
"is_executing": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-cell",
"remove-output"
]
},
"outputs": [],
"source": [
"import gymnasium as gym\n",
"import torch\n",
"\n",
"from tianshou.data import Collector, VectorReplayBuffer\n",
"from tianshou.env import DummyVectorEnv\n",
"from tianshou.policy import PPOPolicy\n",
"from tianshou.trainer import OnpolicyTrainer\n",
"from tianshou.utils.net.common import ActorCritic, Net\n",
"from tianshou.utils.net.discrete import Actor, Critic\n",
"\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"editable": true,
"is_executing": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"# environments\n",
"env = gym.make(\"CartPole-v1\")\n",
"train_envs = DummyVectorEnv([lambda: gym.make(\"CartPole-v1\") for _ in range(20)])\n",
"test_envs = DummyVectorEnv([lambda: gym.make(\"CartPole-v1\") for _ in range(10)])\n",
"\n",
"# model & optimizer\n",
"net = Net(env.observation_space.shape, hidden_sizes=[64, 64], device=device)\n",
"actor = Actor(net, env.action_space.n, device=device).to(device)\n",
"critic = Critic(net, device=device).to(device)\n",
"actor_critic = ActorCritic(actor, critic)\n",
"optim = torch.optim.Adam(actor_critic.parameters(), lr=0.0003)\n",
"\n",
"# PPO policy\n",
"dist = torch.distributions.Categorical\n",
"policy = PPOPolicy(\n",
" actor=actor,\n",
" critic=critic,\n",
" optim=optim,\n",
" dist_fn=dist,\n",
" action_space=env.action_space,\n",
" action_scaling=False,\n",
")\n",
"\n",
"\n",
"# collector\n",
"train_collector = Collector(policy, train_envs, VectorReplayBuffer(20000, len(train_envs)))\n",
"test_collector = Collector(policy, test_envs)\n",
"\n",
"# trainer\n",
"result = OnpolicyTrainer(\n",
" policy=policy,\n",
" batch_size=256,\n",
" train_collector=train_collector,\n",
" test_collector=test_collector,\n",
" max_epoch=10,\n",
" step_per_epoch=50000,\n",
" repeat_per_collect=10,\n",
" episode_per_test=10,\n",
" step_per_collect=2000,\n",
" stop_fn=lambda mean_reward: mean_reward >= 195,\n",
")\n",
"print(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "G9YEQptYvCgx",
"is_executing": true,
"outputId": "2a9b5b22-be50-4bb7-ae93-af7e65e7442a"
},
"outputs": [],
"source": [
"# Let's watch its performance!\n",
"policy.eval()\n",
"result = test_collector.collect(n_episode=1, render=False)\n",
"print(\"Final reward: {}, length: {}\".format(result[\"rews\"].mean(), result[\"lens\"].mean()))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xFYlcPo8fpPU"
},
"source": [
"## Tutorial Introduction\n",
"\n",
"A common DRL experiment as is shown above may require many components to work together. The agent, the\n",
"environment (possibly parallelized ones), the replay buffer and the trainer all work together to complete a\n",
"training task.\n",
"\n",
"<div align=center>\n",
"<img src=\"https://tianshou.readthedocs.io/en/master/_images/pipeline.png\">\n",
"\n",
"</div>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kV_uOyimj-bk"
},
"source": [
"In Tianshou, all of these main components are factored out as different building blocks, which you\n",
"can use to create your own algorithm and finish your own experiment.\n",
"\n",
"Building blocks may include:\n",
"- Batch\n",
"- Replay Buffer\n",
"- Vectorized Environment Wrapper\n",
"- Policy (the agent and the training algorithm)\n",
"- Data Collector\n",
"- Trainer\n",
"- Logger\n",
"\n",
"\n",
"Check this [webpage](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) to find jupyter-notebook-style tutorials that will guide you through all these\n",
"modules one by one. You can also read the [documentation](https://tianshou.readthedocs.io/en/master/) of Tianshou for more detailed explanation and\n",
"advanced usages."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S0mNKwH9i6Ek"
},
"source": [
"## Further reading"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "M3NPSUnAov4L"
},
"source": [
"### What if I am not familiar with the PPO algorithm itself?\n",
"As for the DRL algorithms themselves, we will refer you to the [Spinning up documentation](https://spinningup.openai.com/en/latest/algorithms/ppo.html), where they provide\n",
"plenty of resources and guides if you want to study the DRL algorithms. In Tianshou's tutorials, we will\n",
"focus on the usages of different modules, but not the algorithms themselves."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}