WALL-E (report)
- Major Contributors: Tianbing Xu (Baidu Research, CA), who initiated the project and wrote most of the code.
- Collaborators: Liang Zhao (Baidu Research, CA), Andrew Zhang (Stanford University), Shunan Zhang (Apple).
- An efficient, fast, yet simple reinforcement learning research framework, with potential applications in robotics and beyond.
This is a long-term reinforcement learning project focused on developing an efficient yet simple RL framework to support ongoing RL research on systems, methodologies, and more. The first completed milestone is speeding up RL with multi-process architectural support. In RL, the time spent collecting experience by running a policy on the environment MDP is a bottleneck, taking far more time than the policy-learning computations on the GPU. With multi-process support, we collect experience in parallel and thus reduce the data collection time by a near-linear factor.
- The Agent is responsible for updating the policy given experience generated by the Samplers.
- A Sampler is responsible for generating experience from the updated policy by executing it on the environment MDP.
- The Agent process runs asynchronously: it updates the policy from experience read off the Experience Queue whenever a batch is ready, and sends the new policy parameters to the Policy Queue.
- There are N Sampler processes running in parallel. Each Sampler generates experience with the latest policy read from the Policy Queue and sends that experience to the Experience Queue; a minimal sketch of this loop is shown below.
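A minimal sketch of this producer/consumer loop using Python's multiprocessing module is shown below. The function names, queue handling, and the simplified per-iteration synchronization are illustrative assumptions, not the actual WALL-E implementation (which runs the Agent fully asynchronously).

```python
import multiprocessing as mp

N_SAMPLERS = 4      # number of parallel Sampler processes (illustrative)
N_ITERATIONS = 3    # a few update rounds, just for the sketch


def sampler_proc(sampler_id, policy_queue, experience_queue):
    """Repeatedly read the latest policy parameters and push a rollout."""
    for _ in range(N_ITERATIONS):
        params = policy_queue.get()                 # latest policy parameters
        # Placeholder for running the policy on the environment MDP.
        rollout = {"sampler": sampler_id, "params": params, "steps": 100}
        experience_queue.put(rollout)


def agent_proc(policy_queue, experience_queue):
    """Update the policy from collected experience and broadcast new parameters."""
    params = 0
    for it in range(N_ITERATIONS):
        for _ in range(N_SAMPLERS):                 # one copy of the parameters per Sampler
            policy_queue.put(params)
        batch = [experience_queue.get() for _ in range(N_SAMPLERS)]
        params += 1                                 # placeholder for the actual policy update
        print(f"iteration {it}: {len(batch)} rollouts collected, params -> {params}")


if __name__ == "__main__":
    policy_q, experience_q = mp.Queue(), mp.Queue()
    samplers = [mp.Process(target=sampler_proc, args=(i, policy_q, experience_q))
                for i in range(N_SAMPLERS)]
    for p in samplers:
        p.start()
    agent_proc(policy_q, experience_q)              # the Agent runs in the main process here
    for p in samplers:
        p.join()
```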
- Python 3.6
- The Usual Suspects: NumPy, matplotlib, scipy
- TensorFlow
- gym - installation instructions
- MuJoCo (30-day trial available and free to students)
- Pickle
Refer to requirements.txt for more details.
If you are using conda:
conda env create -f conda_walle.yml --prefix=`which conda`/../../envs/walle
source activate walle
cd ./src
CUDA_VISIBLE_DEVICES=0 python main.py HalfCheetah-v2 -it 1000 -b 10000
cd ./src
CUDA_VISIBLE_DEVICES=0 python run_parallel_main.py HalfCheetah-v2 -it 1000 -b 1000 -n 10
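Judging from the command lines above, the positional argument is the Gym/MuJoCo environment id, and -it, -b, -n set the number of iterations, the batch size, and the number of Sampler processes. The argparse front end below is a hedged guess at that interface, not the actual run_parallel_main.py code.

```python
import argparse


def parse_args():
    """Parse a command line like the one shown above; the flag meanings are assumptions."""
    parser = argparse.ArgumentParser(description="Parallel RL training (illustrative)")
    parser.add_argument("env_name",
                        help="Gym/MuJoCo environment id, e.g. HalfCheetah-v2")
    parser.add_argument("-it", "--iterations", type=int, default=1000,
                        help="number of policy-update iterations")
    parser.add_argument("-b", "--batch_size", type=int, default=1000,
                        help="samples collected per iteration (per Sampler, assumed)")
    parser.add_argument("-n", "--num_samplers", type=int, default=10,
                        help="number of parallel Sampler processes")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"Training on {args.env_name} with {args.num_samplers} samplers, "
          f"{args.iterations} iterations, batch size {args.batch_size}")
```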
cd ./src/experiment
python plotcurve.py -x xvariable -i /path-to-log/ -o fig.png
python plotcurve_cmp.py -x xvariable -i /path-to-log/ -b /path-to-baseline-log/ -o fig.png
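If the training logs are plain CSV files with a column for the x variable (e.g. iteration or wall-clock time) and a column for the mean episode reward, a learning curve can be drawn with a few lines of NumPy and matplotlib. The file name and column names below are assumptions; the actual log format consumed by plotcurve.py may differ.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_curve(log_csv, x_col="iteration", y_col="mean_reward", out_png="fig.png"):
    """Plot y_col versus x_col from a CSV training log (column names are assumed)."""
    data = np.genfromtxt(log_csv, delimiter=",", names=True)
    plt.figure()
    plt.plot(data[x_col], data[y_col])
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.tight_layout()
    plt.savefig(out_png)


if __name__ == "__main__":
    plot_curve("progress.csv")   # hypothetical log file under /path-to-log/
```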
- Speedup for collecting 20K samples per iteration. The measured running times are somewhat noisy because each collection is fast and there is variance from the asynchronous execution and Queue I/O. The basic conclusion is that the experience-collection speedup with respect to the number of CPUs is near linear (though not super-linear).
- As the measurements show, the average policy-learning time per iteration is about 0.04 min and stays almost the same across different numbers of processors.
- As the number of processors used to collect experience increases, the experience-collection time is reduced near-linearly and is no longer the bottleneck. Instead, the policy-learning time accounts for a growing share of the total running time and becomes the bottleneck (see the back-of-the-envelope sketch below).
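A back-of-the-envelope model makes the bottleneck shift concrete: if single-process collection takes roughly T_c minutes per iteration and policy learning stays at about 0.04 min (the value measured above), the per-iteration time with N samplers is approximately T_c / N + 0.04, so the learning share grows with N. The T_c value in the sketch below is a made-up placeholder, not a measurement.

```python
# Back-of-the-envelope per-iteration time model (all times in minutes).
LEARN_TIME = 0.04      # policy-learning time per iteration, from the measurements above
T_COLLECT_1 = 2.0      # hypothetical single-process collection time; NOT a measured value

for n in (1, 2, 4, 8, 16):
    total = T_COLLECT_1 / n + LEARN_TIME
    learn_share = LEARN_TIME / total
    print(f"N={n:2d}: total ≈ {total:.3f} min, learning share ≈ {learn_share:.0%}")
```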
- Danijar Hafner, James Davidson, Vincent Vanhoucke, "TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow"
- Kevin Frans, Danijar Hafner, "Speeding Up TRPO Through Parallelization and Parameter Adaptation"
- Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu, "Action-dependent Control Variates for Policy Optimization via Stein's Identity"