Advantage Actor-Critic methods are close cousins of the Policy Gradient class of algorithms. The difference is that they use two neural networks instead of one: the actor, which is responsible for finding the best action given an observation, and the critic, which is responsible for assessing whether the actor is doing a good job.
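To make that split concrete, here is a minimal sketch of the two networks. It uses PyTorch purely for brevity (an assumption; the repository explores several computation-graph layouts, as listed below, and may be built on a different framework):

```python
# Minimal actor/critic pair -- illustrative only, not the repository's code.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Actor(nn.Module):
    """Policy network: maps an observation to a distribution over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),  # action logits
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        return Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """Value network: maps an observation to a scalar state-value estimate V(s)."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)  # shape: (batch,)
```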
The two main goals of this essay were, first, to get a deeper understanding of the theoretical side of the Actor-Critic method and, second, to acquire a practical understanding of its behavior, limitations and requirements. To reach the second goal, I felt it was necessary to implement multiple design and architectural variations commonly found in the literature.
With this in mind, I've focused on the following practical aspects:
- Algorithm type: batch vs online;
- Computation graph: split network vs split network with shared lower layers vs shared network;
- Critic target: Monte-Carlo vs bootstrap estimate (see the sketch after this list);
- Math computation: element-wise vs graph-computed;
- Various data collection strategies.
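The critic-target distinction in particular boils down to how the regression target for V(s_t) is built from a collected trajectory. Below is a small sketch under assumed inputs (`rewards` and `values` are per-step arrays for one finished trajectory, `gamma` is the discount factor); it is illustrative, not the repository's exact code:

```python
import numpy as np

def monte_carlo_targets(rewards: np.ndarray, gamma: float) -> np.ndarray:
    """Full-return target G_t = r_t + gamma * r_{t+1} + ... (no value estimate used)."""
    targets = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets

def bootstrap_targets(rewards: np.ndarray, values: np.ndarray, gamma: float) -> np.ndarray:
    """One-step TD target r_t + gamma * V(s_{t+1}), with V = 0 after the terminal state."""
    next_values = np.append(values[1:], 0.0)
    return rewards + gamma * next_values

# In both cases, the advantage passed to the actor loss is target - V(s_t):
#   advantages = targets - values
```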
In parallel, I wrote a second essay, A reflexion on design, architecture and implementation details, where I go further in my study of some aspects of DRL algorithms from a software engineering perspective applied to research, covering questions like:
Do implementation details really matter? Which ones do, when & why?
I've also complemented my reading with the following resources:
- The classic book Reinforcement Learning: An Introduction, 2nd ed., by Sutton & Barto (MIT Press);
- CS 294-112 Deep Reinforcement Learning: lectures on Policy Gradients and Actor-Critic, by Sergey Levine, UC Berkeley;
- OpenAI Spinning Up: Intro to Policy Optimization, by Josh Achiam;
- the Lil'Log blog post Policy Gradient Algorithms, by Lilian Weng, research intern at OpenAI;
- Asynchronous Methods for Deep Reinforcement Learning, by Mnih et al.;
- Deep Reinforcement Learning that Matters, by Henderson et al.;
- TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning, by Amiranashvili, Dosovitskiy, Koltun & Brox;
- High-Dimensional Continuous Control Using Generalized Advantage Estimation, by Schulman, Moritz, Levine, Jordan & Abbeel.
Download the essays (PDF):
- Deep Reinforcement Learning – Actor-Critic
- A reflexion on design, architecture and implementation details
Watch recorded agent
Note: You can get an explanation of how to use the package with the --help flag.
cd DRLimplementation
python -m ActorCritic --play[Lunar or Cartpole] [--record] [--play_for]=max trajectories (default=10)
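For example, the following should replay and record a CartPole agent (assuming the bracketed flag above expands to --playCartpole):
cd DRLimplementation
python -m ActorCritic --playCartpole --record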
cd DRLimplementation
python -m ActorCritic --trainExperimentSpecification [--rerun] [--renderTraining]
Choose --trainExperimentSpecification from the following:
- CartPole-v0 environment:
  - --trainSplitMC: Train a Batch Actor-Critic agent with Monte Carlo TD target
  - --trainSplitBootstrap: Train a Batch Actor-Critic agent with bootstrap estimate TD target
  - --trainSharedBootstrap: Train a Batch Actor-Critic agent with shared network
  - --trainOnlineSplit: Train an Online Actor-Critic agent with split network
  - --trainOnlineSplit3layer: Train an Online Actor-Critic agent with split network
  - --trainOnlineShared3layer: Train an Online Actor-Critic agent with shared network
  - --trainOnlineSplitTwoInputAdvantage: Train an Online Actor-Critic agent with split two-input Advantage network
- LunarLander-v2 environment:
  - --trainOnlineLunarLander: Train on LunarLander an Online Actor-Critic agent with split two-input Advantage network
  - --trainBatchLunarLander: Train on LunarLander a Batch Actor-Critic agent
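For example, using flags taken verbatim from the list above, the following would train the Batch Actor-Critic agent with a Monte-Carlo target on CartPole-v0 and render the training:
cd DRLimplementation
python -m ActorCritic --trainSplitMC --renderTraining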
cd DRLimplementation
tensorboard --logdir=ActorCritic/graph