a group project about Reinforcement Learning with A2C and PPO

LukasDCode/All-You-Can-Creep

All You Can Creep

All you can creep.

Presentations

14.12.2020 Presentation KickOff

Final Presentation

Links

Examples

Installation

ML Agents

Gym Wrapper

Environment Exec

Work Distribution

All coding was done in pair programming, with at least two people working at the same time. We used Visual Studio Code with Live Share so that everyone could participate and write simultaneously. Because at least two people (sometimes three or even all four) were always coding together, everyone has a basic understanding of A2C and PPO. As requested, we divided the tasks among us by assigning an expert to each topic. The division is shown in the following table:

| Topic | Name | Info |
| --- | --- | --- |
| A2C | | |
| Split- & Multihead NN | Sofie | |
| Activation | Balthasar | Sigmoid, Softplus, Softmax, TanH, ReLu |
| Min-Max-Clamping | Balthasar | |
| Loss & Entropy | Balthasar | |
| Advantages | Sofie | A2C, TD, 3-Step, Reinforce |
| Return | Sofie | |
| A2C vs A3C | Sofie | |
| ----- | ----- | ----- |
| PPO | | |
| Actor & Critic NN | Lukas | |
| Memory, Buffer, Batches | Lukas | |
| Hyperparameter | Denny | |
| Reward | Denny | |
| log_prob & prob_ratio | Denny | |
| weighted_probs & clipping | Lukas | |
| ----- | ----- | ----- |
| Slurm | Denny | Slurm Runner |
| Parameter Search | Sofie | Grid Search, Evolutionary Algorithm |
| Environments + Unity | Lukas | |
| Ml-Flow | Balthasar | Measures, Artifacts |
| Save and Load Models | Balthasar | |
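
For readers unfamiliar with the PPO topics named in the table (log_prob, prob_ratio, weighted_probs, clipping), the following is a minimal sketch of how these quantities usually fit together in the standard PPO clipped surrogate objective. It is not taken from this repository: the function name `clipped_surrogate`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration, and the project's own code may differ.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not the repository's actual API.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # prob_ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities
        prob_ratio = math.exp(new_lp - old_lp)
        # Unclipped term: ratio weighted by the advantage
        weighted_probs = prob_ratio * adv
        # Clipped term: ratio limited to [1 - clip_eps, 1 + clip_eps]
        clipped_ratio = max(1.0 - clip_eps, min(prob_ratio, 1.0 + clip_eps))
        weighted_clipped_probs = clipped_ratio * adv
        # Take the more pessimistic of the two terms
        total += min(weighted_probs, weighted_clipped_probs)
    return total / len(advantages)
```

In a training loop, the actor loss would typically be the negative of this value, so that minimizing the loss maximizes the objective.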
