- Update requirements.txt
- Design the code architecture
- Choose a test tool and initialize it
- Choose a docs tool and initialize it
- Configure Codecov
- Configure CodeFactor
- Create a code style standard
- Document it in CONTRIBUTING.md
- List agents for starting the project
- List environments for starting the project
- Add a GPU option
- Render in notebooks/Colab
- Add a progress bar for training (see the sketch after this list)
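A minimal sketch of how the GPU option and the training progress bar could fit together, assuming PyTorch and tqdm; `model`, the batch, and the loss below are hypothetical stand-ins for the real agent and sampled transitions:

```python
import torch
from tqdm import trange

# Pick the GPU when available, else fall back to the CPU ("Add a GPU option").
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)                  # stand-in for an agent's network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# trange wraps range() with a tqdm progress bar ("Add a progress bar for training").
for step in trange(1_000, desc="training"):
    batch = torch.randn(32, 4, device=device)             # stand-in for sampled transitions
    loss = model(batch).pow(2).mean()                     # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```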
- Random Agent
- Constant Agent
- Deep Q Network (Mnih et al., 2013); see the TD-loss sketch after this list
- Deep Recurrent Q Network (Hausknecht et al., 2015)
- Persistent Advantage Learning (Bellemare et al., 2015)
- Double Deep Q Network (van Hasselt et al., 2016)
- Dueling Q Network (Wang et al., 2016)
- Bootstrapped Deep Q Network (Osband et al., 2016)
- Continuous Deep Q Network (Gu et al., 2016)
- Categorical Deep Q Network (Bellemare et al., 2017)
- Quantile Regression DQN (Dabney et al., 2017)
- Rainbow (Hessel et al., 2017)
- Soft Actor-Critic (Haarnoja et al., 2018)
- Vanilla Policy Gradient (Sutton et al., 2000)
- Deep Deterministic Policy Gradient (Lillicrap et al., 2015)
- Twin Delayed DDPG (Fujimoto et al., 2018)
- Trust Region Policy Optimization (Schulman et al., 2015)
- Proximal Policy Optimization (Schulman et al., 2017)
- A2C (Mnih et al., 2016)
- A3C (Mnih et al., 2016)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
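As a reference point for the value-based agents above, here is a minimal sketch of the one-step TD loss behind DQN, assuming PyTorch; `q_net`, `target_net`, and the batch layout are hypothetical, and the separate target network comes from the 2015 Nature follow-up rather than the 2013 paper:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss: y = r + gamma * (1 - done) * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = batch
    q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a) for taken actions
    with torch.no_grad():                                         # targets are not differentiated
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q, target)
```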
- Base network: support discrete action spaces
- Base network: support continuous action spaces
- Base network: support discrete observation spaces
- Base network: support continuous observation spaces
- Simple network: support discrete/continuous action/observation spaces
- C51 network: support discrete action/observation spaces
- Base dueling network: support discrete/continuous action/observation spaces (see the dueling-head sketch after this list)
- Simple dueling network: support discrete/continuous action/observation spaces
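A minimal sketch of the dueling head the dueling networks above refer to, assuming PyTorch; the module name and layer sizes are hypothetical, the decomposition itself follows Wang et al. (2016):

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling head (Wang et al., 2016): Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, in_features, n_actions, hidden=64):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.advantage = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, features):
        v = self.value(features)                    # [batch, 1]
        a = self.advantage(features)                # [batch, n_actions]
        return v + a - a.mean(dim=1, keepdim=True)  # [batch, n_actions]
```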
- Random
- Epsilon-greedy (see the sketch after this list)
- Intrinsic Curiosity Module (Pathak et al., 2017)
- Random Network Distillation (Burda et al., 2018)
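For the epsilon-greedy policy above, a minimal pure-Python sketch; the function name and the list-of-floats representation of `q_values` are illustrative assumptions:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```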
- No memory (model-based)
- Trajectory replay
- Experience Replay (Lin, 1992); see the buffer sketch after this list
- Prioritized Experience Replay (Schaul et al., 2015)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
- Add a temporal-difference option to all memories
- Add discounted rewards to experience replay
- Add an average-reward option
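A minimal sketch of the uniform experience replay buffer listed above; the class name, tuple layout, and default capacity are illustrative assumptions:

```python
import random
from collections import deque

class ExperienceReplay:
    """Uniform replay buffer (Lin, 1992): store transitions, sample them i.i.d."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the left end

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose to columns: states, actions, rewards, next_states, dones.
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```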
- Gym CartPole
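A minimal random-agent rollout on the CartPole environment, assuming the classic Gym API (pre-0.26); newer Gym/Gymnasium releases instead return `(obs, info)` from `reset()` and a five-tuple from `step()`:

```python
import gym

env = gym.make("CartPole-v1")
state = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()             # a random agent, as listed above
    state, reward, done, info = env.step(action)
    episode_return += reward
env.close()
print(f"episode return: {episode_return}")
```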