
# ToDo list

- Update requirements.txt
- Design the code architecture
- Choose a test tool and initialize it
- Choose a docs tool and initialize it
- Configure Codecov
- Configure CodeFactor
- Create a code style standard
- Document it in CONTRIBUTING.md
- List agents for starting the project
- List environments for starting the project
- Add a GPU option
- Render in notebooks/Colab
- Add a progress bar for training

## Agents list

- Random Agent
- Constant Agent
- Deep Q Network (Mnih et al., 2013) (see the Q-target sketch after this list)
- Deep Recurrent Q Network (Hausknecht et al., 2015)
- Persistent Advantage Learning (Bellemare et al., 2015)
- Double Deep Q Network (van Hasselt et al., 2016)
- Dueling Q Network (Wang et al., 2016)
- Bootstrapped Deep Q Network (Osband et al., 2016)
- Continuous Deep Q Network (Gu et al., 2016)
- Categorical Deep Q Network (Bellemare et al., 2017)
- Quantile Regression Deep Q Network (Dabney et al., 2017)
- Rainbow (Hessel et al., 2017)
- Soft Actor-Critic (Haarnoja et al., 2018)
- Vanilla Policy Gradient (Sutton et al., 2000)
- Deep Deterministic Policy Gradient (Lillicrap et al., 2015)
- Twin Delayed DDPG (Fujimoto et al., 2018)
- Trust Region Policy Optimization (Schulman et al., 2015)
- Proximal Policy Optimization (Schulman et al., 2017)
- A2C (Mnih et al., 2016)
- A3C (Mnih et al., 2016)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
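As a minimal sketch of the value target several of these agents share, here is the one-step Q-learning target used by DQN (Mnih et al., 2013), written with NumPy. The `dqn_targets` name and array shapes are illustrative, not part of this project's API:

```python
import numpy as np

def dqn_targets(q_next, rewards, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a').

    q_next  -- (batch, n_actions) Q-values of the next states
    rewards -- (batch,) immediate rewards
    dones   -- (batch,) booleans, True where the episode ended
    """
    # Terminal transitions keep only the immediate reward.
    not_done = 1.0 - dones.astype(np.float64)
    return rewards + gamma * not_done * q_next.max(axis=1)
```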

## Network

- Base network: support discrete action spaces (see the sketch after this list)
- Base network: support continuous action spaces
- Base network: support discrete observation spaces
- Base network: support continuous observation spaces
- Simple network: support discrete/continuous action/observation spaces
- C51 network: support discrete action/observation spaces
- Base dueling network: support discrete/continuous action/observation spaces
- Simple dueling network: support discrete/continuous action/observation spaces
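One way to read these items: the network's output width depends on the action space, with one output per discrete action (Q-values) or one per continuous action dimension (action means). A minimal NumPy sketch under that assumption; `BaseNetwork` and its layer sizes are hypothetical, not this repo's interface:

```python
import numpy as np

class BaseNetwork:
    """Two-layer MLP mapping an observation vector to n_outputs values:
    Q-values for a discrete action space, action means for a continuous one."""

    def __init__(self, obs_dim, n_outputs, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, n_outputs))

    def forward(self, obs):
        h = np.tanh(obs @ self.w1)  # shared hidden representation
        return h @ self.w2          # one value per action (dimension)
```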

## Explorations list

- Random
- Epsilon Greedy (see the sketch after this list)
- Intrinsic Curiosity Module (Pathak et al., 2017)
- Random Network Distillation (Burda et al., 2018)
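For reference, epsilon-greedy is small enough to sketch in full; the function name and signature here are illustrative, assuming the agent exposes per-action Q-values:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """With probability epsilon take a uniformly random action,
    otherwise the greedy (highest-Q) action."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```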

## Memories list

- No memory (= model based)
- Trajectory replay
- Experience Replay (Lin, 1992) (see the sketch after this list)
- Prioritized Experience Replay (Schaul et al., 2015)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
- Add a temporal-difference option to all memories
- Add discounted reward to experience replay
- Add average reward
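A minimal sketch of the uniform experience replay buffer (Lin, 1992) these items build on; the class name and transition layout are assumptions, not this repo's interface:

```python
import random
from collections import deque

class ExperienceReplay:
    """Fixed-capacity FIFO buffer of transitions, sampled uniformly."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the front

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```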

## Environments list

- Gym CartPole