
# ToDo list

- Update requirements.txt
- Design the code architecture
- Choose a test tool and initialize it
- Choose a docs tool and initialize it
- Configure Codecov
- Configure CodeFactor
- Create a code style standard
- Document it in CONTRIBUTING.md
- List agents for starting the project
- List environments for starting the project
- Add a GPU option
- Render in notebooks/Colab
- Add a progress bar for training

## Agents list

- Random Agent
- Constant Agent
- Deep Q Network (Mnih et al., 2013) (see the Q-target sketch after this list)
- Deep Recurrent Q Network (Hausknecht et al., 2015)
- Persistent Advantage Learning (Bellemare et al., 2015)
- Double Deep Q Network (van Hasselt et al., 2016)
- Dueling Q Network (Wang et al., 2016)
- Bootstrapped Deep Q Network (Osband et al., 2016)
- Continuous Deep Q Network (Gu et al., 2016)
- Categorical Deep Q Network (Bellemare et al., 2017)
- Quantile Regression Deep Q Network (Dabney et al., 2017)
- Rainbow (Hessel et al., 2017)
- Soft Actor-Critic (Haarnoja et al., 2018)
- Vanilla Policy Gradient (Sutton et al., 2000)
- Deep Deterministic Policy Gradient (Lillicrap et al., 2015)
- Twin Delayed DDPG (Fujimoto et al., 2018)
- Trust Region Policy Optimization (Schulman et al., 2015)
- Proximal Policy Optimization (Schulman et al., 2017)
- A2C (Mnih et al., 2016)
- A3C (Mnih et al., 2016)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
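As a minimal sketch of the value target several of these agents share, here is the one-step Q-learning target used by DQN (Mnih et al., 2013), written with NumPy. The `dqn_targets` name and array shapes are illustrative, not part of this project's API:

```python
import numpy as np

def dqn_targets(q_next, rewards, dones, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a').

    q_next  -- (batch, n_actions) Q-values of the next states
    rewards -- (batch,) immediate rewards
    dones   -- (batch,) booleans, True where the episode ended
    """
    # Terminal transitions keep only the immediate reward.
    not_done = 1.0 - dones.astype(np.float64)
    return rewards + gamma * not_done * q_next.max(axis=1)
```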

## Network

- Base network: support discrete action spaces (see the sketch after this list)
- Base network: support continuous action spaces
- Base network: support discrete observation spaces
- Base network: support continuous observation spaces
- Simple network: support discrete/continuous action/observation spaces
- C51 network: support discrete action/observation spaces
- Base dueling network: support discrete/continuous action/observation spaces
- Simple dueling network: support discrete/continuous action/observation spaces
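One way to read these items: the network's output width depends on the action space, with one output per discrete action (Q-values) or one per continuous action dimension (action means). A minimal NumPy sketch under that assumption; `BaseNetwork` and its layer sizes are hypothetical, not this repo's interface:

```python
import numpy as np

class BaseNetwork:
    """Two-layer MLP mapping an observation vector to n_outputs values:
    Q-values for a discrete action space, action means for a continuous one."""

    def __init__(self, obs_dim, n_outputs, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, n_outputs))

    def forward(self, obs):
        h = np.tanh(obs @ self.w1)  # shared hidden representation
        return h @ self.w2          # one value per action (dimension)
```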

## Explorations list

- Random
- Epsilon Greedy (see the sketch after this list)
- Intrinsic Curiosity Module (Pathak et al., 2017)
- Random Network Distillation (Burda et al., 2018)
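For reference, epsilon-greedy is small enough to sketch in full; the function name and signature here are illustrative, assuming the agent exposes per-action Q-values:

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """With probability epsilon take a uniformly random action,
    otherwise the greedy (highest-Q) action."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```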

## Memories list

- No memory (= model based)
- Trajectory replay
- Experience Replay (Lin, 1992) (see the sketch after this list)
- Prioritized Experience Replay (Schaul et al., 2015)
- Hindsight Experience Replay (Andrychowicz et al., 2017)
- Add a temporal-difference option to all memories
- Add discounted reward to experience replay
- Add average reward
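A minimal sketch of the uniform experience replay buffer (Lin, 1992) these items build on; the class name and transition layout are assumptions, not this repo's interface:

```python
import random
from collections import deque

class ExperienceReplay:
    """Fixed-capacity FIFO buffer of transitions, sampled uniformly."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off the front

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```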

## Environments list

- Gym CartPole