I'm trying to implement some reinforcement-learning algorithms. Most of my implementations are based on the three resources below:
- Richard Sutton and Andrew Barto's book, Reinforcement Learning: An Introduction (2nd ed.)
- David Silver's Lecture
- Udacity's Reinforcement-learning lecture (Georgia Tech).
My code is largely a rewrite of Denny Britz's repo. I can't write code as beautiful as his yet :( so I try to implement as much of it as I can by myself ;)
- Grid World (Environment, DP-Policy Evaluation, DP-Policy Iteration, DP-Value Iteration)
- Black Jack (Environment)
- Windy Grid World (Environment)
- Black Jack Monte-Carlo (Prediction, Control, Simulation)
- Black Jack Temporal-Difference (TD(0)) (Prediction, Control a.k.a. SARSA, Simulation)
- Windy Grid World Temporal-Difference (TD(0)) (Control a.k.a. SARSA, Simulation)
- Black Jack TD-lambda (Prediction