
Exercise 05

In this exercise, we revisit the included racetrack_environment to look at temporal difference (TD) learning algorithms.

Tasks:

Policy evaluation using TD learning
On-policy epsilon-greedy control using TD learning (Sarsa)
Off-policy epsilon-greedy control using TD learning (Q-learning)
Double Q-learning in stochastic environments
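
The four tasks correspond to four tabular update rules, which might be sketched as below. This is a minimal illustration and not the exercise solution. It assumes states and actions are integer indices into NumPy arrays, and all function names and parameters (`td0_update`, `sarsa_update`, `q_learning_update`, `double_q_update`, `epsilon_greedy`, `alpha`, `gamma`) are hypothetical. The interface of racetrack_environment is not shown here, so the sketch does not call it.

```python
import numpy as np


def epsilon_greedy(q_row, epsilon, rng):
    """Return a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """TD(0) policy evaluation: move V[s] toward r + gamma * V[s_next]."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """On-policy control: bootstrap from the action actually taken next."""
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += alpha * (target - Q[s, a])


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=1.0):
    """Off-policy control: bootstrap from the greedy value in s_next."""
    target = r + (0.0 if done else gamma * np.max(Q[s_next]))
    Q[s, a] += alpha * (target - Q[s, a])


def double_q_update(QA, QB, s, a, r, s_next, done, rng, alpha=0.1, gamma=1.0):
    """Double Q-learning: one table selects the action, the other evaluates it."""
    if rng.random() < 0.5:
        QA, QB = QB, QA  # swap roles; the arrays are still updated in place
    a_star = int(np.argmax(QA[s_next]))
    target = r + (0.0 if done else gamma * QB[s_next, a_star])
    QA[s, a] += alpha * (target - QA[s, a])
```

Sarsa and Q-learning differ only in the bootstrap term: Sarsa uses the value of the next action chosen by the epsilon-greedy behavior policy, while Q-learning uses the maximum over actions. Double Q-learning separates action selection from action evaluation, which reduces the overestimation that the max operator can cause when rewards or transitions are noisy.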