A game like blackjack, except that cards are drawn with full replacement and no aces count as 1 or 11.
Uses GPI (generalized policy iteration) to optimize Q, with a time-varying scalar step size and an ε-greedy exploration strategy.
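A minimal sketch of how a time-varying step size and ε-greedy action selection might look. The notes do not give the schedules, so the count-based forms here (ε = N0 / (N0 + N(s)) and α = 1 / N(s, a)), the constant `N0`, the action names, and all function names are assumptions for illustration.

```python
import random
from collections import defaultdict

N0 = 100.0                   # assumed constant: larger values keep exploring longer
ACTIONS = ("hit", "stick")   # assumed action names

state_visits = defaultdict(int)  # N(s): visits to each state
pair_visits = defaultdict(int)   # N(s, a): times each action was taken in each state


def epsilon(state):
    """Exploration rate that decays as the state is visited more often."""
    return N0 / (N0 + state_visits[state])


def step_size(state, action):
    """Scalar step size 1 / N(s, a); an unvisited pair is treated as visited once."""
    return 1.0 / max(1, pair_visits[(state, action)])


def record_visit(state, action):
    """Update both visit counters after taking `action` in `state`."""
    state_visits[state] += 1
    pair_visits[(state, action)] += 1


def epsilon_greedy(q, state, eps, rng=random):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

In a training loop one would call `epsilon_greedy(q, s, epsilon(s))` to choose the action, then `record_visit(s, a)` before computing `step_size(s, a)` for the update.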
Q(s, a) ← Q(s, a) + α ζ e_t(s, a)
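One way to read the update above is as a backward-view Sarsa(λ) step, with ζ as the TD error and e_t as an eligibility trace. That reading is an assumption, since the notes do not name the algorithm. The sketch below shows it for a tabular Q; the function name, the accumulating-trace choice, and the default `gamma` and `lam` values are hypothetical.

```python
from collections import defaultdict


def sarsa_lambda_update(q, traces, s, a, reward, s_next, a_next,
                        alpha, gamma=1.0, lam=0.9, terminal=False):
    """Apply Q(s, a) <- Q(s, a) + alpha * td_error * e(s, a) to every traced pair."""
    # TD error: one-step target minus the current estimate.
    target = reward if terminal else reward + gamma * q[(s_next, a_next)]
    td_error = target - q[(s, a)]

    # Accumulating trace for the pair just visited (an assumed trace type).
    traces[(s, a)] += 1.0

    # Every pair with a non-zero trace shares in the update, then its trace decays.
    for key in list(traces):
        q[key] += alpha * td_error * traces[key]
        traces[key] *= gamma * lam
    return td_error


q = defaultdict(float)       # tabular action values, initialised to 0
traces = defaultdict(float)  # eligibility traces, reset at the start of each episode
```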
Q(s, a) = Φ(s, a)ᵀ θ
Uses overlapping coarse coding for the feature vector Φ: the features are overlapping regions of the state space, defined over the player's sum and the dealer's initial value.
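A rough sketch of coarse-coded binary features and the linear value Q(s, a) = Φ(s, a)ᵀ θ. The interval boundaries, the value ranges (dealer 1 to 10, player 1 to 21), the action names, and the function names are illustrative assumptions, not taken from the notes.

```python
# Assumed overlapping intervals; adjacent intervals share some values.
DEALER_INTERVALS = ((1, 4), (4, 7), (7, 10))
PLAYER_INTERVALS = ((1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21))
ACTIONS = ("hit", "stick")  # assumed action names


def features(dealer, player, action):
    """Binary feature vector: one entry per (dealer interval, player interval, action)."""
    phi = []
    for d_lo, d_hi in DEALER_INTERVALS:
        for p_lo, p_hi in PLAYER_INTERVALS:
            for a in ACTIONS:
                active = (d_lo <= dealer <= d_hi
                          and p_lo <= player <= p_hi
                          and a == action)
                phi.append(1.0 if active else 0.0)
    return phi


def q_value(theta, dealer, player, action):
    """Linear approximation: dot product of the feature vector and the weights theta."""
    return sum(f * w for f, w in zip(features(dealer, player, action), theta))


theta = [0.0] * (len(DEALER_INTERVALS) * len(PLAYER_INTERVALS) * len(ACTIONS))
```

Because the intervals overlap, a single state can switch on several features at once, so an update to θ for one state also shifts the estimates of its neighbours.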