Policy Gradient

Minimal implementation of the stochastic policy gradient algorithm in Keras.

Pong Agent

This PG agent seems to start winning more frequently after about 8000 episodes of training. See the score graph for this run.
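As a rough illustration of the stochastic policy gradient idea, here is a minimal sketch in plain Python rather than Keras. It is not this repository's code. It assumes a two-action Bernoulli policy with `p(action=1 | state) = sigmoid(theta . state)` and a REINFORCE-style update. The function names, the learning rate, and the discount factor are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def action_probability(theta, state):
    """Probability of choosing action 1 under the (assumed) sigmoid policy."""
    return sigmoid(sum(w * x for w, x in zip(theta, state)))


def sample_action(theta, state, rng=random):
    """Stochastic policy: sample the action instead of taking the argmax."""
    return 1 if rng.random() < action_probability(theta, state) else 0


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_update(theta, episode, lr=0.1, gamma=0.99):
    """One gradient-ascent step on sum_t G_t * log pi(a_t | s_t).

    episode is a list of (state, action, reward) tuples,
    where state is a list of floats and action is 0 or 1.
    """
    returns = discount_rewards([r for _, _, r in episode], gamma)
    new_theta = list(theta)
    for (state, action, _), g in zip(episode, returns):
        p = action_probability(theta, state)
        # For a sigmoid policy, d/dtheta log pi(a | s) = (a - p) * s.
        for i, x in enumerate(state):
            new_theta[i] += lr * g * (action - p) * x
    return new_theta
```

For example, `reinforce_update([0.0], [([1.0], 1, 1.0)])` nudges the single weight upward, so the rewarded action becomes slightly more likely the next time that state is seen. A Keras version would typically replace `theta` with a small neural network and let the framework compute the gradient.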