tf_A3C_BipedalWalker

The BipedalWalker environment from OpenAI Gym, solved with the Asynchronous Advantage Actor-Critic (A3C) algorithm using TensorFlow. The agent was trained for about 30,000 episodes per worker, which took ~21 hours on a single CPU with 4 workers.
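
For readers new to A3C, the "advantage" each worker computes can be illustrated with a minimal sketch. This is not taken from this repository's code: the function names `discounted_returns` and `advantages`, and the discount factor `gamma=0.99`, are hypothetical choices for illustration. It assumes a worker has collected a short rollout of rewards and critic value estimates, and bootstraps from the critic's estimate of the state after the rollout (0.0 if the episode ended).

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Return the n-step discounted return for every step of a rollout.

    rewards: list of rewards collected by one worker, in time order.
    bootstrap_value: critic estimate for the state after the last step
        (use 0.0 when the episode terminated).
    gamma: discount factor (0.99 is an assumed, commonly used value).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage = discounted return minus the critic's value estimate."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    rollout_rewards = [1.0, 1.0, 1.0]
    rollout_values = [1.0, 1.0, 1.0]
    print(advantages(rollout_rewards, rollout_values, 0.0, gamma=0.5))
```

In a typical A3C setup, the advantages weight the policy-gradient term of the actor loss, while the returns serve as regression targets for the critic; each worker would compute these on its own rollouts before pushing gradients to the shared network.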