
Multi-armed Bandits Exploration

This is a bandit experiment that implements different exploration techniques for the 10-armed testbed described in Reinforcement Learning: An Introduction by Sutton & Barto.

The exploration techniques covered include (a minimal sketch of each selection rule follows the list):

  • ε-greedy
  • Optimistic Initialization
  • UCB Exploration
  • Boltzmann (Softmax) Exploration
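
The sketch below is not the repository's code; it is a minimal illustration, assuming sample-average value estimates on a 10-armed testbed, of how each selection rule can be written. Names such as `q_est`, `counts`, `epsilon`, `c`, and `tau` are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_est, epsilon=0.1):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_est)))
    return int(np.argmax(q_est))

def ucb(q_est, counts, t, c=2.0):
    """Upper-Confidence-Bound selection: greedy value plus an exploration bonus."""
    bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-8))  # huge bonus for untried arms
    return int(np.argmax(q_est + bonus))

def softmax(q_est, tau=0.1):
    """Boltzmann exploration: sample an arm with probability proportional to exp(Q / tau)."""
    prefs = np.exp((q_est - np.max(q_est)) / tau)  # subtract max for numerical stability
    probs = prefs / prefs.sum()
    return int(rng.choice(len(q_est), p=probs))

# Optimistic initialization is not a separate selection rule: start the estimates
# high (e.g. q_est = np.full(10, 5.0)) and act greedily, so early disappointment
# in every arm drives exploration on its own.
```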

The experiment also compares the different exploration techniques and draws conclusions about which works best in different settings.
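
As a rough illustration of how such a comparison can be run (again, not the repository's code), the sketch below averages reward curves over many independent testbeds with Gaussian rewards, assuming the selection functions from the previous sketch. `run_bandit`, `steps`, and `runs` are illustrative names.

```python
import numpy as np

def run_bandit(select, steps=1000, runs=200, k=10, seed=0):
    """Average the reward at each step over many randomly sampled testbeds."""
    rng = np.random.default_rng(seed)
    avg_reward = np.zeros(steps)
    for _ in range(runs):
        q_true = rng.normal(0.0, 1.0, k)           # true arm values for this testbed
        q_est, counts = np.zeros(k), np.zeros(k)   # sample-average estimates
        for t in range(steps):
            a = select(q_est, counts, t)
            r = rng.normal(q_true[a], 1.0)         # noisy reward from the chosen arm
            counts[a] += 1
            q_est[a] += (r - q_est[a]) / counts[a]
            avg_reward[t] += r
    return avg_reward / runs

# Example: compare epsilon-greedy and UCB on the same testbed distribution.
eg_curve  = run_bandit(lambda q, n, t: epsilon_greedy(q, 0.1))
ucb_curve = run_bandit(lambda q, n, t: ucb(q, n, t, c=2.0))
```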