
# slots

Multi-armed bandit library in Python

## Documentation

This document details the current and planned API for slots. Features that are not yet implemented are noted as such.

What does the library need to do? An aspirational list.

1. Set up N bandits with probabilities, p_i, and payouts, pay_i.
2. Implement several MAB strategies, with kwargs as parameters and a consistent API.
3. Allow for T trials.
4. Continue with more trials (i.e. save state after trials).
5. Values to save:
    1. Current choice
    2. Number of trials completed for each arm
    3. Scores for each arm
    4. Average payout per arm (wins/trials?)
    5. Current regret: regret = T * mean_max - Σ_{t=1}^{T} reward_t, i.e. the cumulative shortfall versus always playing the best arm (see the sketch after this list).
6. Use sane defaults.
7. Be obvious and clean.
8. For the time being, handle only binary payouts.

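For instance, the regret defined in item 5 needs nothing more than the arms' mean payouts and the reward history. A minimal sketch (the `regret` function is illustrative, not part of the slots API):

```python
def regret(probs, rewards):
    """Cumulative regret after len(rewards) trials: the expected payout
    of always playing the best arm, minus the rewards actually received."""
    return len(rewards) * max(probs) - sum(rewards)

# Three arms where the best arm pays out 40% of the time.
print(regret([0.2, 0.1, 0.4], [0, 1, 0, 0, 1]))  # 5 * 0.4 - 2 = 0.0
```
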
## Library API ideas

### Running slots with a live website

```python
# Using slots to determine the best of 3 variations on a live website.
# 3 is the default number of bandits and epsilon-greedy is the default
# strategy.
mab = slots.MAB(3, live=True)

# Make the first choice randomly, record responses, and input the reward.
# Here, arm 2 was chosen.
# Update the online trial (input the most recent result) until the test
# criterion is met.
mab.online_trial(bandit=2, payout=1)

# The response of mab.online_trial() is a dict of the form:
#   {'new_trial': boolean, 'choice': int, 'best': int}
# where:
#   - new_trial is False once the stopping criterion is met.
#   - choice is the arm to try next.
#   - best is the current best estimate of the highest-payout arm.
```
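
A full live loop might then look like the sketch below. Here `get_payout` is a hypothetical stand-in for however your site records a binary conversion for the arm that was shown; it is not part of slots.

```python
import random

import slots

def get_payout(arm):
    # Hypothetical stand-in for a real site hook; here we just simulate
    # conversion rates of 10%, 10%, and 20% for the three arms.
    return int(random.random() < [0.1, 0.1, 0.2][arm])

mab = slots.MAB(3, live=True)

# The first arm is chosen at random; after that, show whichever arm
# the previous trial recommended.
choice = random.randrange(3)
while True:
    result = mab.online_trial(bandit=choice, payout=get_payout(choice))
    if not result['new_trial']:   # stopping criterion met
        break
    choice = result['choice']     # arm to show next

print('Best arm:', result['best'])
```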

### Creating a MAB test instance

```python
import numpy as np

# Default: 3 bandits with random probabilities, p_i.
mab = slots.MAB()

# Set up 4 bandits with random p_i.
mab = slots.MAB(4)

# 4 bandits with specified p_i.
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])

# 3 bandits with historical payout data.
mab = slots.MAB(3, hist_payouts=np.array([[0, 0, 1, ...],
                                          [1, 0, 0, ...],
                                          [0, 0, 0, ...]]))
```

### Running tests with a strategy, S

```python
# Default: epsilon-greedy, epsilon = 0.1, num_trials = 100.
mab.run()

# Run the chosen strategy with specified parameters and number of trials.
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)

# Run a strategy, updating old trial data.
# (NOT YET IMPLEMENTED -- note that `continue` is a reserved word in
# Python, so the final keyword argument will need a different name.)
mab.run(continue=True)
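```

Putting these pieces together with the retrieval calls documented below, a complete offline simulation might look like this sketch:

```python
import slots

# Simulate 4 arms with known payout probabilities.
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)

print(mab.best())       # index of the estimated best arm
print(mab.est_probs())  # payout probability estimates per arm
```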

### Displaying / retrieving bandit properties

```python
# Default: display the number of bandits, probabilities, and payouts.
# (NOT YET IMPLEMENTED)
mab.bandits.info()

# Display info for bandit i.
# (NOT YET IMPLEMENTED)
mab.bandits[i]

# Retrieve the bandits' payouts, probabilities, etc.
mab.bandits.payouts
mab.bandits.probs

# Retrieve the count of bandits.
# (NOT YET IMPLEMENTED)
mab.bandits.count
```

### Setting bandit properties

```python
# Reset bandits to defaults.
# (NOT YET IMPLEMENTED)
mab.bandits.reset()

# Set probabilities or payouts.
# (NOT YET IMPLEMENTED)
mab.bandits.set_probs([0.1, 0.05, 0.2, 0.15])
mab.bandits.set_hist_payouts([[1, 1, 0, 0], [0, 1, 0, 0]])
```

### Displaying / retrieving test info

```python
# Retrieve the current "best" bandit.
mab.best()

# Retrieve estimates of all bandit payout probabilities.
# (NOT YET IMPLEMENTED)
mab.prob_est()

# Retrieve the payout probability estimate for bandit i.
# (NOT YET IMPLEMENTED)
mab.est_prob(i)

# Retrieve estimates of all bandit payout probabilities.
mab.est_probs()

# Retrieve the current bandit choice.
# (NOT YET IMPLEMENTED -- use mab.choices[-1] instead)
mab.current()

# Retrieve the sequence of choices.
mab.choices

# Retrieve the probability estimate history.
# (NOT YET IMPLEMENTED)
mab.prob_est_sequence

# Retrieve info on the current test strategy, as a dict.
# (NOT YET IMPLEMENTED)
mab.strategy_info()
```
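
The implemented attributes above are already enough to summarize a finished test; in the following sketch, `collections.Counter` is just standard-library tallying, not part of slots:

```python
from collections import Counter

print('Best arm:', mab.best())
print('Estimated payout probabilities:', mab.est_probs())
print('Pulls per arm:', Counter(mab.choices))
```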

### Proposed MAB strategies

- Epsilon-greedy
- Epsilon decreasing
- Softmax
- Softmax decreasing
- Upper credible bound
- Bayesian bandits
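
For reference, the default epsilon-greedy selection rule amounts to only a few lines. This is an illustrative sketch of the strategy itself, not slots' internal implementation:

```python
import random

def eps_greedy_choice(est_payouts, eps=0.1):
    """Explore a uniformly random arm with probability eps; otherwise
    exploit the arm with the highest estimated payout."""
    if random.random() < eps:
        return random.randrange(len(est_payouts))
    return max(range(len(est_payouts)), key=lambda i: est_payouts[i])
```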