
# slots

Multi-armed bandit library in Python

## Documentation

This document details the current and planned API for slots. Features that are not yet implemented are noted as such.

What does the library need to do? An aspirational list.

1. Set up N bandits with probabilities, p_i, and payouts, pay_i.
2. Implement several MAB strategies, with kwargs as parameters and a consistent API.
3. Allow for T trials.
4. Continue with more trials (i.e. save state after trials).
5. Values to save:
    1. Current choice
    2. Number of trials completed for each arm
    3. Scores for each arm
    4. Average payout per arm (wins/trials?)
    5. Current regret: regret = T * mean_max - Σ_{t=1}^{T} reward_t, i.e. the cumulative shortfall versus always playing the best arm (see the sketch after this list).
6. Use sane defaults.
7. Be obvious and clean.
8. For the time being, handle only binary payouts.

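For instance, the regret defined in item 5 needs nothing more than the arms' mean payouts and the reward history. A minimal sketch (the `regret` function is illustrative, not part of the slots API):

```python
def regret(probs, rewards):
    """Cumulative regret after len(rewards) trials: the expected payout
    of always playing the best arm, minus the rewards actually received."""
    return len(rewards) * max(probs) - sum(rewards)

# Three arms where the best arm pays out 40% of the time.
print(regret([0.2, 0.1, 0.4], [0, 1, 0, 0, 1]))  # 5 * 0.4 - 2 = 0.0
```
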
## Library API ideas

### Running slots with a live website

```python
# Using slots to determine the best of 3 variations on a live website.
# 3 is the default number of bandits and epsilon-greedy is the default
# strategy.
mab = slots.MAB(3, live=True)

# Make the first choice randomly, record responses, and input the reward.
# Here, arm 2 was chosen.
# Update the online trial (input the most recent result) until the test
# criterion is met.
mab.online_trial(bandit=2, payout=1)

# The response of mab.online_trial() is a dict of the form:
#   {'new_trial': boolean, 'choice': int, 'best': int}
# where:
#   - new_trial is False once the stopping criterion is met.
#   - choice is the arm to try next.
#   - best is the current best estimate of the highest-payout arm.
```
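
A full live loop might then look like the sketch below. Here `get_payout` is a hypothetical stand-in for however your site records a binary conversion for the arm that was shown; it is not part of slots.

```python
import random

import slots

def get_payout(arm):
    # Hypothetical stand-in for a real site hook; here we just simulate
    # conversion rates of 10%, 10%, and 20% for the three arms.
    return int(random.random() < [0.1, 0.1, 0.2][arm])

mab = slots.MAB(3, live=True)

# The first arm is chosen at random; after that, show whichever arm
# the previous trial recommended.
choice = random.randrange(3)
while True:
    result = mab.online_trial(bandit=choice, payout=get_payout(choice))
    if not result['new_trial']:   # stopping criterion met
        break
    choice = result['choice']     # arm to show next

print('Best arm:', result['best'])
```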

### Creating a MAB test instance

```python
import numpy as np

# Default: 3 bandits with random probabilities, p_i.
mab = slots.MAB()

# Set up 4 bandits with random p_i.
mab = slots.MAB(4)

# 4 bandits with specified p_i.
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])

# 3 bandits with historical payout data.
mab = slots.MAB(3, hist_payouts=np.array([[0, 0, 1, ...],
                                          [1, 0, 0, ...],
                                          [0, 0, 0, ...]]))
```

### Running tests with a strategy, S

```python
# Default: epsilon-greedy, epsilon = 0.1, num_trials = 100.
mab.run()

# Run the chosen strategy with specified parameters and number of trials.
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)

# Run a strategy, updating old trial data.
# (NOT YET IMPLEMENTED -- note that `continue` is a reserved word in
# Python, so the final keyword argument will need a different name.)
mab.run(continue=True)
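```

Putting these pieces together with the retrieval calls documented below, a complete offline simulation might look like this sketch:

```python
import slots

# Simulate 4 arms with known payout probabilities.
mab = slots.MAB(probs=[0.2, 0.1, 0.4, 0.1])
mab.run(strategy='eps_greedy', params={'eps': 0.2}, trials=10000)

print(mab.best())       # index of the estimated best arm
print(mab.est_probs())  # payout probability estimates per arm
```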

### Displaying / retrieving bandit properties

```python
# Default: display the number of bandits, probabilities, and payouts.
# (NOT YET IMPLEMENTED)
mab.bandits.info()

# Display info for bandit i.
# (NOT YET IMPLEMENTED)
mab.bandits[i]

# Retrieve the bandits' payouts, probabilities, etc.
mab.bandits.payouts
mab.bandits.probs

# Retrieve the count of bandits.
# (NOT YET IMPLEMENTED)
mab.bandits.count
```

### Setting bandit properties

```python
# Reset bandits to defaults.
# (NOT YET IMPLEMENTED)
mab.bandits.reset()

# Set probabilities or payouts.
# (NOT YET IMPLEMENTED)
mab.bandits.set_probs([0.1, 0.05, 0.2, 0.15])
mab.bandits.set_hist_payouts([[1, 1, 0, 0], [0, 1, 0, 0]])
```

### Displaying / retrieving test info

```python
# Retrieve the current "best" bandit.
mab.best()

# Retrieve estimates of all bandit payout probabilities.
# (NOT YET IMPLEMENTED)
mab.prob_est()

# Retrieve the payout probability estimate for bandit i.
# (NOT YET IMPLEMENTED)
mab.est_prob(i)

# Retrieve estimates of all bandit payout probabilities.
mab.est_probs()

# Retrieve the current bandit choice.
# (NOT YET IMPLEMENTED -- use mab.choices[-1] instead)
mab.current()

# Retrieve the sequence of choices.
mab.choices

# Retrieve the probability estimate history.
# (NOT YET IMPLEMENTED)
mab.prob_est_sequence

# Retrieve info on the current test strategy, as a dict.
# (NOT YET IMPLEMENTED)
mab.strategy_info()
```
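
The implemented attributes above are already enough to summarize a finished test; in the following sketch, `collections.Counter` is just standard-library tallying, not part of slots:

```python
from collections import Counter

print('Best arm:', mab.best())
print('Estimated payout probabilities:', mab.est_probs())
print('Pulls per arm:', Counter(mab.choices))
```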

### Proposed MAB strategies

- Epsilon-greedy
- Epsilon decreasing
- Softmax
- Softmax decreasing
- Upper credible bound
- Bayesian bandits
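
For reference, the default epsilon-greedy selection rule amounts to only a few lines. This is an illustrative sketch of the strategy itself, not slots' internal implementation:

```python
import random

def eps_greedy_choice(est_payouts, eps=0.1):
    """Explore a uniformly random arm with probability eps; otherwise
    exploit the arm with the highest estimated payout."""
    if random.random() < eps:
        return random.randrange(len(est_payouts))
    return max(range(len(est_payouts)), key=lambda i: est_payouts[i])
```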