Budget Allocation on a fitted MMM models #684

thipokKub · 2024-05-14T09:49:07Z

thipokKub
May 14, 2024

I've seen that MMM now supports out-of-sample prediction, lift tests, and budget allocation.

But I've been thinking about the last one. The current method does not make total sense to me, and I think it is still missing a few key points:

The response plot only shows reach as a function of investment at a given time, and does not account for the adstock effect.
Budget allocation based on the response plot is only effective when past campaigns in the same channel have no remaining effect. If there is still a lingering effect, we will hit diminishing returns sooner.
It does not tell you how to respond if you have a CAC limit (CAC = investment / reach).
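
To make the second point concrete, here is a minimal sketch, assuming a Hill saturation curve with made-up parameters (`slope`, `half_sat`, and `beta` are illustrative values, not fitted ones). It shows that the same spend yields less incremental response when carryover from earlier campaigns is still present:

```python
def hill(x, slope, half_sat, beta):
    """Hill saturation: response rises from 0 towards beta as x grows."""
    return beta * x**slope / (x**slope + half_sat**slope)

# Illustrative parameters only (assumed, not fitted)
params = dict(slope=1.0, half_sat=100.0, beta=400.0)

spend = 100.0
carryover = 100.0  # lingering adstock from earlier campaigns (assumed level)

# Incremental response of `spend` without and with lingering adstock
fresh = hill(spend, **params) - hill(0.0, **params)
with_carryover = hill(spend + carryover, **params) - hill(carryover, **params)

print(f"incremental response, no carryover:   {fresh:.1f}")
print(f"incremental response, with carryover: {with_carryover:.1f}")
```

With a slope of 1 the curve is concave everywhere, so any positive carryover lowers the incremental response; with an S-shaped curve (slope above 1) the effect depends on where the baseline sits relative to the inflection point.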

Considering this, I think my sketch solution would be as follows

Term definitions

Assume that the adstock effect on the $i^\text{th}$ channel only lasts for $k$ timesteps into the future. Given the current ad-spending state, calculate a baseline reach $k$ timesteps into the future assuming no additional ad spending on that channel (carryover effect only), and denote this as
$$S^i_{t:t+k}$$
The fitted MMM response curve for channel $i$, when budget is injected at timestep $t$ and the response is evaluated at timestep $t_1$, is denoted as
$$f_i(x^i_t, t_1; S^i_{t_1})$$
It returns the reach at timestep $t_1$ given that we inject a budget of $x^i_t$ at timestep $t$, and given the baseline level $S^i_{t_1}$ (the element of $S^i_{t:t+k}$ at timestep $t_1$). If, say, the adstock effect is represented by $\sigma(x, t)$, then

$$f_i(x^i_t, t_1; S^i_{t_1}) = f_i(\sigma(x^i_t, t_1) + S^i_{t_1})$$

Calculate the aggregated budget response function across time, denoted as
$$p_i(x^i_t; S^i_{t:t+k}) = \sum^k_{j = 0} f_i(x^i_t, t +j; S^i_{t + j})$$
This also incorporates the adstock effect, because the contribution of $x^i_t$ changes over time.
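
As a rough illustration of $p_i$, here is a minimal sketch under two assumptions that are not part of the definitions above: geometric adstock, $\sigma(x, t + j) = x \alpha^j$, and a Hill saturation curve for $f_i$. The function names and parameter values are hypothetical.

```python
import numpy as np

def hill(x, slope, half_sat, beta):
    """Hill saturation curve (assumed form of f_i)."""
    return beta * x**slope / (x**slope + half_sat**slope)

def aggregated_response(x, baseline, alpha, hill_params):
    """p_i(x; S): total response over the window after one injection x at step t.

    baseline[j] plays the role of S_{t+j}, the carryover level with no new spend.
    Assumes geometric adstock, sigma(x, t + j) = x * alpha**j.
    """
    baseline = np.asarray(baseline, dtype=float)
    lags = np.arange(len(baseline))
    injected = x * alpha**lags  # sigma(x, t + j) for j = 0..k
    return hill(injected + baseline, **hill_params).sum()

hill_params = dict(slope=1.0, half_sat=100.0, beta=400.0)  # assumed values
baseline = 50.0 * 0.5 ** np.arange(1, 5)  # decaying effect of past spend, k = 3

p_zero = aggregated_response(0.0, baseline, alpha=0.5, hill_params=hill_params)
p_100 = aggregated_response(100.0, baseline, alpha=0.5, hill_params=hill_params)
print(f"baseline reach: {p_zero:.1f}, reach with x=100: {p_100:.1f}")
print(f"incremental reach: {p_100 - p_zero:.1f}")
```

The difference `p_100 - p_zero` is the reach attributable to the new spend only, which is the quantity a "new CAC" would be computed against.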

Note $\text{CAC} = \frac{\text{investment}}{\text{reach}}$

Optimization

The budget allocation problem has two aspects

Given a maximum budget $C^\text{max}_t$, and maximum CAC $\rightarrow$ maximize reach

$$ \max_{x^i_t} \quad \sum^m_{i=1} p_i(x^i_t; S^i_{t:t+k}) \quad \quad \textrm{s.t.} \quad \frac{1}{m - 1} \sum^m_{i=1} \frac{x^i_t}{p_i(x^i_t; S^i_{t:t+k}) } \leq CAC_\text{max} \quad \text{, and} \quad \sum^m_{i=1} x^i_t \leq C^\text{max}_t $$

Given a minimum reach of $R^\text{min}_t$, and maximum CAC $\rightarrow$ minimize budget

$$ \min_{x^i_t} \quad \sum^m_{i=1} x^i_t \quad \quad \textrm{s.t.} \quad \frac{1}{m - 1} \sum^m_{i=1} \frac{x^i_t}{p_i(x^i_t; S^i_{t:t+k}) } \leq CAC_\text{max} \quad \text{, and} \quad \sum^m_{i=1} p_i(x^i_t; S^i_{t:t+k}) \geq R^\text{min}_t $$

So given these conditions, it should now be possible to optimize for both cases.

Additionally, for the first case, it is possible to find an optimized budget policy with a greedy algorithm by solving for $C^\text{max}_t$ at each timestep, such that the overall allocation policy does not exceed a total budget $C^\text{max}$. Formally,

$$ \sum^n_{q=t} C^\text{max}_q \leq C^\text{max} $$

assuming that we can inject budget over $n$ steps into the future.
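
One possible reading of the greedy idea, sketched for a single channel: hand out the total budget in small chunks, each time to the timestep with the largest marginal reach. This is a heuristic with no optimality guarantee, and it assumes geometric adstock and a Hill curve; `greedy_caps`, `total_reach`, and all parameter values are hypothetical.

```python
import numpy as np

def hill(x, slope, half_sat, beta):
    return beta * x**slope / (x**slope + half_sat**slope)

def total_reach(schedule, alpha, horizon, hill_params):
    """Reach over `horizon` steps for a spend schedule under geometric adstock."""
    adstock = np.zeros(horizon)
    for t, spend in enumerate(schedule):
        lags = np.arange(horizon - t)
        adstock[t:] += spend * alpha**lags
    return hill(adstock, **hill_params).sum()

def greedy_caps(total_budget, n_steps, step, alpha, horizon, hill_params):
    """Assign chunks of size `step` to the timestep with the largest marginal
    reach, so the per-step caps sum to at most `total_budget`."""
    caps = np.zeros(n_steps)
    for _ in range(int(total_budget // step)):
        base = total_reach(caps, alpha, horizon, hill_params)
        gains = []
        for q in range(n_steps):
            trial = caps.copy()
            trial[q] += step
            gains.append(total_reach(trial, alpha, horizon, hill_params) - base)
        caps[int(np.argmax(gains))] += step
    return caps

caps = greedy_caps(
    total_budget=200.0, n_steps=4, step=10.0, alpha=0.5, horizon=8,
    hill_params=dict(slope=1.0, half_sat=100.0, beta=400.0),  # assumed values
)
print(caps, caps.sum())
```

Each entry of `caps` would then serve as $C^\text{max}_q$ for the per-timestep problem; a smaller `step` gives a finer split at the cost of more evaluations.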

Given all of this, I'm not sure if I have the correct formulation, so does anyone have any ideas on how to improve this? If not, then it should be possible to create an example notebook demonstrating the budget allocation.

thipokKub · 2024-05-14T14:30:30Z

thipokKub
May 14, 2024
Author

I've made a simple example

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used for the schedule plot further down
from scipy.optimize import minimize

# Taken from https://towardsdatascience.com/carryover-and-shape-effects-in-media-mix-modeling-paper-review-fd699b509e2d

def geoDecay(alpha, L):
    return alpha**(np.ones(L).cumsum()-1)

def delayed_adstock(alpha, theta, L):
    return alpha**((np.ones(L).cumsum()-1)-theta)**2

def carryover(x, alpha, L, theta=None, func='geo'):
    # Adstock weights for lags 0..L-1
    if func == 'geo':
        weights = geoDecay(alpha, L)
    elif func == 'delayed':
        weights = delayed_adstock(alpha, theta, L)
    else:
        raise ValueError(f"Unknown carryover function: {func!r}")

    transformed_x = []
    for t in range(x.shape[0]):
        # Window of at most L most recent observations, ending at t
        current_window_x = x[max(0, t + 1 - L):t + 1]
        # Flip so the most recent observation gets the lag-0 weight
        window_weights = np.flip(weights[:len(current_window_x)], axis=0)
        # Normalized weighted average of the spend in the window
        transformed_x.append((current_window_x * window_weights).sum() / window_weights.sum())

    return np.array(transformed_x)

def beta_hill(x, S, K, beta):
    return beta - (K**S*beta)/(x**S+K**S)

# Define given information
historical_buget = pd.DataFrame({
    "channel_1": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 300, 500, 0, 0, 0, 0],
    "channel_2": [20, 60, 0, 30, 30, 30, 50, 50, 50, 50, 100, 100, 50, 50, 50, 50],
    "channel_3": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100, 100]
})

fitted_params = {
    "channel_1": {
        "beta_hill": [3, 100, 400],
        "carryover": [0.95, 15, 3,  'delayed']
    },
    "channel_2": {
        "beta_hill": [2, 40, 200],
        "carryover": [0.1, 5, 1,  'delayed']
    },
    "channel_3": {
        "beta_hill": [1, 9.5, 250],
        "carryover": [0.8, 10, 3,  'delayed']
    }
}

# Calculate baseline state
max_window = 16

full_baseline_carryover_state = {}
full_baseline_hill_state = {}

for channel in ["channel_1", "channel_2", "channel_3"]:
    expand = np.concatenate([
        historical_buget[channel].to_numpy()[-max_window:],
        np.zeros(max_window)
    ]) * np.eye(2 * max_window)

    agg = []
    for i in range(len(expand)):
        agg.append(carryover(expand[i, :], *fitted_params[channel]["carryover"]))
    full_baseline_carryover_state[channel] = np.sum(agg, axis=0)
    full_baseline_hill_state[channel] = beta_hill(full_baseline_carryover_state[channel], *fitted_params[channel]["beta_hill"])

full_baseline_carryover_state = pd.DataFrame(full_baseline_carryover_state)
full_baseline_hill_state = pd.DataFrame(full_baseline_hill_state)
baseline_carryover_state = full_baseline_carryover_state.iloc[-max_window:].reset_index(drop=True)
baseline_hill_state = full_baseline_hill_state.iloc[-max_window:].reset_index(drop=True)

# Define function to optimize
def cumulative_reach(budget, baseline_carryover_state, beta_hill_params, carryover_params):
    predictive_window = np.zeros(max_window)
    predictive_window[0] = budget
    
    expand = predictive_window * np.eye(max_window)
    agg = []
    for i in range(len(expand)):
        agg.append(carryover(expand[i, :], *carryover_params))
    predictive_state = beta_hill(np.sum(agg, axis=0) + baseline_carryover_state, *beta_hill_params)
    return predictive_state.sum()

def predictive_outcome(x):
    budget_ch1, budget_ch2, budget_ch3 = x
    return (
        cumulative_reach(budget_ch1, baseline_carryover_state["channel_1"], fitted_params["channel_1"]["beta_hill"], fitted_params["channel_1"]["carryover"]) +
        cumulative_reach(budget_ch2, baseline_carryover_state["channel_2"], fitted_params["channel_2"]["beta_hill"], fitted_params["channel_2"]["carryover"]) +
        cumulative_reach(budget_ch3, baseline_carryover_state["channel_3"], fitted_params["channel_3"]["beta_hill"], fitted_params["channel_3"]["carryover"])
    )

def minimize_inv_predictive_outcome(*args, **kwargs):
    return -1 * predictive_outcome(*args, **kwargs)

Use case 1

Given maximum budget, maximize reach

max_budget = 200
max_cac = 0.1
res = minimize(
    minimize_inv_predictive_outcome, np.zeros(3),
    bounds=[(0, max_budget), (0, max_budget), (0, max_budget)],
    constraints=(
        {'type': 'ineq', 'fun': lambda x: max_budget -1*(x[0] + x[1] + x[2])}, 
        {'type': 'ineq', 'fun': lambda x: max_cac - x.sum()/(predictive_outcome(x) + 1e-3)}
    )
)
print(res)
print(f"Total budget used: {res.x.sum():.3f}; Reach: {-1 * res.fun:.3f}; CAC: {(-1 * res.x.sum() / res.fun):.3f}; New CAC Only: {res.x.sum() / (-1 * res.fun - baseline_hill_state.sum().sum()):.3f}")

Output

 message: Optimization terminated successfully
 success: True
  status: 0
     fun: -2454.211891235602
       x: [ 1.493e+02  2.526e+01  2.541e+01]
     nit: 15
     jac: [-3.963e+00 -3.963e+00 -3.963e+00]
    nfev: 60
    njev: 15
Total budget used: 200.000; Reach: 2454.212; CAC: 0.081; New CAC Only: 0.202

Use case 2

Given minimum reach, minimize budget

minimum_reach = 5000
max_cac = 0.1
res = minimize(
    predictive_outcome, np.zeros(3),
    bounds=[(0, np.inf), (0, np.inf), (0, np.inf)],
    constraints=(
        { 'type': 'ineq', 'fun': lambda x: predictive_outcome(x) - minimum_reach }, 
        { 'type': 'ineq', 'fun': lambda x: max_cac - x.sum()/(predictive_outcome(x) + 1e-3) })
)
print(res)
print(f"Total budget used: {res.x.sum():.3f}; Reach: {res.fun:.3f}; CAC: {(res.x.sum() / res.fun):.3f}; New CAC Only: {res.x.sum() / (res.fun - baseline_hill_state.sum().sum()):.3f}")

Output

 message: Positive directional derivative for linesearch
 success: False
  status: 8
     fun: 2748.0374089478237
       x: [ 2.097e+02  3.415e+01  3.618e+01]
     nit: 21
     jac: [ 3.399e+00  3.399e+00  3.399e+00]
    nfev: 99
    njev: 17
Total budget used: 280.053; Reach: 2748.037; CAC: 0.102; New CAC Only: 0.218
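
Note that the call above passes `predictive_outcome` as the objective, so it minimizes predicted reach rather than spend. A minimal sketch of the "minimize budget" formulation, on a toy two-channel Hill model with assumed parameters and no adstock, makes total spend the objective and keeps the reach and CAC requirements as constraints:

```python
import numpy as np
from scipy.optimize import minimize

def hill(x, slope, half_sat, beta):
    return beta * x**slope / (x**slope + half_sat**slope)

# Toy two-channel response (assumed parameters, no adstock for brevity)
channels = [
    dict(slope=1.0, half_sat=100.0, beta=400.0),
    dict(slope=1.0, half_sat=50.0, beta=250.0),
]

def reach(x):
    return sum(hill(xi, **p) for xi, p in zip(x, channels))

minimum_reach = 300.0
max_cac = 0.6

res = minimize(
    lambda x: x.sum(),  # objective: total spend
    x0=np.array([100.0, 100.0]),
    bounds=[(0.0, None)] * 2,
    constraints=[
        {"type": "ineq", "fun": lambda x: reach(x) - minimum_reach},
        {"type": "ineq", "fun": lambda x: max_cac - x.sum() / (reach(x) + 1e-3)},
    ],
    method="SLSQP",
)
print(res.x, res.x.sum(), reach(res.x))
```

In the full example the same change would mean passing a spend function such as `lambda x: x.sum()` as the objective and reading the reach from `predictive_outcome(res.x)` instead of `res.fun`.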

If the model parameters have uncertainty, you can sample them many times and solve each specific case. This gives a sample distribution of budget solutions, which should make it possible to estimate the solution uncertainty and quantify upper and lower bounds (probably).
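
A minimal sketch of that idea, assuming a toy two-channel Hill model without adstock. The parameter draws here are synthetic normal noise standing in for posterior samples, and `solve_allocation` is a hypothetical helper:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def hill(x, slope, half_sat, beta):
    return beta * x**slope / (x**slope + half_sat**slope)

def solve_allocation(betas, half_sats, max_budget):
    """Maximize total reach for one parameter draw under a budget cap."""
    def neg_reach(x):
        return -sum(hill(xi, 1.0, k, b) for xi, k, b in zip(x, half_sats, betas))
    res = minimize(
        neg_reach,
        x0=np.full(len(betas), max_budget / len(betas)),
        bounds=[(0.0, max_budget)] * len(betas),
        constraints=[{"type": "ineq", "fun": lambda x: max_budget - x.sum()}],
        method="SLSQP",
    )
    return res.x

# Stand-in for posterior draws (assumed normal noise around point estimates)
n_draws = 50
beta_draws = rng.normal([400.0, 250.0], 20.0, size=(n_draws, 2))
half_sat_draws = rng.normal([100.0, 50.0], 5.0, size=(n_draws, 2))

# One optimal allocation per draw
solutions = np.array([
    solve_allocation(b, k, max_budget=200.0)
    for b, k in zip(beta_draws, half_sat_draws)
])

lower, median, upper = np.percentile(solutions, [5, 50, 95], axis=0)
print("5%:", lower, "median:", median, "95%:", upper)
```

The percentiles are per channel, so the median allocation does not necessarily sum to the budget; if a single feasible recommendation is needed, one of the sampled solutions can be picked instead.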

1 reply

thipokKub May 16, 2024
Author

I've been thinking about a less trivial solution, e.g. what is the best strategy for when to inject budget as well as how to allocate it across channels. Then I realized that the policy for each channel can be thought of as a categorical vector (normalized to one), which has a nice representation using a Dirichlet prior. So I modified my code a bit and got the following

P.S. the optimization is really slow because, instead of just 3 parameters, it now has 3*(k+1) parameters. Based on what I've found, the optimizer is O(n^2), so the cost grows from O(9) to O(9*(k+1)^2)
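
The normalization step can be sketched in isolation. `split_budget` is a hypothetical helper: it turns positive raw weights into shares that sum to one, then scales them by the channel budget, so the schedule always spends exactly that budget regardless of the scale of the weights.

```python
import numpy as np

def split_budget(budget, raw_weights):
    """Turn positive raw weights into a spend schedule over timesteps."""
    raw_weights = np.asarray(raw_weights, dtype=float)
    shares = raw_weights / raw_weights.sum()  # categorical vector, sums to one
    return budget * shares

# Shares are 1/8, 1/8, 1/4, 1/2 of the budget: 25, 25, 50 and 100
schedule = split_budget(200.0, [1.0, 1.0, 2.0, 4.0])
print(schedule)
```

Each channel then contributes one budget parameter plus k weights to the optimizer, which is where the 3*(k+1) count comes from.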

predictive_window_max_length = 10
max_window = 16

full_baseline_carryover_state = {}
full_baseline_hill_state = {}

for channel in ["channel_1", "channel_2", "channel_3"]:
    expand = np.concatenate([
        historical_buget[channel].to_numpy()[-max_window:],
        np.zeros(max_window + predictive_window_max_length)
    ]) * np.eye(2 * max_window + predictive_window_max_length)

    agg = []
    for i in range(len(expand)):
        agg.append(carryover(expand[i, :], *fitted_params[channel]["carryover"]))
    full_baseline_carryover_state[channel] = np.sum(agg, axis=0)
    full_baseline_hill_state[channel] = beta_hill(full_baseline_carryover_state[channel], *fitted_params[channel]["beta_hill"])

full_baseline_carryover_state = pd.DataFrame(full_baseline_carryover_state)
full_baseline_hill_state = pd.DataFrame(full_baseline_hill_state)
baseline_carryover_state = full_baseline_carryover_state.iloc[-(max_window + predictive_window_max_length):].reset_index(drop=True)
baseline_hill_state = full_baseline_hill_state.iloc[-(max_window + predictive_window_max_length):].reset_index(drop=True)

def cumulative_reach_general(budget, predictive_norm_window, baseline_carryover_state, beta_hill_params, carryover_params):
    predictive_window = budget * predictive_norm_window
    expand = np.concatenate([predictive_window, np.zeros(max_window)])* np.eye(len(predictive_norm_window) + max_window)
    agg = []
    for i in range(len(expand)):
        agg.append(carryover(expand[i, :], *carryover_params))
    predictive_state = beta_hill(np.sum(agg, axis=0) + baseline_carryover_state, *beta_hill_params)
    return predictive_state.sum()

def predictive_outcome_general(x):
    budget_ch1, budget_ch2, budget_ch3 = x[:3]
    all_priors = x[3:].reshape(3, -1)
    
    predictive_norm_window_ch1 = all_priors[0] / all_priors[0].sum()
    predictive_norm_window_ch2 = all_priors[1] / all_priors[1].sum()
    predictive_norm_window_ch3 = all_priors[2] / all_priors[2].sum()
    return (
        cumulative_reach_general(budget_ch1, predictive_norm_window_ch1, baseline_carryover_state["channel_1"], fitted_params["channel_1"]["beta_hill"], fitted_params["channel_1"]["carryover"]) +
        cumulative_reach_general(budget_ch2, predictive_norm_window_ch2, baseline_carryover_state["channel_2"], fitted_params["channel_2"]["beta_hill"], fitted_params["channel_2"]["carryover"]) +
        cumulative_reach_general(budget_ch3, predictive_norm_window_ch3, baseline_carryover_state["channel_3"], fitted_params["channel_3"]["beta_hill"], fitted_params["channel_3"]["carryover"])
    )

def minimize_inv_predictive_outcome_general(*args, **kwargs):
    return -1 * predictive_outcome_general(*args, **kwargs)

max_budget = 200
max_cac = 0.2
baseline_hill_sum = baseline_hill_state.sum().sum()

res = minimize(
    minimize_inv_predictive_outcome_general, np.array([
        *([max_budget/3] * 3),
        *([1] * predictive_window_max_length),
        *([1] * predictive_window_max_length),
        *([1] * predictive_window_max_length)
    ]),
    bounds=[
        *([(0, max_budget)] * 3),
        *([(1, 100)] * predictive_window_max_length),
        *([(1, 100)] * predictive_window_max_length), 
        *([(1, 100)] * predictive_window_max_length)
    ],
    constraints=(
        {'type': 'ineq', 'fun': lambda x: max_budget -1*(x[0] + x[1] + x[2])},
        {'type': 'ineq', 'fun': lambda x: max_cac - x[:3].sum()/(predictive_outcome_general(x) - baseline_hill_sum + 1e-3)}
    )
)

response_fns = res.x[3:].reshape(3, -1)/res.x[3:].reshape(3, -1).sum(axis=-1).reshape(-1, 1)

contrib_ch_1 = cumulative_reach_general(res.x[0], response_fns[0], baseline_carryover_state["channel_1"], fitted_params["channel_1"]["beta_hill"], fitted_params["channel_1"]["carryover"])
contrib_ch_2 = cumulative_reach_general(res.x[1], response_fns[1], baseline_carryover_state["channel_2"], fitted_params["channel_2"]["beta_hill"], fitted_params["channel_2"]["carryover"])
contrib_ch_3 = cumulative_reach_general(res.x[2], response_fns[2], baseline_carryover_state["channel_3"], fitted_params["channel_3"]["beta_hill"], fitted_params["channel_3"]["carryover"])

base_ch_1 = cumulative_reach_general(0, response_fns[0], baseline_carryover_state["channel_1"], fitted_params["channel_1"]["beta_hill"], fitted_params["channel_1"]["carryover"])
base_ch_2 = cumulative_reach_general(0, response_fns[1], baseline_carryover_state["channel_2"], fitted_params["channel_2"]["beta_hill"], fitted_params["channel_2"]["carryover"])
base_ch_3 = cumulative_reach_general(0, response_fns[2], baseline_carryover_state["channel_3"], fitted_params["channel_3"]["beta_hill"], fitted_params["channel_3"]["carryover"])

print(res)
print(
    f"Budget: {res.x[0]:.3f}, {res.x[1]:.3f}, {res.x[2]:.3f}; New Reach: " +
    f"{contrib_ch_1 - base_ch_1:.3f}, " + 
    f"{contrib_ch_2 - base_ch_2:.3f}, " + 
    f"{contrib_ch_3 - base_ch_3:.3f}"
)
print(f"Total budget used: {res.x[:3].sum():.3f}; Reach: {-1 * res.fun:.3f}; CAC: {(-1 * res.x[:3].sum() / res.fun):.3f};")
print(f"New Reach Only: {(-1 * res.fun - baseline_hill_sum):.3f}; New CAC Only: {res.x[:3].sum() / (-1 * res.fun - baseline_hill_sum):.3f}")

plt.title("Response Fn (first k steps)")
plt.plot(response_fns[0] * res.x[0], label="channel 1")
plt.plot(response_fns[1] * res.x[1], label="channel 2")
plt.plot(response_fns[2] * res.x[2], label="channel 3")
plt.legend(loc="best")
plt.show()

Output

 message: Optimization terminated successfully
 success: True
  status: 0
     fun: -3244.855743826549
       x: [ 6.917e+01  3.961e-10 ...  1.000e+00  1.000e+02]
     nit: 54
     jac: [-4.890e+00 -2.333e+00 ...  3.115e-01 -1.541e-02]
    nfev: 1837
    njev: 54
Budget: 69.175, 0.000, 130.825; New Reach: 376.099, 0.000, 1402.880
Total budget used: 200.000; Reach: 3244.856; CAC: 0.062;
New Reach Only: 1778.980; New CAC Only: 0.112

And here is the output when predictive_window_max_length = 20

 message: Iteration limit reached
 success: False
  status: 9
     fun: -3979.987210532122
       x: [ 3.054e-12  5.043e-12 ...  1.000e+00  1.000e+02]
     nit: 100
     jac: [-4.742e+00 -2.458e+00 ...  2.886e-01 -2.722e-02]
    nfev: 6401
    njev: 100
Budget: 0.000, 0.000, 200.000; New Reach: 0.000, 0.000, 2514.111
Total budget used: 200.000; Reach: 3979.987; CAC: 0.050;
New Reach Only: 2514.111; New CAC Only: 0.080

juanitorduz · 2024-05-14T16:53:33Z

juanitorduz
May 14, 2024
Maintainer

@thipokKub Thank you very much for putting this together! I will take the time (the next few days) to review the details!

What I can tell you now is that, in practice, there is no unique optimization path, as different companies have different constraints. I agree with your point and we want to provide alternatives. See, for example #358

In the meantime, if you feel like opening a Pull Request, please go for it :)

Thank you very much for your feedback! It is much appreciated! 🤗

0 replies

cetagostini · 2024-05-14T20:33:13Z

cetagostini
May 14, 2024
Maintainer

@thipokKub thanks for all the work here! You can check #632

There we are modifying the Budget Optimizer. We are aware of the missing pieces you mentioned and now, the optimizer should:

Consider adstock effect. Optimizing based on the lingering effect from point zero to l_max
Allow custom constraints. You'll be able to modify the constraints and optimize based on what you need.
Effect from previous campaigns. After the algorithm is done, the posterior predictive is generated based on the recommendation from the optimizer. Giving you the uncertainty around the optimization and considering the lingering from previous campaigns.
Compatible with all types of saturation functions and adstock functions in the library (Not only geometric adstock).

I would love your input there, maybe something is missing and it's a low-hanging fruit to integrate!

1 reply

thipokKub May 15, 2024
Author

Hi, I'm glad you responded so quickly!

Assuming the lingering effect stays within l_max, I want to ensure that the input data also reaches l_max steps into the past, so that the carryover effect is calculated appropriately. Then we can calculate l_max steps into the future. However, in my case I assume that there will be no ad spending in the future (a "what if we stop spending" scenario) to get the baseline. And when the budget is applied, it is applied only once, at the start, with a given amount.

I think this scenario should be good enough to allocate the budget better in a general sense. However, if the user wants more than an "impulse" action function, for example a delayed impulse or a recurring fixed budget for k timesteps (which would extend the prediction window to k + l_max steps), my simple example needs to be modified.
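
The three action functions can be sketched as spend schedules. The helper names are hypothetical, and the window lengths follow the convention above that a k-step recurring budget needs k + l_max steps (so a single impulse is the k = 1 case):

```python
import numpy as np

def impulse(budget, l_max):
    """All of the budget at the first step, followed by l_max carryover steps."""
    schedule = np.zeros(1 + l_max)
    schedule[0] = budget
    return schedule

def delayed_impulse(budget, delay, l_max):
    """All of the budget `delay` steps from now."""
    schedule = np.zeros(delay + 1 + l_max)
    schedule[delay] = budget
    return schedule

def recurring(budget_per_step, k, l_max):
    """A fixed budget for k steps; the window extends to k + l_max steps."""
    schedule = np.zeros(k + l_max)
    schedule[:k] = budget_per_step
    return schedule

print(impulse(100.0, l_max=3))
print(delayed_impulse(100.0, delay=2, l_max=3))
print(recurring(50.0, k=3, l_max=3))
```

Any of these schedules could replace the single-entry `predictive_window` in the example, as long as the baseline carryover state is computed over a window of the same length.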

juanitorduz · 2024-05-21T08:22:00Z

juanitorduz
May 21, 2024
Maintainer

Thank You, @thipokKub and @cetagostini, for your valuable input! Your insights have greatly contributed to our discussion. Here's a proposal to advance our work on iterations.

We work to merge User-defined media transformations and custom ordering #632 so that we have an improved version of the optimizer.
We evaluate this new version of the optimizer and revisit the items still missing from this discussion.
Once we have identified the potential improvements @thipokKub you can create a Pull Request (with our help) or we can work on them ourselves (pymc-marketing devs) and @thipokKub can act as a reviewer. If we use some of the code included in this discussion, we will make sure we add you as a contributor as described in https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors
From my end, after we merge User-defined media transformations and custom ordering #632, I will look into how to integrate (or potentially create a new optimizer) based on the draft PR Add optimization notebook #358.

Does that sound like a plan?

0 replies
