
Experiments statistics: migration plan #26713

Open · 21 of 36 tasks · jurajmajerik opened this issue Dec 6, 2024 · 6 comments
Labels: enhancement (New feature or request), feature/experimentation (Feature Tag: Experimentation)

@jurajmajerik (Contributor) commented Dec 6, 2024

Feature request

  • Trends: event counts
    • Get final feedback from Jimmy
    • Get feedback and sign-off from @andehen
    • Implement the expected loss calculation which complements the significance decision
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date
  • Trends: continuous value
    • Get final feedback from Jimmy
    • Implement the expected loss calculation which complements the significance decision
    • Get feedback and sign-off from @andehen
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date
  • Funnels: conversion rate
    • Implement the expected loss calculation which complements the significance decision
    • Get final feedback from Jimmy
    • Get feedback and sign-off from @andehen
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date

@danielbachhuber (Contributor)

@jurajmajerik Could you share a bit more detail about what you're thinking w/r/t test cases for each?

Also, should we have an "Update documentation" item?

@jurajmajerik (Contributor, Author)

> Could you share a bit more detail about what you're thinking w/r/t test cases for each?

@danielbachhuber I was thinking of testing all the permutations of the test cases suggested in the list, so for example:

  • Low sample size, two variants, significant results
  • High sample size, two variants, significant results
  • ...

There are also tests for the existing methods; I just haven't had time to look closer at those.
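
For illustration, a minimal sketch of how those permutations could be parametrized. This is a sketch only: pytest is assumed as the test runner, and the scenario labels, test name, and test body are hypothetical placeholders rather than anything from the codebase.

```python
# Sketch of a parametrized test matrix over the permutations listed above.
import itertools

import pytest

SAMPLE_SIZES = ["small", "large"]
VARIANT_COUNTS = ["two", "many"]
OUTCOMES = ["not_significant", "significant", "strongly_significant"]


@pytest.mark.parametrize(
    "sample_size,variant_count,outcome",
    list(itertools.product(SAMPLE_SIZES, VARIANT_COUNTS, OUTCOMES)),
)
def test_trend_count_significance(sample_size, variant_count, outcome):
    # Build fixture data matching the scenario (exposure counts sized for
    # `sample_size`, two or N variants, effect size chosen to produce
    # `outcome`), run the new method, and assert on the returned
    # significance code and expected loss.
    ...
```

Each combination then gets fixture data shaped to match the scenario, with the expected significance code asserted at the end.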

> Also, should we have an "Update documentation" item?

Good point, added it :)

@danielbachhuber (Contributor)

@jurajmajerik Is there some documentation on why each methodology was chosen for each scenario? e.g. why does Trends continuous take the mean and then apply some log variance?

@jurajmajerik (Contributor, Author)

@danielbachhuber some of this is covered in our main jupyter notebook: https://colab.research.google.com/drive/1hcWMsaS2GvMM0YeFCVctXfVWiNM5WDwq?usp=sharing

At a high level, the goal is to choose a probability distribution that reflects the kind of values you'd expect in real life. For a continuous value like revenue, the distribution starts at zero and extends into positive values. This is why a logarithm is applied: it keeps the modelled values at or above zero.

As for why the mean is used for a continuous value, I assume it's because you need a way to fairly compare the two groups. Comparing the sums wouldn’t work since the sample sizes might be different. Taking the mean per user gives a more accurate comparison.
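
For illustration only (not PostHog's implementation), here is a minimal numpy sketch of the two ideas above: per-user means make differently sized groups comparable, and a log transform puts skewed, non-negative values like revenue onto a scale where a normal/log-normal model is reasonable. The revenue arrays are made up.

```python
# Sketch: why per-user means and a log transform make sense for revenue-like
# metrics. The data is fabricated for illustration.
import numpy as np

control_revenue_per_user = np.array([0.0, 3.0, 7.5, 12.5, 45.0])          # 5 users
test_revenue_per_user = np.array([0.0, 2.5, 5.0, 8.0, 10.0, 20.0, 60.0])  # 7 users

# Sums are not comparable because the groups have different sizes...
print(control_revenue_per_user.sum(), test_revenue_per_user.sum())

# ...but per-user means are.
print(control_revenue_per_user.mean(), test_revenue_per_user.mean())

# Working on the log scale keeps the model on non-negative values and tames
# the heavy right tail typical of revenue data.
log_control = np.log1p(control_revenue_per_user)
log_test = np.log1p(test_revenue_per_user)
print(log_control.mean(), log_control.var(), log_test.mean(), log_test.var())
```

In a Bayesian treatment, the log-scale mean and variance would feed a normal posterior per variant (log-normal on the original scale), which is presumably what the "log variance" in the implementation refers to.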

@andehen does the above make sense and can you provide more detail?

@danielbachhuber (Contributor)

https://posthoghelp.zendesk.com/agent/tickets/21955 is interested in trying this out when it's ready

@danielbachhuber (Contributor)

@jurajmajerik One thing worth noting that came up in conversation with @andehen today...

Our current implementation of `MIN_PROBABILITY_FOR_SIGNIFICANCE` and expected loss means that `HIGH_LOSS` probably won't ever be seen:

```python
if max_probability >= MIN_PROBABILITY_FOR_SIGNIFICANCE:
    # Find best performing variant
    all_variants = [control, *variants]
    conversion_rates = [v.success_count / (v.success_count + v.failure_count) for v in all_variants]
    best_idx = np.argmax(conversion_rates)
    best_variant = all_variants[best_idx]
    other_variants = all_variants[:best_idx] + all_variants[best_idx + 1 :]
    expected_loss = calculate_expected_loss_v2(best_variant, other_variants)

    if expected_loss >= EXPECTED_LOSS_SIGNIFICANCE_LEVEL:
        return ExperimentSignificanceCode.HIGH_LOSS, expected_loss

    return ExperimentSignificanceCode.SIGNIFICANT, expected_loss
```

Win probability and expected loss are inversely correlated, so a probability of >90% means the expected loss is probably less than 1%.
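
For illustration, a minimal Monte Carlo sketch (independent of `calculate_expected_loss_v2`, using made-up conversion counts) showing why a high win probability generally implies a small expected loss under Beta posteriors:

```python
# Sketch: estimate win probability and expected loss from Beta posteriors,
# to show why a >90% win probability usually comes with a tiny expected loss.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conversion counts: (successes, failures)
control = (100, 900)
test = (130, 870)

samples = 100_000
control_rates = rng.beta(control[0] + 1, control[1] + 1, samples)
test_rates = rng.beta(test[0] + 1, test[1] + 1, samples)

# Probability that test beats control.
win_probability = np.mean(test_rates > control_rates)

# Expected loss of shipping test: how much conversion rate we give up,
# on average, in the scenarios where control is actually better.
expected_loss = np.mean(np.maximum(control_rates - test_rates, 0))

print(f"P(test beats control) = {win_probability:.3f}")
print(f"Expected loss of choosing test = {expected_loss:.5f}")
```

With these made-up counts the win probability comes out around 98% while the expected loss is a tiny fraction of a percent, which is why `HIGH_LOSS` rarely fires once the 90% probability gate has already been passed.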

Also worth noting that the current implementation of `are_results_significant` for Trends returns a `p_value`:

```python
p_value = calculate_p_value(control_variant, test_variants)
if p_value >= P_VALUE_SIGNIFICANCE_LEVEL:
    return ExperimentSignificanceCode.HIGH_P_VALUE, p_value
return ExperimentSignificanceCode.SIGNIFICANT, p_value
```

However, it's only ever used in this condition, so probably not an immediate problem for us:

```tsx
if (experimentResults?.significance_code === SignificanceCode.HighPValue) {
    return (
        <>
            This is because the p value is greater than 0.05
            <Tooltip
                placement="right"
                title={<>Current value is {experimentResults?.p_value?.toFixed(3) || 1}.</>}
            >
                <IconInfo className="ml-1 text-muted text-xl" />
            </Tooltip>
            .
        </>
    )
}
```
