
Experiments statistics: migration plan #26713

Open · 21 of 36 tasks · jurajmajerik opened this issue Dec 6, 2024 · 6 comments
Labels: enhancement (New feature or request), feature/experimentation (Feature Tag: Experimentation)

@jurajmajerik (Contributor) commented Dec 6, 2024

Feature request

  • Trends: event counts
    • Get final feedback from Jimmy
    • Get feedback and sign-off from @andehen
    • Implement the expected loss calculation which complements the significance decision
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date
  • Trends: continuous value
    • Get final feedback from Jimmy
    • Implement the expected loss calculation which complements the significance decision
    • Get feedback and sign-off from @andehen
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date
  • Funnels: conversion rate
    • Implement the expected loss calculation which complements the significance decision
    • Get final feedback from Jimmy
    • Get feedback and sign-off from @andehen
    • Write test cases:
      • Small / large sample size
      • Two / many variants
      • Significant / strongly significant / not significant
    • Implement the method in code behind a flag
    • Update docs
    • Compare results using existing production experiments
    • Release for all new experiments created after the cut-off date

@danielbachhuber (Contributor)

@jurajmajerik Could you share a bit more detail about what you're thinking w/r/t test cases for each?

Also, should we have an "Update documentation" item?

@jurajmajerik (Contributor, Author)

> Could you share a bit more detail about what you're thinking w/r/t test cases for each?

@danielbachhuber I was thinking of testing all the permutations of the test cases suggested in the list, so for example:

  • Low sample size, two variants, significant results
  • High sample size, two variants, significant results
  • ...

There are also tests for the existing methods; I just haven't had time to look closer at those.
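
For illustration, a minimal sketch of how those permutations could be parametrized. This is a sketch only: pytest is assumed as the test runner, and the scenario labels, test name, and test body are hypothetical placeholders rather than anything from the codebase.

```python
# Sketch of a parametrized test matrix over the permutations listed above.
import itertools

import pytest

SAMPLE_SIZES = ["small", "large"]
VARIANT_COUNTS = ["two", "many"]
OUTCOMES = ["not_significant", "significant", "strongly_significant"]


@pytest.mark.parametrize(
    "sample_size,variant_count,outcome",
    list(itertools.product(SAMPLE_SIZES, VARIANT_COUNTS, OUTCOMES)),
)
def test_trend_count_significance(sample_size, variant_count, outcome):
    # Build fixture data matching the scenario (exposure counts sized for
    # `sample_size`, two or N variants, effect size chosen to produce
    # `outcome`), run the new method, and assert on the returned
    # significance code and expected loss.
    ...
```

Each combination then gets fixture data shaped to match the scenario, with the expected significance code asserted at the end.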

> Also, should we have an "Update documentation" item?

Good point, added it :)

@danielbachhuber (Contributor)

@jurajmajerik Is there some documentation on why each methodology was chosen for each scenario? e.g. why does Trends continuous take the mean and then apply some log variance?

@jurajmajerik (Contributor, Author)

@danielbachhuber some of this is covered in our main jupyter notebook: https://colab.research.google.com/drive/1hcWMsaS2GvMM0YeFCVctXfVWiNM5WDwq?usp=sharing

At a high level, the goal is to choose a probability distribution that reflects the kind of values you'd expect in real life. For a continuous value like revenue, the distribution starts at zero and extends into positive values. This is why a logarithm is applied: it keeps the modelled values at or above zero.

As for why the mean is used for a continuous value, I assume it's because you need a way to fairly compare the two groups. Comparing the sums wouldn’t work since the sample sizes might be different. Taking the mean per user gives a more accurate comparison.
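
For illustration only (not PostHog's implementation), here is a minimal numpy sketch of the two ideas above: per-user means make differently sized groups comparable, and a log transform puts skewed, non-negative values like revenue onto a scale where a normal/log-normal model is reasonable. The revenue arrays are made up.

```python
# Sketch: why per-user means and a log transform make sense for revenue-like
# metrics. The data is fabricated for illustration.
import numpy as np

control_revenue_per_user = np.array([0.0, 3.0, 7.5, 12.5, 45.0])          # 5 users
test_revenue_per_user = np.array([0.0, 2.5, 5.0, 8.0, 10.0, 20.0, 60.0])  # 7 users

# Sums are not comparable because the groups have different sizes...
print(control_revenue_per_user.sum(), test_revenue_per_user.sum())

# ...but per-user means are.
print(control_revenue_per_user.mean(), test_revenue_per_user.mean())

# Working on the log scale keeps the model on non-negative values and tames
# the heavy right tail typical of revenue data.
log_control = np.log1p(control_revenue_per_user)
log_test = np.log1p(test_revenue_per_user)
print(log_control.mean(), log_control.var(), log_test.mean(), log_test.var())
```

In a Bayesian treatment, the log-scale mean and variance would feed a normal posterior per variant (log-normal on the original scale), which is presumably what the "log variance" in the implementation refers to.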

@andehen does the above make sense and can you provide more detail?

@danielbachhuber (Contributor)

https://posthoghelp.zendesk.com/agent/tickets/21955 is interested in trying this out when it's ready

@danielbachhuber (Contributor)

@jurajmajerik One thing worth noting that came up in conversation with @andehen today...

Our current implementation of `MIN_PROBABILITY_FOR_SIGNIFICANCE` and expected loss means that `HIGH_LOSS` probably won't ever be seen:

```python
if max_probability >= MIN_PROBABILITY_FOR_SIGNIFICANCE:
    # Find best performing variant
    all_variants = [control, *variants]
    conversion_rates = [v.success_count / (v.success_count + v.failure_count) for v in all_variants]
    best_idx = np.argmax(conversion_rates)
    best_variant = all_variants[best_idx]
    other_variants = all_variants[:best_idx] + all_variants[best_idx + 1 :]
    expected_loss = calculate_expected_loss_v2(best_variant, other_variants)

    if expected_loss >= EXPECTED_LOSS_SIGNIFICANCE_LEVEL:
        return ExperimentSignificanceCode.HIGH_LOSS, expected_loss

    return ExperimentSignificanceCode.SIGNIFICANT, expected_loss
```

Win probability and expected loss are inversely correlated, so a probability of >90% means the expected loss is probably less than 1%.
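
For illustration, a minimal Monte Carlo sketch (independent of `calculate_expected_loss_v2`, using made-up conversion counts) showing why a high win probability generally implies a small expected loss under Beta posteriors:

```python
# Sketch: estimate win probability and expected loss from Beta posteriors,
# to show why a >90% win probability usually comes with a tiny expected loss.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conversion counts: (successes, failures)
control = (100, 900)
test = (130, 870)

samples = 100_000
control_rates = rng.beta(control[0] + 1, control[1] + 1, samples)
test_rates = rng.beta(test[0] + 1, test[1] + 1, samples)

# Probability that test beats control.
win_probability = np.mean(test_rates > control_rates)

# Expected loss of shipping test: how much conversion rate we give up,
# on average, in the scenarios where control is actually better.
expected_loss = np.mean(np.maximum(control_rates - test_rates, 0))

print(f"P(test beats control) = {win_probability:.3f}")
print(f"Expected loss of choosing test = {expected_loss:.5f}")
```

With these made-up counts the win probability comes out around 98% while the expected loss is a tiny fraction of a percent, which is why `HIGH_LOSS` rarely fires once the 90% probability gate has already been passed.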

Also worth noting that the current implementation of `are_results_significant` for Trends returns a `p_value`:

```python
p_value = calculate_p_value(control_variant, test_variants)
if p_value >= P_VALUE_SIGNIFICANCE_LEVEL:
    return ExperimentSignificanceCode.HIGH_P_VALUE, p_value
return ExperimentSignificanceCode.SIGNIFICANT, p_value
```

However, it's only ever used in this condition, so probably not an immediate problem for us:

```tsx
if (experimentResults?.significance_code === SignificanceCode.HighPValue) {
    return (
        <>
            This is because the p value is greater than 0.05
            <Tooltip
                placement="right"
                title={<>Current value is {experimentResults?.p_value?.toFixed(3) || 1}.</>}
            >
                <IconInfo className="ml-1 text-muted text-xl" />
            </Tooltip>
            .
        </>
    )
}
```
