fix: Use MLEs of NPs to create sampling distributions in ToyCalculator #1610
Conversation
Thanks for the PR @masonproffitt. This looks good, but I'll review it in full tomorrow (Saturday). There was already some discussion on the IRIS-HEP Slack that I want to follow up on, but I'll probably want to summarize some of those points in Issue #1426 so that things will be public and searchable.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #1610      +/-   ##
==========================================
- Coverage   97.70%   97.70%   -0.01%
==========================================
  Files          63       63
  Lines        4050     4048       -2
  Branches      576      576
==========================================
- Hits         3957     3955       -2
  Misses         54       54
  Partials       39       39
```

Flags with carried forward coverage won't be shown.
Just a note to self. I'll push this in.
Yeah, for the following (@masonproffitt's script with some edits):

```python
# issue_1426.py
import argparse

import numpy as np

import pyhf

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Determine number of toys.")
    parser.add_argument(
        "--ntoys",
        dest="ntoys",
        type=int,
        default=2000,
        help="Number of pseudo-experiments to perform.",
    )
    args = parser.parse_args()

    spec = {
        "channels": [
            {
                "name": "A",
                "samples": [
                    {
                        "name": "signal",
                        "data": [0.9],
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": [0.1],
                        "modifiers": [
                            {
                                "name": "background_normalization",
                                "type": "normfactor",
                                "data": None,
                            }
                        ],
                    },
                ],
            },
            {
                "name": "B",
                "samples": [
                    {
                        "name": "signal",
                        "data": [0.1],
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": [0.9],
                        "modifiers": [
                            {
                                "name": "background_normalization",
                                "type": "normfactor",
                                "data": None,
                            }
                        ],
                    },
                ],
            },
        ]
    }
    model = pyhf.Model(spec)
    data = [15, 90]
    test_mu = 7.0

    CLs_obs_asymptotics, CLs_exp_asymptotics = pyhf.infer.hypotest(
        test_mu,
        data,
        model,
        par_bounds=[(0, 30), (0, 300)],
        test_stat="qtilde",
        return_expected=True,
        calctype="asymptotics",
    )
    print(
        f"Asymptotics: Observed: {CLs_obs_asymptotics}, Expected: {CLs_exp_asymptotics}"
    )

    np.random.seed(0)
    CLs_obs_toybased, CLs_exp_toybased = pyhf.infer.hypotest(
        test_mu,
        data,
        model,
        par_bounds=[(0, 30), (0, 300)],
        test_stat="qtilde",
        return_expected=True,
        calctype="toybased",
        ntoys=args.ntoys,
    )
    print(f"Toybased: Observed: {CLs_obs_toybased}, Expected: {CLs_exp_toybased}")
```

we see this effect more prominently as the number of pseudo-experiments ramps up:

```console
# On master
$ python issue_1426.py --ntoys 2_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.444, Expected: 0.0028964518464880524

# PR 1610
$ python issue_1426.py --ntoys 2_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.4443237370994025, Expected: 0.13147410358565736
```

```console
# On master
$ python issue_1426.py --ntoys 10_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.4251, Expected: 0.0015866147410933218

# PR 1610
$ python issue_1426.py --ntoys 10_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.43241200604621033, Expected: 0.11898498909379338
```
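The size of the bias can be illustrated with a standalone NumPy sketch (illustrative only, with made-up single-bin numbers; this is not pyhf's implementation): when the MLE of a nuisance parameter sits far from its initial value, toys generated at the initial parameters are centered nowhere near the data, so the sampling distribution built from them is badly misplaced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-bin counting model: expected count = b_norm * b0 (background-only)
b0 = 10.0
n_obs = 90  # observed data, far from the initial expectation

# Initial vs profiled (MLE) value of the background normalization
b_norm_init = 1.0
b_norm_mle = n_obs / b0  # analytic MLE for one Poisson bin: 9.0

# Toys thrown at the initial parameters vs the fitted (MLE) parameters
toys_init = rng.poisson(b_norm_init * b0, size=10_000)
toys_mle = rng.poisson(b_norm_mle * b0, size=10_000)

# The two sampling distributions are centered in very different places
print(toys_init.mean())  # ~10: nowhere near the observed count of 90
print(toys_mle.mean())   # ~90: consistent with the data
```

Any test statistic evaluated on the first set of toys describes an experiment that never resembled the observed data, which is why the expected CLs values on `master` come out so far from the asymptotic ones.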
Thanks for the PR @masonproffitt. 👍 I'm going to approve this (thanks for the nice example, by the way). However, I'll need @lukasheinrich or @kratsg to review it as well to make sure I'm not missing something else that should get patched (it is late at night for me as I review). This is also a big enough change that it will probably warrant an immediate patch release.
Description

Resolves #1426.

In short, the problem is a significant divergence between `AsymptoticCalculator` and `ToyCalculator` for models whose NP MLEs are not very close to the initial parameter values. This is caused by the fact that `ToyCalculator` generates MC toys based on the initial parameter values rather than the profiled MLE values. This behavior is inconsistent with the default RooStats configuration and with the method that `ToyCalculator` itself uses to calculate the test-statistic values of these MC toys. It invariably leads to inaccurate test-statistic distributions for both the background-only hypothesis toys and the signal + background hypothesis toys.

This can easily be demonstrated by a model with a `normfactor` (inspired by a simplified version of the ABCD method that I'm using in my analysis). Using the current `master` (02b1951) and running the reproduction script above, the observed values are close enough, but the expected values are very different. This would lead to completely different expected upper limits on `mu`.

After applying the changes in this PR, the expected values are at least as compatible as the observed values.

Note that if I do actually want expected CLs values assuming the initial parameter values are the true ones, I can still get them using `model.expected_data(model.config.suggested_init())`, which gives results in line with the first (expected) result.
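That workaround can be sketched conceptually in plain NumPy (hypothetical single-bin numbers, not the model above, and not pyhf's actual code path): taking the expectation at the initial parameters as Asimov-style pseudo-data yields an "expected" result computed under those initial values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Background expectation at the suggested initial parameters (hypothetical value)
b0 = 10.0

# Asimov-style pseudo-data: use the expectation itself as the "observed" count,
# analogous in spirit to model.expected_data(model.config.suggested_init())
expected_count = b0

# Background-only toys thrown at those same initial parameters
toys = rng.poisson(b0, size=100_000)

# Tail probability of fluctuating at least as high as the expected count
p_expected = np.mean(toys >= expected_count)
print(p_expected)  # ~0.54 for Poisson(10): P(X >= 10)
```

Because the pseudo-data and the toys here come from the same parameter point, the result is self-consistent by construction, which is the behavior one opts into when deliberately assuming the initial values are the true ones.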
Checklist Before Requesting Reviewer
Before Merging
For the PR Assignees: