
fix: Use MLEs of NPs to create sampling distributions in ToyCalculator #1610

Merged: 8 commits merged into scikit-hep:master on Sep 27, 2021

Conversation

@masonproffitt (Contributor) commented Sep 24, 2021

Description

Resolves #1426.

In short, there is a significant divergence between AsymptoticCalculator and ToyCalculator for models whose nuisance-parameter (NP) MLEs are far from the initial parameter values. The cause is that ToyCalculator generates MC toys from the initial parameter values rather than from the profiled MLE values. This is inconsistent both with the default RooStats configuration and with the method that ToyCalculator itself uses to calculate the test-statistic values of these MC toys, and it leads to inaccurate test-statistic distributions for both the background-only and the signal + background hypothesis toys.
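The effect can be illustrated schematically with plain NumPy (the rates below are hypothetical stand-ins, not taken from the model in this PR): toys thrown at the initial parameter values sample a very different distribution than toys thrown at the conditional MLEs, so any test statistic evaluated on them ends up with very different tail probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-bin counting experiment: the model's initial
# (suggested_init) background normalization corresponds to a rate of 1.0,
# while the conditional MLE of that normalization implies a rate of 90.0.
initial_rate = 1.0  # what ToyCalculator sampled from before this PR
mle_rate = 90.0     # what it samples from after this PR

toys_init = rng.poisson(initial_rate, size=10_000)
toys_mle = rng.poisson(mle_rate, size=10_000)

# The two sampling distributions barely overlap, so tail probabilities
# (and hence CLs values) computed from them diverge badly.
print(toys_init.mean(), toys_mle.mean())
```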

This can easily be demonstrated by a model with a normfactor (inspired by a simplified version of the ABCD method that I'm using in my analysis):

spec = {
    "channels": [
        {
            "name": "A",
            "samples": [
                {
                    "name": "signal",
                    "data": [0.9],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {
                    "name": "background",
                    "data": [0.1],
                    "modifiers": [
                        {
                            "name": "background_normalization",
                            "type": "normfactor",
                            "data": None,
                        }
                    ],
                },
            ],
        },
        {
            "name": "B",
            "samples": [
                {
                    "name": "signal",
                    "data": [0.1],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {
                    "name": "background",
                    "data": [0.9],
                    "modifiers": [
                        {
                            "name": "background_normalization",
                            "type": "normfactor",
                            "data": None,
                        }
                    ],
                },
            ],
        },
    ]
}
model = pyhf.Model(spec)
data = [15, 90]
test_mu = 7.0

Using the current master (02b1951), if I run:

CLs_obs_asymptotics, CLs_exp_asymptotics = pyhf.infer.hypotest(
    test_mu,
    data,
    model,
    par_bounds=[(0, 30), (0, 300)],
    test_stat="qtilde",
    return_expected=True,
    calctype="asymptotics",
)
print(f"Asymptotics: Observed: {CLs_obs_asymptotics}, Expected: {CLs_exp_asymptotics}")
np.random.seed(0)
CLs_obs_toybased, CLs_exp_toybased = pyhf.infer.hypotest(
    test_mu,
    data,
    model,
    par_bounds=[(0, 30), (0, 300)],
    test_stat="qtilde",
    return_expected=True,
    calctype="toybased",
)
print(f"Toybased: Observed: {CLs_obs_toybased}, Expected: {CLs_exp_toybased}")

I get:

Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.444, Expected: 0.0028964518464880524

The observed values are close enough, but the expected values are very different. This would lead to completely different expected upper limits on mu.

After applying the changes in this PR, I now get:

Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased: Observed: 0.4443237370994025, Expected: 0.13147410358565736

Now the expected values are at least as compatible as the observed values.

Note that if I do want the expected CLs values under the assumption that the initial parameter values are the true ones, I can still get them by passing model.expected_data(model.config.suggested_init()) as the data:

np.random.seed(0)
CLs_obs_toybased, CLs_exp_toybased = pyhf.infer.hypotest(
    test_mu,
    model.expected_data(model.config.suggested_init()),
    model,
    par_bounds=[(0, 30), (0, 300)],
    test_stat="qtilde",
    return_expected=True,
    calctype="toybased",
)
print(f"Toybased: Observed: {CLs_obs_toybased}, Expected: {CLs_exp_toybased}")

which gives me:

Toybased: Observed: 0.013238289205702648, Expected: 0.003243243243243243

in line with the first (expected) result.

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* Use the best fit parameter values of pyhf.infer.mle.fixed_poi_fit as the values of the nuisance parameters when creating the sampling distributions to generate pseudo-experiments with the ToyCalculator
   - This implements the recommendations from ATLAS and CMS in "Procedure for the LHC Higgs boson search combination in Summer 2011" (Section 2.1 'Observed limits')
   - https://inspirehep.net/literature/1196797
   - Also matches the description of RooStats::FrequentistCalculator where "The nuisance parameters are fixed to their MLEs"
* Update test values in response to the difference in values from pseudo-experiments
* Add Mason Proffitt to the list of contributors

@matthewfeickert (Member)

Thanks for the PR @masonproffitt. This looks good, but I'll review it in full tomorrow (Saturday).

There was already some discussion on the IRIS-HEP Slack that I want to follow up on, but I'll probably want to summarize some of those points in Issue #1426 so that things will be public and searchable.

codecov bot commented Sep 25, 2021

Codecov Report

Merging #1610 (56d9128) into master (02b1951) will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1610      +/-   ##
==========================================
- Coverage   97.70%   97.70%   -0.01%     
==========================================
  Files          63       63              
  Lines        4050     4048       -2     
  Branches      576      576              
==========================================
- Hits         3957     3955       -2     
  Misses         54       54              
  Partials       39       39              
Flag       Coverage Δ
contrib    25.44% <0.00%> (+0.01%) ⬆️
unittests  97.48% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                 Coverage Δ
src/pyhf/infer/utils.py        100.00% <ø> (ø)
src/pyhf/infer/calculators.py  100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@matthewfeickert matthewfeickert added the fix A bug fix label Sep 27, 2021
@matthewfeickert matthewfeickert changed the title fix: Use MLEs of NPs to generate MC toys fix: Use MLEs of NPs to create sampling distributions in ToyCalculator Sep 27, 2021
@matthewfeickert (Member) left a comment


Just a note to self. I'll push this in.

@matthewfeickert matthewfeickert added docs Documentation related tests pytest labels Sep 27, 2021
@matthewfeickert (Member) commented Sep 27, 2021

Yeah, for the following (@masonproffitt's script with some edits)

# issue_1426.py
import argparse

import numpy as np

import pyhf

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Determine number of toys.")
    parser.add_argument(
        "--ntoys",
        dest="ntoys",
        type=int,
        default=2000,
        help="Number of pseudo-experiments to perform.",
    )
    args = parser.parse_args()

    spec = {
        "channels": [
            {
                "name": "A",
                "samples": [
                    {
                        "name": "signal",
                        "data": [0.9],
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": [0.1],
                        "modifiers": [
                            {
                                "name": "background_normalization",
                                "type": "normfactor",
                                "data": None,
                            }
                        ],
                    },
                ],
            },
            {
                "name": "B",
                "samples": [
                    {
                        "name": "signal",
                        "data": [0.1],
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": [0.9],
                        "modifiers": [
                            {
                                "name": "background_normalization",
                                "type": "normfactor",
                                "data": None,
                            }
                        ],
                    },
                ],
            },
        ]
    }
    model = pyhf.Model(spec)
    data = [15, 90]
    test_mu = 7.0

    CLs_obs_asymptotics, CLs_exp_asymptotics = pyhf.infer.hypotest(
        test_mu,
        data,
        model,
        par_bounds=[(0, 30), (0, 300)],
        test_stat="qtilde",
        return_expected=True,
        calctype="asymptotics",
    )
    print(
        f"Asymptotics: Observed: {CLs_obs_asymptotics}, Expected: {CLs_exp_asymptotics}"
    )
    np.random.seed(0)
    CLs_obs_toybased, CLs_exp_toybased = pyhf.infer.hypotest(
        test_mu,
        data,
        model,
        par_bounds=[(0, 30), (0, 300)],
        test_stat="qtilde",
        return_expected=True,
        calctype="toybased",
        ntoys=args.ntoys,
    )
    print(f"Toybased:    Observed: {CLs_obs_toybased}, Expected: {CLs_exp_toybased}")

we see this effect more prominently as the number of pseudo-experiments ramps up

# On master
$ python issue_1426.py --ntoys 2_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased:    Observed: 0.444,               Expected: 0.0028964518464880524
# PR 1610
$ python issue_1426.py --ntoys 2_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased:    Observed: 0.4443237370994025,  Expected: 0.13147410358565736
# On master
$ python issue_1426.py --ntoys 10_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased:    Observed: 0.4251,              Expected: 0.0015866147410933218 
# PR 1610
$ python issue_1426.py --ntoys 10_000
Asymptotics: Observed: 0.42483777279816676, Expected: 0.11233914460989139
Toybased:    Observed: 0.43241200604621033, Expected: 0.11898498909379338
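As a rough guide to how much the toy-based numbers should scatter with ntoys, the binomial standard error on a tail probability p estimated from n toys is sqrt(p(1 - p)/n). This is a simplification, since CLs is a ratio of two such tail probabilities, so treat it as an order-of-magnitude estimate only:

```python
import math

def tail_prob_stderr(p, n_toys):
    """Binomial standard error of a tail probability estimated from toys."""
    return math.sqrt(p * (1 - p) / n_toys)

# For an expected CLs near 0.12, as in the PR results above:
print(tail_prob_stderr(0.12, 2_000))   # about 0.007
print(tail_prob_stderr(0.12, 10_000))  # about 0.003
```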

@matthewfeickert (Member) left a comment


Thanks for the PR @masonproffitt 👍 and thanks for the nice example. I'm going to approve this, but since it is late at night for me while reviewing, I'll also need @lukasheinrich or @kratsg to review it to make sure I'm not missing something else that should get patched. This is also a big enough change that it will probably warrant an immediate patch release.

@matthewfeickert matthewfeickert merged commit 4459510 into scikit-hep:master Sep 27, 2021
Labels: docs (Documentation related), fix (A bug fix), tests (pytest)

Successfully merging this pull request may close these issues:
  • Option to change the NP values for toys (#1426)

3 participants