Commit

Merge pull request #937 from runtingt/main
Add an option to read in a list of points from a .csv file (--fromfile)
ajgilbert committed Aug 19, 2024
2 parents 2a34f04 + fc7121d commit eccf7f1
Showing 4 changed files with 68 additions and 0 deletions.
14 changes: 14 additions & 0 deletions .github/workflows/cvmfs-ci.yml
@@ -66,6 +66,20 @@ jobs:
text2workspace.py HiggsAnalysis/CombinedLimit/data/tutorials/multiDim/toy-hgg-125.txt -m 125 -P HiggsAnalysis.CombinedLimit.PhysicsModel:floatingXSHiggs --PO modes=ggH,qqH
combine -M MultiDimFit HiggsAnalysis/CombinedLimit/data/tutorials/multiDim/toy-hgg-125.root --setParameterRanges r=-1,1
- uses: rhaschke/docker-run-action@v5
name: Counting datacard Fixed Point from csv
with:
image: ${{ matrix.IMAGE }}
shell: bash
options: ${{env.docker_opt_ro}}
run: |
cp -r cmssw/${CMSSW_VERSION} .
cd /home/cmsusr/${CMSSW_VERSION}/src
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsenv
text2workspace.py HiggsAnalysis/CombinedLimit/data/tutorials/multiDim/toy-hgg-125.txt -m 125 -P HiggsAnalysis.CombinedLimit.PhysicsModel:floatingXSHiggs --PO modes=ggH,qqH
combineTool.py -M MultiDimFit HiggsAnalysis/CombinedLimit/data/tutorials/multiDim/toy-hgg-125.root --fromfile HiggsAnalysis/CombinedLimit/data/tutorials/multiDim/fixed.csv
- uses: rhaschke/docker-run-action@v5
name: Parametric analysis
with:
5 changes: 5 additions & 0 deletions data/tutorials/multiDim/fixed.csv
@@ -0,0 +1,5 @@
r_ggH,r_qqH
1.0,1.0
1.0,2.0
2.0,1.0
2.0,2.0
12 changes: 12 additions & 0 deletions docs/part3/commonstatsmethods.md
@@ -846,6 +846,18 @@ A number of different algorithms can be used with the option `--algo <algo>`,

- **`fixed`**: Compare the log-likelihood at a fixed point with that at the best fit. `combine -M MultiDimFit toy-hgg-125.root --algo fixed --fixedPointPOIs r=r_fixed,MH=MH_fixed`. The output tree will contain the difference in the negative log-likelihood between the points ($\hat{r},\hat{m}_{H}$) and ($\hat{r}_{fixed},\hat{m}_{H,fixed}$) in the branch `deltaNLL`.

You can use the `combineTool.py` script to run multiple fixed points from a `.csv` file. For example, [data/tutorials/multiDim/fixed.csv](https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/data/tutorials/multiDim/fixed.csv) contains the points

```csv
r_ggH,r_qqH
1.0,1.0
1.0,2.0
2.0,1.0
2.0,2.0
```

and `combineTool.py -M MultiDimFit toy-hgg-125.root --fromfile fixed.csv` will run `--algo fixed` at each of these points.

- **`grid`**: Scan a fixed grid of points with approximately N points in total. `combine -M MultiDimFit toy-hgg-125.root --algo grid --points=10000`.
* You can partition the job in multiple tasks by using the options `--firstPoint` and `--lastPoint`. For complicated scans, the points can be split as described in the [combineTool for job submission](http://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/part3/runningthetool/#combinetool-for-job-submission) section. The output file will contain a column `deltaNLL` with the difference in negative log-likelihood with respect to the best fit point. Ranges/contours can be evaluated by filling TGraphs or TH2 histograms with these points.
* By default the "min" and "max" of the POI ranges are *not* included and the points in the scan are *centred*, e.g. `combine -M MultiDimFit --algo grid --rMin 0 --rMax 5 --points 5` will scan at the points $r=0.5, 1.5, 2.5, 3.5, 4.5$. You can include the option `--alignEdges 1`, which aligns the points with the end-points of the parameter ranges, e.g. `combine -M MultiDimFit --algo grid --rMin 0 --rMax 5 --points 6 --alignEdges 1` will scan at the points $r=0, 1, 2, 3, 4, 5$. Note that the number of points must be increased by 1 to ensure both end points are included.
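The two point layouts described for `--algo grid` can be reproduced with a few lines of arithmetic. The following is a minimal sketch for a single POI, assuming evenly spaced points; `grid_points` is a hypothetical helper written for illustration and is not part of `combine`.

```python
def grid_points(r_min, r_max, n_points, align_edges=False):
    """Return evenly spaced 1D scan points in [r_min, r_max].

    Hypothetical helper: it only mirrors the two examples in the text,
    it is not taken from the combine source code.
    """
    if align_edges:
        # Points sit on the end-points, so N points span N - 1 intervals.
        if n_points < 2:
            raise ValueError("align_edges needs at least 2 points")
        step = (r_max - r_min) / (n_points - 1)
        return [r_min + i * step for i in range(n_points)]
    # Default: each point is centred in one of N equal-width bins.
    step = (r_max - r_min) / n_points
    return [r_min + (i + 0.5) * step for i in range(n_points)]


print(grid_points(0, 5, 5))                    # [0.5, 1.5, 2.5, 3.5, 4.5]
print(grid_points(0, 5, 6, align_edges=True))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

With aligned edges, N points cover N - 1 intervals, which is why `--points` is raised from 5 to 6 to keep a unit spacing.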
37 changes: 37 additions & 0 deletions python/tool_base/EnhancedCombine.py
@@ -9,6 +9,7 @@
from HiggsAnalysis.CombinedLimit.tool_base.CombineToolBase import CombineToolBase
import six
from six.moves import zip
import pandas as pd


def isfloat(value):
@@ -39,6 +40,7 @@ def attach_intercept_args(self, group):
group.add_argument("-d", "--datacard", nargs="*", default=[], help="Operate on multiple datacards")
group.add_argument("--name", "-n", default=".Test", help="Name used to label the combine output file, can be modified by other options")
group.add_argument("--setParameterRanges", help="Some other options will modify or add to the list of parameter ranges")
group.add_argument("--algo", help='The algorithm to use with "-M MultiDimFit"')

def attach_args(self, group):
CombineToolBase.attach_args(self, group)
@@ -54,6 +56,8 @@ def attach_args(self, group):
"--boundlist", help="Name of json-file which contains the ranges of physical parameters depending on the given mass and given physics model"
)
group.add_argument("--generate", nargs="*", default=[], help="Generate sets of options")
group.add_argument("--fromfile", help='The file to read the points from. For use with "-M MultiDimFit --algo fixed"')
group.add_argument("--limitPoints", default=-1, type=int, help='The maximum number of points to scan from a file. For use with "--fromfile"')

def set_args(self, known, unknown):
CombineToolBase.set_args(self, known, unknown)
@@ -86,6 +90,26 @@ def run_method(self):
subbed_vars[("SEED",)] = [(sval,) for sval in seed_vals]
self.passthru.extend(["-s", "%(SEED)s"])

# Handle the --fromfile option
if self.args.fromfile is not None:
    if self.args.algo is None:
        self.args.algo = "fixed"
        print(f"Argument --algo not specified but --fromfile is set to {self.args.fromfile}, defaulting to --algo fixed.")
    elif self.args.algo != "fixed":
        print(f"Warning: --fromfile option is only compatible with --algo fixed, not {self.args.algo}. Setting --algo to fixed.")
        self.args.algo = "fixed"
    # Read the points from the file into a dataframe
    points_df = pd.read_csv(self.args.fromfile)
    # If the limitPoints option is set, limit the number of points read
    limitPoints = int(self.args.limitPoints)
    if limitPoints == 0:
        print("Warning: --limitPoints option is set to 0. No points will be scanned!")
        points_df = pd.DataFrame()
    elif limitPoints > 0:
        points_df = points_df.head(limitPoints)
if self.args.algo is not None:
    self.put_back_arg("algo", "--algo")

for i, generate in enumerate(self.args.generate):
split_char = ":" if "::" in generate else ";"
gen_header, gen_content = generate.split(split_char * 2)
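The `--fromfile` handling added in this hunk makes two decisions before any point is read: which algorithm to run, and how many rows to keep. A minimal standalone sketch of that logic follows, assuming the behaviour shown in the hunk; `resolve_fromfile_options` is a hypothetical name used only for illustration and does not exist in `EnhancedCombine.py`.

```python
def resolve_fromfile_options(algo, limit_points):
    """Return (algo, max_rows) for a --fromfile run.

    Hypothetical helper mirroring the hunk above: the algorithm is always
    forced to "fixed", and max_rows is None when no limit applies.
    """
    if algo is None:
        print("--algo not specified, defaulting to --algo fixed.")
    elif algo != "fixed":
        print(f"Warning: --fromfile only works with --algo fixed, not {algo}.")

    limit = int(limit_points)  # the option may arrive as a string
    if limit == 0:
        print("Warning: --limitPoints is 0. No points will be scanned!")
    # A negative value (the default of -1) means "read every row".
    max_rows = None if limit < 0 else limit
    return "fixed", max_rows


print(resolve_fromfile_options(None, -1))
print(resolve_fromfile_options("grid", "3"))
```

Keeping the decision in a pure function like this would make the fallback rules easy to test separately from the file reading, although the commit itself inlines them in `run_method`.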
@@ -236,6 +260,19 @@ def run_method(self):
subbed_vars[("P_START", "P_END")] = [(r[0], r[1]) for r in ranges]
self.passthru.extend(["--firstPoint %(P_START)s --lastPoint %(P_END)s"])
self.args.name += ".POINTS.%(P_START)s.%(P_END)s"
# Handle the --fromfile option
if self.args.fromfile is not None:
    # For each row in the dataframe, create a fixedPointPOIs string
    poi_strs = []
    for _, row in points_df.iterrows():
        fixedPointPOIs = ""
        for col in points_df.columns:
            fixedPointPOIs += f"{col}={row[col]},"
        fixedPointPOIs = fixedPointPOIs[:-1]
        poi_strs.append(fixedPointPOIs)
    subbed_vars[("FIXEDPOINTPOIS", "INDEX")] = [(s, index) for s, index in zip(poi_strs, range(len(poi_strs)))]
    self.passthru.extend(["--fixedPointPOIs", "%(FIXEDPOINTPOIS)s"])
    self.args.name += ".POINT.%(INDEX)s"

# can only put the name option back now because we might have modified
# it from what the user specified
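To see what the `--fromfile` block in this hunk produces, the sketch below turns the rows of `fixed.csv` into `--fixedPointPOIs` strings and the matching `.POINT.<index>` name suffixes. It is an illustration under assumptions, not the committed code. It uses the standard-library `csv` module instead of `pandas`, so values are passed through verbatim as text rather than parsed to floats. `fixed_point_strings` is a hypothetical helper, and the printed name assumes the default `--name` of `.Test` with no other modifications.

```python
import csv
import io

# Contents of data/tutorials/multiDim/fixed.csv, inlined for the example.
CSV_TEXT = """r_ggH,r_qqH
1.0,1.0
1.0,2.0
2.0,1.0
2.0,2.0
"""


def fixed_point_strings(csv_file, limit=-1):
    """Build one 'poi=value,...' string per CSV row (hypothetical helper)."""
    rows = list(csv.DictReader(csv_file))
    if limit >= 0:
        # Mirrors --limitPoints: keep only the first `limit` rows.
        rows = rows[:limit]
    return [",".join(f"{name}={value}" for name, value in row.items()) for row in rows]


points = fixed_point_strings(io.StringIO(CSV_TEXT))
for index, poi_str in enumerate(points):
    # One combine call per point, each with its own output name suffix.
    print(f"--fixedPointPOIs {poi_str} -n .Test.POINT.{index}")
```

Joining the `name=value` pairs with `",".join(...)` avoids building the string with a trailing comma and slicing it off afterwards, and `enumerate` supplies the index without a separate `zip` over `range`.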