[RF] Update RooFit rf204b_extendedLikelihood_rangedFit tutorial

The PR root-project#7719 changed the behavior of multi-range fits in RooFit, and from then one the relative yields in the multiple regions were also considered in the fit. The tutorial `rf204b_extendedLikelihood_rangedFit` was explaining the old caveat that relative yields are not considered in non-extended fits, but since that is not a problem anymore, a big chunk of explanation in the tutorial should be removed. A Python translation of the tutorial was also added. Closes root-project#8808.
guitargeek · Sep 15, 2022 · f89feca · f89feca
1 parent 2130fb2
commit f89feca
Show file tree

Hide file tree

Showing 3 changed files with 195 additions and 31 deletions.
diff --git a/tutorials/roofit/rf204b_extendedLikelihood_rangedFit.C b/tutorials/roofit/rf204b_extendedLikelihood_rangedFit.C
@@ -1,30 +1,22 @@
 /// \file
 /// \ingroup tutorial_roofit
 /// \notebook -nodraw
-/// This macro demonstrates how to set up a fit in two ranges
-/// such that it does not only fit the shapes in each region, but also
-/// takes into account the relative normalization of the two.
+/// This macro demonstrates how to set up a fit in two ranges for plain
+/// likelihoods and extended likelihoods.
 ///
 /// ### 1. Shape fits (plain likelihood)
 ///
-/// If you perform a fit in two ranges in RooFit, e.g. `pdf->fitTo(data,Range("Range1,Range2"))`
-/// it will construct a simple simultaneous fit of the two regions.
+/// If you fit a non-extended pdf in two ranges, e.g. `pdf->fitTo(data,Range("Range1,Range2"))`,
+/// it will fit the shapes in the two selected ranges and also take into account the relative
+/// predicted yields in those ranges.
 ///
-/// In case the pdf is not extended, i.e., a shape fit, it will only fit the shapes in the
-/// two selected ranges, and not take into account the relative predicted yields in those ranges.
-///
-/// In certain models (like exponential decays) and configurations (e.g. narrow ranges that are far apart),
-/// the relative normalization of the ranges may carry much more information about the function parameters
-/// than the shape of the distribution inside those ranges. Therefore, it is important to take that into
-/// account.
-///
-/// This is particularly important for cases where the 2-range fit is meant to be representative of
-/// a full-range fit, but with a blinded signal region inside it.
+/// This is useful for example to represent a full-range fit, but with a
+/// blinded signal region inside it.
 ///
 ///
 /// ### 2. Shape+rate fits (extended likelihood)
 ///
-/// Also if your pdf is already extended, i.e. measuring both the distribution in the observable as well
+/// If your pdf is extended, i.e. measuring both the distribution in the observable as well
 /// as the event count in the fitted region, some intervention is needed to make fits in ranges
 /// work in a way that corresponds to intuition.
 ///
@@ -61,11 +53,13 @@
 #include "RooDataSet.h"
 #include "RooPlot.h"
 #include "RooExtendPdf.h"
+#include "RooFitResult.h"
+
 #include "TCanvas.h"
-using namespace RooFit;
 
 void rf204b_extendedLikelihood_rangedFit()
 {
+ using namespace RooFit;
 
  // PART 1: Background-only fits
  // ----------------------------
@@ -81,7 +75,7 @@ void rf204b_extendedLikelihood_rangedFit()
 
  x.setRange("FULL",10,100);
 
- RooDataSet* data = model.generate(x, 10000);
+ std::unique_ptr<RooDataSet> data{model.generate(x, 10000)};
 
  // Construct an extended pdf, which measures the event count N **on the full range**.
  // If the actual domain of x that is fitted is identical to FULL, this has no affect.
@@ -116,35 +110,31 @@ void rf204b_extendedLikelihood_rangedFit()
  //       \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{LEFT}  | N_\mathrm{exp} / \mathrm{frac LEFT} \right)
  //       \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{RIGHT} | N_\mathrm{exp} / \mathrm{frac RIGHT} \right)
  // \f]
- // that will introduce additional sensitivity of the likelihood to the slope parameter alpha of the exponential model through the `frac_LEFT` and `frac_RIGHT` integrals.
- //
- // In the extreme case of an exponential function and a fit in narrow LEFT and RIGHT ranges, this sensitivity may actually be larger
- // than from the shapes.
- //
- // This is also nicely demonstrated in the example below where the uncertainty on alpha is almost 5x smaller if the extended term is included.
 
 
  TCanvas* c = new TCanvas("c", "c", 2100, 700);
  c->Divide(3);
  c->cd(1);
 
- RooFitResult* r = model.fitTo(*data, Range("LEFT,RIGHT"), Save());
+ std::unique_ptr<RooFitResult> r{model.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
+ r->Print();
 
  RooPlot* frame = x.frame();
  data->plotOn(frame);
  model.plotOn(frame, VisualizeError(*r));
  model.plotOn(frame);
- model.paramOn(frame, Label("Bkg fit. Large errors since\nnormalisation ignored"));
+ model.paramOn(frame, Label("Non-extended fit"));
  frame->Draw();
 
  c->cd(2);
 
- RooFitResult* r2 = extmodel.fitTo(*data, Range("LEFT,RIGHT"), Save());
+ std::unique_ptr<RooFitResult> r2{extmodel.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
+ r2->Print();
  RooPlot* frame2 = x.frame();
  data->plotOn(frame2);
  extmodel.plotOn(frame2);
  extmodel.plotOn(frame2, VisualizeError(*r2));
- extmodel.paramOn(frame2, Label("Bkg fit. Normalisation\nincluded"), Layout(0.4,0.95));
+ extmodel.paramOn(frame2, Label("Extended fit"), Layout(0.4,0.95));
  frame2->Draw();
 
  // PART 2: Extending with RooAddPdf
@@ -165,7 +155,7 @@ void rf204b_extendedLikelihood_rangedFit()
  RooRealVar width("width", "Width of signal model", 5.);
  RooGaussian sig("sig", "Signal model", x, mean, width);
 
- RooAddPdf modelsum("modelsum", "NSig*signal + NBkg*background", RooArgSet(sig, model), RooArgSet(Nsig, Nbkg));
+ RooAddPdf modelsum("modelsum", "NSig*signal + NBkg*background", {sig, model}, {Nsig, Nbkg});
 
  // This model will automatically insert the correction factor for the reinterpretation of Nsig and Nnbkg in the full ranges.
  //
@@ -175,7 +165,8 @@ void rf204b_extendedLikelihood_rangedFit()
  // [#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_modelsum_modelsumData_RIGHT) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
  // ```
 
- RooFitResult* r3 = modelsum.fitTo(*data, Range("LEFT,RIGHT"), Save());
+ std::unique_ptr<RooFitResult> r3{modelsum.fitTo(*data, Range("LEFT,RIGHT"), PrintLevel(-1), Save())};
+ r3->Print();
 
  RooPlot* frame3 = x.frame();
  data->plotOn(frame3);

diff --git a/tutorials/roofit/rf204b_extendedLikelihood_rangedFit.py b/tutorials/roofit/rf204b_extendedLikelihood_rangedFit.py
@@ -0,0 +1,173 @@
+## \file
+## \ingroup tutorial_roofit
+## \notebook -nodraw
+## This macro demonstrates how to set up a fit in two ranges for plain
+## likelihoods and extended likelihoods.
+##
+##  ### 1. Shape fits (plain likelihood)
+##
+##  If you fit a non-extended pdf in two ranges, e.g. `pdf.fitTo(data,Range="Range1,Range2")`,
+##  it will fit the shapes in the two selected ranges and also take into account the relative
+##  predicted yields in those ranges.
+##
+##  This is useful for example to represent a full-range fit, but with a
+##  blinded signal region inside it.
+##
+##
+##  ### 2. Shape+rate fits (extended likelihood)
+##
+##  If your pdf is extended, i.e. measuring both the distribution in the observable as well
+##  as the event count in the fitted region, some intervention is needed to make fits in ranges
+##  work in a way that corresponds to intuition.
+##
+##  If an extended fit is performed in a sub-range, the observed yield is only that of the subrange, hence
+##  the expected event count will converge to a number that is smaller than what's visible in a plot.
+##  In such cases, it is often preferred to interpret the extended term with respect to the full range
+##  that's plotted, i.e., apply a correction to the extended likelihood term in such a way
+##  that the interpretation of the expected event count remains that of the full range. This can
+##  be done by applying a correcion factor (equal to the fraction of the pdf that is contained in the
+##  fitted range) in the Poisson term that represents the extended likelihood term.
+##
+##  If an extended likelihood fit is performed over *two* sub-ranges, this correction is
+##  even more important: without it, each component likelihood would have a different interpretation
+##  of the expected event count (each corresponding to the count in its own region), and a joint
+##  fit of these regions with different interpretations of the same model parameter results
+##  in a number that is not easily interpreted.
+##
+##  If both regions correct their interpretatin such that N_expected refers to the full range,
+##  it is interpreted easily, and consistent in both regions.
+##
+##  This requires that the likelihood model is extended using RooAddPdf in the
+##  form SumPdf = Nsig * sigPdf + Nbkg * bkgPdf.
+##
+##  \macro_image
+##  \macro_code
+##  \macro_output
+##
+## \authors Stephan Hageboeck, Wouter Verkerke
+
+import ROOT
+
+ROOT.gROOT.SetBatch(True)
+
+# PART 1: Background-only fits
+# ----------------------------
+
+# Build plain exponential model
+x = ROOT.RooRealVar("x", "x", 10, 100)
+alpha = ROOT.RooRealVar("alpha", "alpha", -0.04, -0.1, -0.0)
+model = ROOT.RooExponential("model", "Exponential model", x, alpha)
+
+# Define side band regions and full range
+x.setRange("LEFT", 10, 20)
+x.setRange("RIGHT", 60, 100)
+
+x.setRange("FULL", 10, 100)
+
+data = model.generate(x, 10000)
+
+# Construct an extended pdf, which measures the event count N **on the full range**.
+# If the actual domain of x that is fitted is identical to FULL, this has no affect.
+#
+# If the fitted domain is a subset of `FULL`, though, the expected event count is divided by
+# \f[
+#   \mathrm{frac} = \frac{
+#     \int_{\mathrm{Fit range}} \mathrm{model}(x)  \; \mathrm{d}x }{
+#     \int_{\mathrm{Full range}} \mathrm{model}(x) \; \mathrm{d}x }.
+# \f]
+# `N` will therefore return the count extrapolated to the full range instead of the fit range.
+#
+# **Note**: When using a RooAddPdf for extending the likelihood, the same effect can be achieved with
+# [RooAddPdf::fixCoefRange()](https://root.cern.ch/doc/master/classRooAddPdf.html#ab631caf4b59e4c4221f8967aecbf2a65),
+
+N = ROOT.RooRealVar("N", "Extended term", 0, 20000)
+extmodel = ROOT.RooExtendPdf("extmodel", "Extended model", model, N, "FULL")
+
+
+# It can be instructive to fit the above model to either the LEFT or RIGHT
+# range. `N` should approximately converge to the expected number of events in
+# the full range. One may try to leave out `"FULL"` in the constructor, or the
+# interpretation of `N` changes.
+extmodel.fitTo(data, Range="LEFT", PrintLevel=-1)
+N.Print()
+
+
+# If we now do a simultaneous fit to the extended model, instead of the
+# original model, the LEFT and RIGHT range will each correct their local `N`
+# such that it refers to the `FULL` range.
+#
+# This joint fit of the extmodel will include (w.r.t. the plain model fit) a product of extended terms
+# \f[
+#     L_\mathrm{ext} = L
+#       \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{LEFT}  | N_\mathrm{exp} / \mathrm{frac LEFT} \right)
+#       \cdot \mathrm{Poisson} \left( N_\mathrm{obs}^\mathrm{RIGHT} | N_\mathrm{exp} / \mathrm{frac RIGHT} \right)
+# \f]
+
+
+c = ROOT.TCanvas("c", "c", 2100, 700)
+c.Divide(3)
+c.cd(1)
+
+r = model.fitTo(data, Range="LEFT,RIGHT", PrintLevel=-1, Save=True)
+r.Print()
+
+frame = x.frame()
+data.plotOn(frame)
+model.plotOn(frame, VisualizeError=r)
+model.plotOn(frame)
+model.paramOn(frame, Label="Non-extended fit")
+frame.Draw()
+
+c.cd(2)
+
+r2 = extmodel.fitTo(data, Range="LEFT,RIGHT", PrintLevel=-1, Save=True)
+r2.Print()
+frame2 = x.frame()
+data.plotOn(frame2)
+extmodel.plotOn(frame2)
+extmodel.plotOn(frame2, VisualizeError=r2)
+extmodel.paramOn(frame2, Label="Extended fit", Layout=(0.4, 0.95))
+frame2.Draw()
+
+# PART 2: Extending with RooAddPdf
+# --------------------------------
+#
+# Now we repeat the above exercise, but instead of explicitly adding an extended term to a single shape pdf (RooExponential),
+# we assume that we already have an extended likelihood model in the form of a RooAddPdf constructed in the form `Nsig * sigPdf + Nbkg * bkgPdf`.
+#
+# We add a Gaussian to the previously defined exponential background.
+# The signal shape parameters are chosen constant, since the signal region is entirely in the blinded region, i.e., the fit has no sensitivity.
+
+c.cd(3)
+
+Nsig = ROOT.RooRealVar("Nsig", "Number of signal events", 1000, 0, 2000)
+Nbkg = ROOT.RooRealVar("Nbkg", "Number of background events", 10000, 0, 20000)
+
+mean = ROOT.RooRealVar("mean", "Mean of signal model", 40.0)
+width = ROOT.RooRealVar("width", "Width of signal model", 5.0)
+sig = ROOT.RooGaussian("sig", "Signal model", x, mean, width)
+
+modelsum = ROOT.RooAddPdf("modelsum", "NSig*signal + NBkg*background", [sig, model], [Nsig, Nbkg])
+
+# This model will automatically insert the correction factor for the
+# reinterpretation of Nsig and Nnbkg in the full ranges.
+#
+# When this happens, it reports this with lines like the following:
+# ```
+# [#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_modelsum_modelsumData_LEFT) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
+# [#1] INFO:Fitting -- RooAbsOptTestStatistic::ctor(nll_modelsum_modelsumData_RIGHT) fixing interpretation of coefficients of any RooAddPdf to full domain of observables
+# ```
+
+r3 = modelsum.fitTo(data, Range="LEFT,RIGHT", PrintLevel=-1, Save=True)
+r3.Print()
+
+frame3 = x.frame()
+data.plotOn(frame3)
+modelsum.plotOn(frame3)
+modelsum.plotOn(frame3, VisualizeError=r3)
+modelsum.paramOn(frame3, Label="S+B fit with RooAddPdf", Layout=(0.3, 0.95))
+frame3.Draw()
+
+c.Draw()
+
+c.SaveAs("rf204b_extendedLikelihood_rangedFit.png")
diff --git a/tutorials/roofit/rf212_plottingInRanges_blinding.py b/tutorials/roofit/rf212_plottingInRanges_blinding.py
@@ -42,7 +42,7 @@
 # automatically taken as the NormRange() for plotting. We want to avoid this,
 # because the point of this tutorial is to show what can go wrong when the
 # NormRange() is not specified.
-expo.removeStringAttribute("fitrange");
+expo.removeStringAttribute("fitrange")
 
 
 # Here we will plot the results