Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] Completely implement Offset("bin") feature #11965

Closed
Tracked by #13458
guitargeek opened this issue Dec 31, 2022 · 5 comments · Fixed by #11988 or #13665
Closed
Tracked by #13458

[RF] Completely implement Offset("bin") feature #11965

guitargeek opened this issue Dec 31, 2022 · 5 comments · Fixed by #11988 or #13665

Comments

@guitargeek
Copy link
Contributor

guitargeek commented Dec 31, 2022

Fully implement and test the new Offset("bin") feature over the test matrix that is the tensor product of BatchMode(), doing and extended fit, RooDataSet vs. RooDataHist, and SumW2 correction. The test should compute the likelihood for a template PDF created from the data, and it should be numerically compatible with zero.

void testOffsetBin()
{
   using namespace RooFit;
   using RealPtr = std::unique_ptr<RooAbsReal>;

   // Create extended PDF model
   RooRealVar x("x", "x", -10, 10);
   RooRealVar mean("mean", "mean", 0, -10, 10);
   RooRealVar sigma("sigma", "width", 4, 0.1, 10);
   RooRealVar nEvents{"n_events", "n_events", 10000, 100, 100000};

   RooGaussian gauss("gauss", "gauss", x, mean, sigma);
   RooAddPdf extGauss("extGauss", "extGauss", gauss, nEvents);

   std::unique_ptr<RooDataSet> data{extGauss.generate(x)};

   {
      // Create weighted dataset and hist to test SumW2 feature
      RooRealVar weight("weight", "weight", 0.5, 0.0, 1.0);
      auto dataW = std::make_unique<RooDataSet>("dataW", "dataW", RooArgSet{x, weight}, "weight");
      for (std::size_t i = 0; i < data->numEntries(); ++i) {
         dataW->add(*data->get(i), 0.5); // try weights that are different from unity
      }
      std::swap(dataW, data); // try to replace the original dataset with weighted dataset
   }

   std::unique_ptr<RooDataHist> hist{data->binnedClone()};

   data->Print();
   hist->Print();

   // Create template PDF based on data
   RooHistPdf histPdf{"histPdf", "histPdf", x, *hist};
   RooAddPdf extHistPdf("extHistPdf", "extHistPdf", histPdf, nEvents);

   auto& pdf = extHistPdf;

   auto const bm = "off"; // it should also work work BatchMode("cpu")

   double nllVal01 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(false))}->getVal();
   double nllVal02 = RealPtr{pdf.createNLL(*data, BatchMode(bm), Extended(true)) }->getVal();
   double nllVal03 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(false))}->getVal();
   double nllVal04 = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Extended(true)) }->getVal();

   double nllVal1  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
   double nllVal2  = RealPtr{pdf.createNLL(*data, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();
   double nllVal3  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(false))}->getVal();
   double nllVal4  = RealPtr{pdf.createNLL(*hist, BatchMode(bm), Offset("bin"), Extended(true)) }->getVal();

   // The final unit test should also include the SumW2 option in the test matrix

   // For all configurations, the bin offset should have the effect of bringing
   // the NLL close to zero:
   std::cout << "Unbinned fit      :  " << nllVal01 << "    " << nllVal1 << std::endl;
   std::cout << "Unbinned ext. fit : " << nllVal02 << "   " << nllVal2 << std::endl;
   std::cout << "Binned fit        :  " << nllVal03 << "   " << nllVal3 << std::endl;
   std::cout << "Binned ext. fit   : " << nllVal04 << "   " << nllVal4 << std::endl;
}
@guitargeek
Copy link
Contributor Author

Re-opening because I still need to implement this feature for fits with a RooDataSet

@dgwnb
Copy link

dgwnb commented Feb 21, 2023

Hi all,

This is Hongtao from ATLAS Higgs working group. I am the maintainer of xml analytic workspace builder and quickFit. Thank you very much for implementing this very important feature. As yo probably know, however, when making workspaces we usually use weighted RooDataSet instead of RooDataHist to store binned data. Therefore if you could make this feature available for binned RooDataSet it would be very handy for us to further test this feature in physics analyses. Thank you.

@JackHLindon
Copy link
Contributor

Hi, I would also like to give my support for this being available for RooDataSet

@Falk-Bartels
Copy link

Thanks @guitargeek for the great addition of the Offset("bin") option. Very interested to see it supported for RooDataSet as well to use it in quickFit.

guitargeek added a commit to guitargeek/root that referenced this issue Sep 19, 2023
Fully implement the `Offset("bin")` feature also for RooDataSet, both
with CPU/CUDA BatchMode and the legacy tests statistics.

This is done now by introducing a new element in the computation graph:
an "offset pdf" that is created as a RooHistPdf from the observed data,
and it is used to get the counterterm in each bin.

It was validated with the `rf614` tutorial that this binwise offsetting
is inteed fixing the convergense problems that the simple offsetting by
the initial NLL value can't fix.

Closes root-project#11965.
guitargeek added a commit to guitargeek/root that referenced this issue Sep 19, 2023
Fully implement the `Offset("bin")` feature also for RooDataSet, both
with CPU/CUDA BatchMode and the legacy tests statistics.

This is done now by introducing a new element in the computation graph:
an "offset pdf" that is created as a RooHistPdf from the observed data,
and it is used to get the counterterm in each bin.

It was validated with the `rf614` tutorial that this binwise offsetting
is indeed fixing the convergense problems that the simple offsetting by
the initial NLL value can't fix.

Closes root-project#11965.
guitargeek added a commit that referenced this issue Sep 20, 2023
Fully implement the `Offset("bin")` feature also for RooDataSet, both
with CPU/CUDA BatchMode and the legacy tests statistics.

This is done now by introducing a new element in the computation graph:
an "offset pdf" that is created as a RooHistPdf from the observed data,
and it is used to get the counterterm in each bin.

It was validated with the `rf614` tutorial that this binwise offsetting
is indeed fixing the convergense problems that the simple offsetting by
the initial NLL value can't fix.

Closes #11965.
@github-actions
Copy link

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

maksgraczyk pushed a commit to maksgraczyk/root that referenced this issue Jan 12, 2024
Fully implement the `Offset("bin")` feature also for RooDataSet, both
with CPU/CUDA BatchMode and the legacy tests statistics.

This is done now by introducing a new element in the computation graph:
an "offset pdf" that is created as a RooHistPdf from the observed data,
and it is used to get the counterterm in each bin.

It was validated with the `rf614` tutorial that this binwise offsetting
is indeed fixing the convergense problems that the simple offsetting by
the initial NLL value can't fix.

Closes root-project#11965.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment