Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RF] RooDataSetHelper doesn't respect RooRealVar range #11017

Closed
1 task done
dcervenkov opened this issue Jul 21, 2022 · 6 comments · Fixed by #11018
Closed
1 task done

[RF] RooDataSetHelper doesn't respect RooRealVar range #11017

dcervenkov opened this issue Jul 21, 2022 · 6 comments · Fixed by #11018

Comments

@dcervenkov
Copy link

  • Checked for duplicates

Describe the bug

I have some data in RDataFrame, and I want to fit them using RooFit. I book a RooDataSet using the RooDataSetHelper like this

auto data_set = df.Book<double, double>(
    RooDataSetHelper("data_set", "title", RooArgSet(var1, var2)), {"var1", "var2"});

The fit is wrong because the data_set doesn't respect the limits on var1 and var2, resulting in an incorrect normalization.

The only difference in the following simplified fits is that I used

auto data_set = df.Book<double, double>(

for the first one, and

auto data_set = df.Filter("var2>2005 && var2<2020").Book<double, double>(

for the second one.

all_wrong
all_right

Expected behavior

I expected the RooDataSet to account for the RooRealVar range.

To Reproduce

  1. Get any RDataFrame
  2. Create a RooRealVar with limits narrower than the corresponding data in the RDataFrame
  3. Use RooDataSetHelper to book a RooDataSet from the RDataFrame using the RooRealVar
  4. Fit the RooRealVar of the RooDataSet
  5. The fit will be wrong unless you cut on the RDataFrame to match the limits of the RooRealVar

Setup

ROOT 6.26/04
macOS 12.4
Brew install

Additional context

@dcervenkov dcervenkov added the bug label Jul 21, 2022
guitargeek added a commit to guitargeek/root that referenced this issue Jul 21, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes root-project#11017.
@guitargeek
Copy link
Contributor

Thank you very much for opening this issue! This is indeed very dangerous behavior. I have opened a PR to implement the skipping of out-of-range events, which is also consistent with what happens when you are reading in a RooDataSet from a TTree.

@guitargeek guitargeek changed the title RooDataSetHelper doesn't respect RooRealVar range [RF] RooDataSetHelper doesn't respect RooRealVar range Jul 21, 2022
guitargeek added a commit to guitargeek/root that referenced this issue Jul 21, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes root-project#11017.
guitargeek added a commit that referenced this issue Jul 22, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes #11017.
@dcervenkov
Copy link
Author

@guitargeek Thanks for fixing it so quickly! Will this end up in the next ROOT release?

@guitargeek
Copy link
Contributor

Yes, as this is a bugfix it will also make it to 6.26.06, which will come out in a few weeks

@github-actions
Copy link

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

1 similar comment
@github-actions
Copy link

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

j-mathe pushed a commit to j-mathe/root that referenced this issue Jul 26, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes root-project#11017.
guitargeek added a commit to guitargeek/root that referenced this issue Jul 26, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes root-project#11017.
@github-actions
Copy link

Hi @guitargeek,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

guitargeek added a commit that referenced this issue Jul 27, 2022
The RDataFrameHelper should be consistent with creating a RooDataSet
from a TTree, meaning out-of-range events should be skipped. This is
implemented in this commit, borrowing the logic from
`RooTreeDataStore::loadValues()`. A unit test is also implemented.

The previous logic of just taking just all values to fill the dataset
was very dangerous, because these values then clipped to the RooRealVar
limits and biased the number of events observed at the boundaries.

Closes #11017.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants