-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Common Data Format] JETS #1785
Conversation
…entation. possible reason: bug in the construction of the stat covmat
…ng the commondataparaser to crush??
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
Refactor preprocessing
Refactor rotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this!
I have a question about the variants. What is the nominal
variant.
If this is equal to not using any variants, please remove them! The variants are only those which are different.
When there is more than one, what is the variant that is to be tested against the old implementation in each case? (to make sure that I'm not comparing the wrong things!)
This PR is nice because it introduces a few problems that are seen here regarding the new-old comparison.
I've tested:
CMS_2JET_7TEV_M12Y
This seems to work fine and it is compatible with the older implementation in terms of the chi2.
ATLAS_2JET_7TEV_R06 (variant bugged)
Works fine as well like the one above.
CMS_1JET_8TEV_PTY
This one is tricky. There are two problems with it:
- The theory part is "wrong", at least when looking at Theory 600 (which theory are you using?)
- The old dataset uses pT^2 as kinematic variable while this one uses pT. If you want to change the variables you will need to add a new cut (on pT this time) in
filter.yml
(here) - Then... the covmats are quite different and the chi2 computed with them is different by more than a factor of 2. You said there was a bug in the old implementation. Maybe I'm using the wrong variant? Or is the bug that big?
ATLAS_1JET_8TEV_R06_PTY
Indeed I don't find agreement with the old one, as you say, but I'm confused about the "decorrelated" variant. Is this to be compared with _DEC
? (and is this one the one that you found it was bugger?)
Please, could you rebase / cherrypick the commits and files and create a PR against #1813?
and please remove all the extra files that are not part of the implementation (such as the test_XX
)
Once you do (and add the plotting
information missing, if you want to reproduce the plots that we had for this old-new dataset you need to copy all the information from the plotting file) I'll create a report comparing the old and new versions.
I've tested the two dijet ones and they do work but of course the style is quite different (which might be wanted of course)
@@ -0,0 +1,4 @@ | |||
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06 | |
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06_DEC |
This generates a problem that I didn't consider. The same dataset in the new format might refer to two different old datasets depending on the variant. Maybe it is better to have the opposite dictionary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the solution is to swap to old:new instead of new:old
So please, change dataset_names.yml
to be something like this:
ATLAS_1JET_8TEV_R06_DEC:
dataset: ATLAS_1JET_8TEV_R06_PTY
variant: decorrelated
CMS_1JET_8TEV: CMS_1JET_8TEV_PTY
For simplicity, when no variant is to be used, we can leave it simply as a mapping. This will allow to map many old datasets to a single new (plus its variant).
This will facilitate later on applying cuts depending on the variant if needed.
Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>
Hi @scarlehoff, I am not sure I understand. On which PR should this branch be rebased on? I would have thought #1678 ? |
All new datasets should be added to this branch #1813 The reader should be separated (and remain separated). You can merge both locally to test but don't point to that branch. The reason is that the reader is tracking master so every branch pointing to the reader will have merge conflicts unless you are constantly rebasing. That said, I don't think you should rebase, but just either cherry-pick the commits where you add data (or even just doing a new commit with the new data and that's it) and point to #1813, then I'll do the testing with the reader and report back the problems. |
See #1886 |
PR for the implementation of existing Jet datasets into new CommonData format.
The PR is branched from #1678. (See also previous PR here: #1699)
See table and report below for details
Report of Comparison Legacy Datasets vs New Datasets:
https://vp.nnpdf.science/5DpNBePRSne1TZ1Sfky6Kg==