Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New Common Data Format] JETS #1785

Closed
wants to merge 341 commits into from
Closed

[New Common Data Format] JETS #1785

wants to merge 341 commits into from

Conversation

comane
Copy link
Member

@comane comane commented Aug 4, 2023

PR for the implementation of existing Jet datasets into new CommonData format.

The PR is branched from #1678. (See also previous PR here: #1699)

See table and report below for details

Dataset Old Name Dataset New Name (including Observable Name) Included In NNPDF40 Fit General Notes Status of Implementation Extra Features (TODO)
ATLAS_1JET_8TEV_R06 ATLAS_1JET_8TEV_R06_PTY New implementation agrees with old one The structure of the metadata.yaml file has been checked with the new reader. Use d’Agostini prescription for the treatment of asymmetric uncertainties
ATLAS_1JET_8TEV_R06_DEC Included in ATLAS_1JET_8TEV_R06 New implementation does not agree with old one. Splitting of uncertainties as done in the .cc file is a bit weird. Better to use the HEPdata file directly? The structure of the metadata.yaml file has been checked with the new reader. Use d’Agostini prescription for the treatment of asymmetric uncertainties
ATLAS_2JET_7TEV_R06 ATLAS_2JET_7TEV_R06_M12Y New implementation does not agree with the old one. Bug in previous implementation (see also #1700 ) The structure of the metadata.yaml file has been checked with the new reader Use d’Agostini prescription for the treatment of asymmetric uncertainties
CMS_1JET_8TEV   There seems to be a bug in the old implementation, most likely introduced in the conversion correlation matrix to Covariance matrix The structure of the metadata.yaml file has been checked with the new reader. Use d’Agostini prescription for the treatment of asymmetric uncertainties
CMS_2JET_7TEV CMS_2JET_7TEV_M12Y New Implementation agrees with old one. The structure of the metadata.yaml file has been checked with the new reader. Use d’Agostini prescription for the treatment of asymmetric uncertainties
           

Report of Comparison Legacy Datasets vs New Datasets:
https://vp.nnpdf.science/5DpNBePRSne1TZ1Sfky6Kg==

Copy link
Member

@scarlehoff scarlehoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!

I have a question about the variants. What is the nominal variant.
If this is equal to not using any variants, please remove them! The variants are only those which are different.

When there is more than one, what is the variant that is to be tested against the old implementation in each case? (to make sure that I'm not comparing the wrong things!)

This PR is nice because it introduces a few problems that are seen here regarding the new-old comparison.

I've tested:

CMS_2JET_7TEV_M12Y

This seems to work fine and it is compatible with the older implementation in terms of the chi2.

ATLAS_2JET_7TEV_R06 (variant bugged)

Works fine as well like the one above.

CMS_1JET_8TEV_PTY

This one is tricky. There are two problems with it:

  1. The theory part is "wrong", at least when looking at Theory 600 (which theory are you using?)
  2. The old dataset uses pT^2 as kinematic variable while this one uses pT. If you want to change the variables you will need to add a new cut (on pT this time) in filter.yml (here)
  3. Then... the covmats are quite different and the chi2 computed with them is different by more than a factor of 2. You said there was a bug in the old implementation. Maybe I'm using the wrong variant? Or is the bug that big?

ATLAS_1JET_8TEV_R06_PTY

Indeed I don't find agreement with the old one, as you say, but I'm confused about the "decorrelated" variant. Is this to be compared with _DEC? (and is this one the one that you found it was bugger?)

Please, could you rebase / cherrypick the commits and files and create a PR against #1813?
and please remove all the extra files that are not part of the implementation (such as the test_XX)

Once you do (and add the plotting information missing, if you want to reproduce the plots that we had for this old-new dataset you need to copy all the information from the plotting file) I'll create a report comparing the old and new versions.
I've tested the two dijet ones and they do work but of course the style is quite different (which might be wanted of course)

buildmaster/dataset_names.yml Outdated Show resolved Hide resolved
@@ -0,0 +1,4 @@
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06
ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06_DEC

This generates a problem that I didn't consider. The same dataset in the new format might refer to two different old datasets depending on the variant. Maybe it is better to have the opposite dictionary...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the solution is to swap to old:new instead of new:old

So please, change dataset_names.yml to be something like this:

ATLAS_1JET_8TEV_R06_DEC:
    dataset: ATLAS_1JET_8TEV_R06_PTY
    variant: decorrelated
CMS_1JET_8TEV: CMS_1JET_8TEV_PTY

For simplicity, when no variant is to be used, we can leave it simply as a mapping. This will allow to map many old datasets to a single new (plus its variant).
This will facilitate later on applying cuts depending on the variant if needed.

@comane
Copy link
Member Author

comane commented Dec 7, 2023

Please, could you rebase / cherrypick the commits and files and create a PR against #1813? and please remove all the extra files that are not part of the implementation (such as the test_XX)

Hi @scarlehoff, I am not sure I understand. On which PR should this branch be rebased on? I would have thought #1678 ?

@scarlehoff
Copy link
Member

All new datasets should be added to this branch #1813

The reader should be separated (and remain separated). You can merge both locally to test but don't point to that branch. The reason is that the reader is tracking master so every branch pointing to the reader will have merge conflicts unless you are constantly rebasing.

That said, I don't think you should rebase, but just either cherry-pick the commits where you add data (or even just doing a new commit with the new data and that's it) and point to #1813, then I'll do the testing with the reader and report back the problems.

@comane comane closed this Dec 7, 2023
@comane
Copy link
Member Author

comane commented Dec 7, 2023

See #1886

@scarlehoff scarlehoff deleted the test_old_jet_cd_to_new_cd branch November 14, 2024 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants