[New Common Data Format] JETS #1785

comane · 2023-08-04T11:32:01Z

PR implementing the existing jet datasets in the new CommonData format.

The PR is branched from #1678. (See also previous PR here: #1699)

See table and report below for details

| Dataset Old Name | Dataset New Name (including Observable Name) | Included in NNPDF40 Fit | General Notes | Status of Implementation | Extra Features (TODO) |
| --- | --- | --- | --- | --- | --- |
| ATLAS_1JET_8TEV_R06 | ATLAS_1JET_8TEV_R06_PTY | ❌ | New implementation agrees with old one. | The structure of the metadata.yaml file has been checked with the new reader. | Use d’Agostini prescription for the treatment of asymmetric uncertainties |
| ATLAS_1JET_8TEV_R06_DEC | Included in ATLAS_1JET_8TEV_R06 | ✅ | New implementation does not agree with old one. Splitting of uncertainties as done in the .cc file is a bit weird. Better to use the HEPdata file directly? | The structure of the metadata.yaml file has been checked with the new reader. | Use d’Agostini prescription for the treatment of asymmetric uncertainties |
| ATLAS_2JET_7TEV_R06 | ATLAS_2JET_7TEV_R06_M12Y | ✅ | New implementation does not agree with the old one. Bug in previous implementation (see also #1700). | The structure of the metadata.yaml file has been checked with the new reader. | Use d’Agostini prescription for the treatment of asymmetric uncertainties |
| CMS_1JET_8TEV | | ✅ | There seems to be a bug in the old implementation, most likely introduced in the conversion of the correlation matrix to the covariance matrix. | The structure of the metadata.yaml file has been checked with the new reader. | Use d’Agostini prescription for the treatment of asymmetric uncertainties |
| CMS_2JET_7TEV | CMS_2JET_7TEV_M12Y | ✅ | New implementation agrees with old one. | The structure of the metadata.yaml file has been checked with the new reader. | Use d’Agostini prescription for the treatment of asymmetric uncertainties |
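
The CMS_1JET_8TEV note refers to converting a correlation matrix into a covariance matrix. As a minimal sketch of that conversion (not the buildmaster code; the function name and the two-bin numbers are hypothetical), each covariance element is the correlation coefficient scaled by the two absolute uncertainties, C_ij = rho_ij * sigma_i * sigma_j:

```python
def correlation_to_covariance(corr, sigma):
    """Return the matrix C with C[i][j] = corr[i][j] * sigma[i] * sigma[j].

    corr  : n x n correlation matrix (list of lists)
    sigma : n absolute uncertainties, in the same units as the data
    """
    n = len(sigma)
    if len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("corr must be an n x n matrix matching len(sigma)")
    return [[corr[i][j] * sigma[i] * sigma[j] for j in range(n)] for i in range(n)]


# Hypothetical two-bin example: 50% correlation, absolute errors 2.0 and 3.0.
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma = [2.0, 3.0]
cov = correlation_to_covariance(corr, sigma)
```

A typical way to get this wrong is to scale by relative instead of absolute uncertainties, or to pair a correlation matrix with the wrong set of sigmas. Whether that is what happened in the old implementation is not established here.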

Report of Comparison Legacy Datasets vs New Datasets:
https://vp.nnpdf.science/5DpNBePRSne1TZ1Sfky6Kg==

scarlehoff

Thank you for this!

I have a question about the variants. What is the nominal variant?
If this is equal to not using any variants, please remove them! The variants are only those which are different.

When there is more than one, what is the variant that is to be tested against the old implementation in each case? (to make sure that I'm not comparing the wrong things!)

This PR is useful because it exposes a few problems with the new-old comparison.

I've tested:

CMS_2JET_7TEV_M12Y

This seems to work fine and it is compatible with the older implementation in terms of the chi2.
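
For reference, "compatible in terms of the chi2" can be read as comparing chi2 = (d - t)^T C^{-1} (d - t) computed with the old and the new covariance matrix. The following is only an illustrative sketch in plain Python with made-up numbers; it is not the validphys implementation, and all names in it are hypothetical:

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def chi2(data, theory, cov):
    """Return (d - t)^T C^{-1} (d - t) without forming the inverse explicitly."""
    diff = [d - t for d, t in zip(data, theory)]
    return sum(di * xi for di, xi in zip(diff, solve(cov, diff)))


# Made-up two-point example with two different covariance matrices.
data, theory = [1.0, 2.0], [0.0, 0.0]
cov_old = [[1.0, 0.0], [0.0, 4.0]]
cov_new = [[2.0, 0.0], [0.0, 8.0]]
chi2_old = chi2(data, theory, cov_old)
chi2_new = chi2(data, theory, cov_new)
```

With the same data and theory, any difference between `chi2_old` and `chi2_new` comes purely from the covariance matrices, which is the kind of check described here.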

ATLAS_2JET_7TEV_R06 (variant bugged)

Works fine as well like the one above.

CMS_1JET_8TEV_PTY

This one is tricky. There are two problems with it:

1. The theory part is "wrong", at least when looking at Theory 600 (which theory are you using?)
2. The old dataset uses pT^2 as the kinematic variable while this one uses pT. If you want to change the variables you will need to add a new cut (on pT this time) in filter.yml.

Then... the covmats are quite different and the chi2 computed with them differs by more than a factor of 2. You said there was a bug in the old implementation. Maybe I'm using the wrong variant? Or is the bug that big?
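
On the kinematic-variable point: a lower cut on pT^2 selects the same points as a lower cut on pT at the square root of the threshold, since pT is non-negative. A small sketch of that translation follows. The threshold value is made up, and the helper names are hypothetical; this does not reproduce the actual filter.yml rule syntax:

```python
import math


def pt_cut_from_pt2_cut(pt2_min):
    """Translate a lower cut on pT^2 (GeV^2) into the equivalent cut on pT (GeV)."""
    if pt2_min < 0:
        raise ValueError("a cut on pT^2 cannot be negative")
    return math.sqrt(pt2_min)


def passes_pt2_cut(pt2, pt2_min):
    """Old-style selection, with pT^2 stored as the kinematic variable."""
    return pt2 >= pt2_min


def passes_pt_cut(pt, pt_min):
    """New-style selection, with pT stored as the kinematic variable."""
    return pt >= pt_min


# Hypothetical threshold, for illustration only.
pt2_min = 10000.0
pt_min = pt_cut_from_pt2_cut(pt2_min)

# Both selections should keep exactly the same points.
sample_pts = [50.0, 100.0, 150.0]
same_selection = all(
    passes_pt2_cut(pt * pt, pt2_min) == passes_pt_cut(pt, pt_min)
    for pt in sample_pts
)
```

If the old cut is left untouched after switching the stored variable to pT, it would be compared against pT values while still carrying a pT^2 threshold, so the selection would silently change.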

ATLAS_1JET_8TEV_R06_PTY

Indeed I don't find agreement with the old one, as you say, but I'm confused about the "decorrelated" variant. Is this to be compared with _DEC? (And is this the one that you found to be bugged?)

Please, could you rebase / cherrypick the commits and files and create a PR against #1813?
and please remove all the extra files that are not part of the implementation (such as the test_XX)

Once you do (and add the missing plotting information; if you want to reproduce the plots that we had for this old-new dataset you need to copy all the information from the plotting file), I'll create a report comparing the old and new versions.
I've tested the two dijet ones and they do work, but of course the style is quite different (which might be intended).

scarlehoff · 2023-11-13T13:25:58Z

buildmaster/dataset_names.yml

@@ -0,0 +1,4 @@
+ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06


Suggested change

-ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06
+ATLAS_1JET_8TEV_R06_PTY: ATLAS_1JET_8TEV_R06_DEC

This generates a problem that I didn't consider. The same dataset in the new format might refer to two different old datasets depending on the variant. Maybe it is better to have the opposite dictionary...

I think the solution is to swap to old:new instead of new:old
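
To illustrate the collision, here is a small sketch using a plain Python dict as a stand-in for the loaded file (how a given YAML loader treats duplicate keys is an assumption to verify separately). With new:old, the two entries for ATLAS_1JET_8TEV_R06_PTY share a key, so only one survives; with old:new, both are kept:

```python
# (new_name, old_name) pairs taken from the suggested change above.
pairs = [
    ("ATLAS_1JET_8TEV_R06_PTY", "ATLAS_1JET_8TEV_R06"),
    ("ATLAS_1JET_8TEV_R06_PTY", "ATLAS_1JET_8TEV_R06_DEC"),
]

# new:old direction: the second pair overwrites the first, because a dict
# keeps only the last value assigned to a repeated key.
new_to_old = dict(pairs)

# old:new direction: the old names are unique, so nothing is lost.
old_to_new = {old: new for new, old in pairs}
```

After this runs, `new_to_old` has a single entry while `old_to_new` has two, which is the reason for inverting the dictionary.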

So please, change dataset_names.yml to be something like this:

    ATLAS_1JET_8TEV_R06_DEC:
      dataset: ATLAS_1JET_8TEV_R06_PTY
      variant: decorrelated
    CMS_1JET_8TEV: CMS_1JET_8TEV_PTY

For simplicity, when no variant is to be used, we can leave the entry as a plain old:new mapping. This allows many old datasets to be mapped to a single new one (plus its variant).
It will also make it easier to apply cuts depending on the variant later on, if needed.
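
A reader for the proposed old:new layout has to accept both entry shapes: a plain string (no variant) and a nested mapping with `dataset` and `variant`. The sketch below shows one way to normalise them. The function name `resolve_old_name` is hypothetical, and the dict literal stands in for the parsed dataset_names.yml:

```python
from typing import Dict, Optional, Tuple, Union

# An entry is either the new name alone or a {"dataset": ..., "variant": ...} mapping.
Entry = Union[str, Dict[str, str]]


def resolve_old_name(mapping: Dict[str, Entry], old_name: str) -> Tuple[str, Optional[str]]:
    """Return (new_dataset_name, variant) for an old dataset name.

    The variant is None when the entry is a plain string.
    """
    entry = mapping[old_name]
    if isinstance(entry, str):
        return entry, None
    return entry["dataset"], entry.get("variant")


# Stand-in for the parsed file, mirroring the example in this comment.
dataset_names = {
    "ATLAS_1JET_8TEV_R06_DEC": {
        "dataset": "ATLAS_1JET_8TEV_R06_PTY",
        "variant": "decorrelated",
    },
    "CMS_1JET_8TEV": "CMS_1JET_8TEV_PTY",
}

dec_name, dec_variant = resolve_old_name(dataset_names, "ATLAS_1JET_8TEV_R06_DEC")
cms_name, cms_variant = resolve_old_name(dataset_names, "CMS_1JET_8TEV")
```

Returning the variant alongside the name keeps the information needed to apply variant-dependent cuts later.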

comane · 2023-12-07T15:35:58Z

> Please, could you rebase / cherrypick the commits and files and create a PR against #1813? and please remove all the extra files that are not part of the implementation (such as the test_XX)

Hi @scarlehoff, I am not sure I understand. Onto which PR should this branch be rebased? I would have thought #1678?

scarlehoff · 2023-12-07T15:43:24Z

All new datasets should be added to this branch #1813

The reader should be separated (and remain separated). You can merge both locally to test but don't point to that branch. The reason is that the reader is tracking master so every branch pointing to the reader will have merge conflicts unless you are constantly rebasing.

That said, I don't think you should rebase. Instead, either cherry-pick the commits where you add data (or even just make a new commit with the new data and that's it) and point to #1813; then I'll do the testing with the reader and report back the problems.

comane · 2023-12-07T16:23:36Z

See #1886

comane added 16 commits August 1, 2023 14:42

modified metadata for it to be compatible with reader

3ceeb19

it is definitions, not definition

99b9442

added testing module

293c9ef

changed metadata files so as to be readable by parser

dd548a9

CMS_1JET_8TEV new implementation is disagreeing with the older implementation. possible reason: bug in the construction of the stat covmat

69138ba

added minor modifications to commondataparser

b892812

added HEPdata tables for alternative scenario

0eb60df

how to include both scenarios in the metadata.yaml files without having the commondataparaser to crush??

c65407c

added decorr unc + names of unc

9b94416

this is how variants are included in metadataa.yaml

a993cea

ATLAS 2 JET

9fd2266

finalized ATLAS_1JET

5db76e2

finalized ATLAS 2JET

554fe31

more or less final version of CMS 1JET

cbd6612

more or less final version of CMS 2 JET

da722ef

added reportengine to test new datasets

9fd5aa5

comane requested review from Zaharid, enocera and scarlehoff August 4, 2023 11:32

APJansen and others added 11 commits August 7, 2023 14:06

Add Optional typehint

02161d5

Add np.testing

b0e8f26

Set seed to 1 instead of 0

9d9b0c0

Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>

update values test_preprocessing

5b49c83

Define MSR and VSR indices as global constants

8dc3814

Remove build method

a418e99

Add/improve documentation

9685827

factor out definitions of normalization to top as global constants

bc41e61

Undo joining of photon integral

eb73f5a

Merge pull request #1777 from NNPDF/refactor_preprocessing

56b53e8

Refactor preprocessing

Merge pull request #1780 from NNPDF/refactor_rotations

5492542

Refactor rotations

comane added 15 commits October 27, 2023 09:26

ATLAS 2 JET

5de718b

finalized ATLAS_1JET

8ca0f20

finalized ATLAS 2JET

4f9c354

more or less final version of CMS 1JET

516f49b

more or less final version of CMS 2 JET

8ac8bd3

added reportengine to test new datasets

6d62d08

updated metadatafile

6de3610

added dataset name mapping

d72281c

metadata file as per doc

e4368c9

merged

49eafb2

.

1140585

removed cut in pt

2dee9f0

added comment to metadata regarding cut in pt

4414470

minor modifications to metadata

9d29296

minor mod to metadata

6d26730

scarlehoff mentioned this pull request Nov 13, 2023

Status of the new commondata format implementation #1709

Closed

77 tasks

scarlehoff reviewed Nov 13, 2023

scarlehoff mentioned this pull request Nov 14, 2023

Re-implementation of the LHCb Collider DY Datasets #1826

Closed

comane and others added 7 commits December 1, 2023 11:24

CMS_2JET_7TEV, removed nominal variant + specified as MULT the MULT uncertainties

213166b

ATLAS_2JET_7TEV_R06, added MULT, removed nominal

8fb5a85

corrected ATLAS_2JET bugged uncertainties

5f8aa75

added CMS 1JET bugged version + changed kinematis

7d3c825

added CMS 1JET bugged version + changed kinematis

23369b9

MULT and ADD for CMS_1JET_8TEV

e3658d8

Update buildmaster/dataset_names.yml

eadbe3e

Co-authored-by: Juan M. Cruz-Martinez <juacrumar@lairen.eu>

comane closed this Dec 7, 2023

scarlehoff deleted the test_old_jet_cd_to_new_cd branch November 14, 2024 10:28
