New Common Datasets #1813

Merged: 105 commits, Feb 16, 2024

Commits
13cd312
add dataset_names.yml
scarlehoff Oct 9, 2023
1ed3803
atlas jets
t7phy Oct 18, 2023
4064456
cms jets
t7phy Oct 18, 2023
77cfdb8
address comments atlas jet
t7phy Oct 23, 2023
029a5aa
address comments atlas dijet
t7phy Oct 23, 2023
5ae02a3
address comments cms jet
t7phy Oct 23, 2023
f59b7db
move data under validphys
scarlehoff Feb 1, 2024
68307fe
Implementing subsets of LHCb data
Radonirinaunimi Oct 25, 2023
068aacb
Update dataset namings
Radonirinaunimi Oct 25, 2023
738e5f7
Address comments from pre-review
Radonirinaunimi Oct 26, 2023
13b9747
Combined di-electron and di-muon for Z 13TeV
Radonirinaunimi Oct 26, 2023
ee3d729
Add tentative implementation for ATLAS DY 7 TeV low-mass measurement
cschwan Oct 27, 2023
0483bed
Apply suggestions from code review
cschwan Oct 27, 2023
a26f1c4
Rename NC DY -> Z0
Radonirinaunimi Oct 29, 2023
5f35bc3
fix setname in renaming NC DY -> Z0
Radonirinaunimi Oct 29, 2023
2c2a6ec
add NC & CC DY productions in muon rapidity at 7TeV
Radonirinaunimi Oct 30, 2023
2beaa50
add NC & CC DY productions in muon rapidity at 8TeV
Radonirinaunimi Oct 30, 2023
1cf2791
fix some metadata entries in NC & CC DY at 7 TeV
Radonirinaunimi Oct 30, 2023
1f9a683
fix center of mass energy in DY 8 TeV
Radonirinaunimi Oct 30, 2023
a67808e
Init LHCB_WENU_8TEV_RATIO
niclaurenti Nov 2, 2023
4393113
fix various issues
Radonirinaunimi Nov 8, 2023
cccc0c1
Update dataset name mapping
Radonirinaunimi Nov 8, 2023
da727a1
Uncomment extra-labels in lhcb
Radonirinaunimi Nov 9, 2023
b5801ae
include conversion factor under theory
Radonirinaunimi Nov 9, 2023
b8f2d6a
clean up and keep only LHCb
Radonirinaunimi Nov 10, 2023
e9161eb
Fix incorrect set name
Radonirinaunimi Nov 13, 2023
736a89d
fixed remaining problematic datasets
Radonirinaunimi Nov 13, 2023
b6de7b5
swap old and new and add an example with a variant
scarlehoff Nov 14, 2023
0bb7402
Fix minor details in descriptions
Radonirinaunimi Nov 28, 2023
86328bd
fix ambiguities in defining syst treatments
Radonirinaunimi Nov 28, 2023
7b58940
replace sqrt_s -> sqrts
Radonirinaunimi Dec 20, 2023
136a36f
merged #1826, collider DY LHCb
scarlehoff Feb 1, 2024
fdbbfc5
cms ttb 5tev tot
t7phy Nov 4, 2023
80a8eed
cms ttb 7tev tot
t7phy Nov 4, 2023
f2eb044
atlas ttb 7tev tot
t7phy Nov 4, 2023
ed33ffe
atlas ttb 8tev tot
t7phy Nov 4, 2023
52789e8
cms ttb 8tev tot
t7phy Nov 4, 2023
cde3ca2
cms ttb 13tev tot
t7phy Nov 4, 2023
bb12ee4
atlas ttb 13tev tot
t7phy Nov 4, 2023
c40974e
old new mapping
t7phy Dec 3, 2023
464df2f
Update dataset_names.yml
t7phy Dec 18, 2023
137cff9
dataset labels
t7phy Jan 22, 2024
6ce7a71
merged #1834, ttb integrated xs
scarlehoff Feb 1, 2024
220df06
atlas ttb 13 tev lj
t7phy Nov 5, 2023
7bc63d9
atlas ttb 8tev
t7phy Nov 6, 2023
7894f1f
filters go brrrr...
t7phy Nov 7, 2023
072191a
old new mapping
t7phy Dec 3, 2023
392f6e9
merge #1837, ttb atlas
scarlehoff Feb 1, 2024
893b40b
cms 8tev ttb
t7phy Nov 6, 2023
5e7649e
cms ttb 13tev
t7phy Nov 7, 2023
43b3abe
filters go brrrr....
t7phy Nov 7, 2023
b9bda11
make rapidity absolute
t7phy Nov 22, 2023
690eab9
add conversion factors
t7phy Dec 3, 2023
c4b07eb
old new mapping
t7phy Dec 3, 2023
451e0c1
fixed typos
t7phy Jan 22, 2024
5228c46
merge #1836, TTB cms, added kinematics_override to CMS_TTBAR_8TEV_LJ_DIF
scarlehoff Feb 1, 2024
3c0bc64
Start re-implementing CMS datasets
Radonirinaunimi Nov 28, 2023
ba77030
Fix naming
Radonirinaunimi Nov 28, 2023
7f60489
start adding CMS_Z0_7TEV_DIMUON_2D
Radonirinaunimi Jan 7, 2024
c5a247a
minor adjustments
Radonirinaunimi Jan 8, 2024
7791ba2
add CMS_WPWM_8TEV_MUON
Radonirinaunimi Jan 8, 2024
ef9575d
merge #1869, CMS DY
scarlehoff Feb 1, 2024
070a82b
added ATLAS_1JET_8TEV_R06
comane Dec 7, 2023
6da90d6
ATLAS_2JET_7TEV_R06
comane Dec 7, 2023
3a3b0c1
added CMS_1JET_8TEV/
comane Dec 7, 2023
7d218aa
added CMS_2JET_7TEV/
comane Dec 7, 2023
9f173f3
added dataset_names.yml
comane Dec 7, 2023
00084af
removed nominal from ATLAS_1JET
comane Dec 7, 2023
deffdf6
changed sys to MULT
comane Dec 10, 2023
3b57186
MULT uncertainties
comane Dec 10, 2023
bc9a249
MULT decorrelated uncertainties
comane Dec 10, 2023
cf99d12
added kin override
comane Dec 10, 2023
7b7c2ad
Update buildmaster/dataset_names.yml
comane Dec 10, 2023
fa1ccac
Update buildmaster/CMS_1JET_8TEV/metadata.yaml
comane Dec 10, 2023
c398798
Update buildmaster/dataset_names.yml
comane Dec 10, 2023
2221ed7
Update buildmaster/ATLAS_1JET_8TEV_R06/metadata.yaml
comane Dec 10, 2023
fd0625f
Update buildmaster/ATLAS_2JET_7TEV_R06/metadata.yaml
comane Dec 10, 2023
f98e809
Update buildmaster/ATLAS_1JET_8TEV_R06/metadata.yaml
comane Dec 10, 2023
1fc1b30
added units
comane Dec 10, 2023
eda734b
Update buildmaster/CMS_2JET_7TEV/metadata.yaml
comane Dec 10, 2023
f8c65c4
Update buildmaster/ATLAS_2JET_7TEV_R06/metadata.yaml
comane Dec 10, 2023
9807ccc
Update buildmaster/CMS_1JET_8TEV/metadata.yaml
comane Dec 10, 2023
cf55efb
merge #1886, jet data
scarlehoff Feb 1, 2024
e79d594
add script for automatic conversion
scarlehoff Feb 2, 2024
afa241f
first bunch of datasets: HERA
scarlehoff Feb 2, 2024
92f6764
add a script to check the datasets
scarlehoff Feb 5, 2024
a492d68
add mapping with full 4.0
scarlehoff Feb 5, 2024
733b497
change the name of CMS_TTBAR_13TEV_LJ_2016_DIF to avoid conflicts
scarlehoff Feb 6, 2024
ef72324
update the mapping
scarlehoff Feb 6, 2024
90313ee
add positivity datasets to the mapping
scarlehoff Feb 6, 2024
69a54bf
add legacy variants to already existing datasets
scarlehoff Feb 7, 2024
7d64544
added integrability datasets
scarlehoff Feb 7, 2024
a09d6d4
correct typos
scarlehoff Feb 7, 2024
ca0d9cd
update map: separate target, join nuclear corrections
scarlehoff Feb 7, 2024
41fb9bb
update names according to discussion at code meeting
scarlehoff Feb 7, 2024
426b9c5
Update buildmaster/old_new_porting_map.yml
scarlehoff Feb 7, 2024
41fb6b1
Update buildmaster/old_new_porting_map.yml
scarlehoff Feb 8, 2024
4eda140
update lhcb
scarlehoff Feb 8, 2024
030e1d1
update JETS -> DIJET
scarlehoff Feb 8, 2024
679a16e
update W/Z 7TeV 2011 data to have a explicit reference to the observa…
scarlehoff Feb 9, 2024
71b7396
update the autoport script, add dataset_names.yaml
scarlehoff Feb 12, 2024
3c65d51
add all new commondata automatically ported from old commondata
scarlehoff Feb 12, 2024
b9effd2
standardize variable names
scarlehoff Feb 15, 2024
970cac1
Merge pull request #1931 from NNPDF/automatic_port_old_commondata
scarlehoff Feb 15, 2024
df0c562
add a link to the datafiles in the root of the repository
scarlehoff Feb 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
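For context, the script below consumes the `dataset_names.yml` mapping added in this PR, which pairs each old dataset name with its new one. As the main loop shows, a value is either a plain string (the new name) or a mapping with a `dataset` key and an optional `variant`. A hypothetical sketch of the two formats (all names invented for illustration):

```yaml
# Illustrative entries only -- not real dataset names
OLDEXP_OBS: NEWEXP_OBS
OLDEXP_OTHER:
  dataset: NEWEXP_OTHER
  variant: legacy
```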
334 changes: 334 additions & 0 deletions buildmaster/old_new_check.py
@@ -0,0 +1,334 @@
#!/usr/bin/env python3
"""
This scripts checks that the datasets in the file ``dataset_names.yml``
are truly the same the same for all intents and purposes
For that we check that the:
1. Central value are the same
2. The covariance matrix are the same
3. The computed chi2 is the same
4. The t0 chi2 is the same (so multiplicative and additive are equivalent in the new and old implementation)

In order to perform this check we require a version of validphys that has both the old and new leader.
For instance commit 9ddf580f98c99f3c953e7be328df4a4c95c43c14

All checks are done assuming the same theory for the old and new dataset.
"""

from argparse import ArgumentParser
from dataclasses import dataclass
from functools import cached_property

from lhapdf import setVerbosity
import numpy as np
import pandas as pd
import yaml

from validphys.api import API
from validphys.convolution import central_predictions
from validphys.datafiles import path_commondata as old_cd_root
from validphys.loader import Loader

dataset_names_path = old_cd_root.with_name("new_commondata") / "dataset_names.yml"

setVerbosity(0)
pdf = "NNPDF40_nnlo_as_01180"
pdf_load = API.pdf(pdf=pdf)
DEFAULT_STRICT = False

# This will only work if the code was installed in edit mode, but this is a development script so, fair game
runcard40 = (
    old_cd_root.parent.parent.parent.parent.parent
    / "n3fit"
    / "runcards"
    / "reproduce_nnpdf40"
    / "NNPDF40_nnlo_as_01180_1000.yml"
)
if runcard40.exists():
    runcard40_data_raw = yaml.safe_load(runcard40.read_text())["dataset_inputs"]
    runcard40_data = [i["dataset"] for i in runcard40_data_raw]
else:
    runcard40_data = None


class CheckFailed(Exception):
    pass


@dataclass
class DToCompare:
    old_name: str
    new_name: str
    variant: str = None
    theory_id: int = 717
    old_theory_id: int = 717

    def __str__(self):
        return f"[old: {self.old_name} vs new: {self.new_name}]"

    @property
    def is_positivity(self):
        return self.new_name.startswith("NNPDF_POS")

    @property
    def is_integrability(self):
        return self.new_name.startswith("NNPDF_INTEG")

    @property
    def is_lagrange(self):
        return self.is_integrability or self.is_positivity

    @property
    def generic(self):
        return {"use_cuts": "internal", "pdf": pdf}

    @property
    def dataset_input_old(self):
        di = {"dataset": self.old_name}
        if self.is_lagrange:
            di["maxlambda"] = 1.0

        if self.is_positivity:
            return {"posdataset": di, "theoryid": self.old_theory_id}
        elif self.is_integrability:
            return {"integdataset": di, "theoryid": self.old_theory_id}
        return {"dataset_input": di, "theoryid": self.old_theory_id}

    @property
    def dataset_input_new(self):
        di = {"dataset": self.new_name}
        if self.variant is not None:
            di["variant"] = self.variant
        if self.is_lagrange:
            di["maxlambda"] = 1.0

        if self.is_positivity:
            return {"posdataset": di, "theoryid": self.theory_id}
        elif self.is_integrability:
            return {"integdataset": di, "theoryid": self.theory_id}
        return {"dataset_input": di, "theoryid": self.theory_id}

    def api_load_dataset(self, dinput):
        """Load a dataset (positivity or not) with VP"""
        if self.is_lagrange:
            if self.new_name.startswith("NNPDF_POS"):
                return API.posdataset(**self.generic, **dinput)
            else:
                return API.integdataset(**self.generic, **dinput)
        else:
            return API.dataset(**self.generic, **dinput)

    @cached_property
    def ds_old(self):
        """Load the old-commondata dataset"""
        return self.api_load_dataset(self.dataset_input_old)

    @cached_property
    def cd_old(self):
        """Load the commondata object out of the dataset"""
        return self.ds_old.load_commondata()

    @cached_property
    def ds_new(self):
        """Load the new-commondata dataset"""
        return self.api_load_dataset(self.dataset_input_new)

    @cached_property
    def cd_new(self):
        """Load the commondata object out of the dataset"""
        return self.ds_new.load_commondata()

    def api_call(self, api_function, extra_config=None):
        """Apply a validphys API call for a given function for both the old and new datasets"""
        if extra_config is None:
            extra_config = {}
        old_val = getattr(API, api_function)(
            **self.generic, **extra_config, **self.dataset_input_old
        )
        new_val = getattr(API, api_function)(
            **self.generic, **extra_config, **self.dataset_input_new
        )
        return old_val, new_val


def check_central_values(dcontainer, strict=DEFAULT_STRICT):
    """Check the central values

    By default, exit if they are not _the same_
    """
    cd_old = dcontainer.cd_old
    cd_new = dcontainer.cd_new
    if np.allclose(cd_old.central_values, cd_new.central_values):
        return True

    print(f"# Problem in the data values comparison of {dcontainer}")
    if strict:
        raise CheckFailed

    _od = cd_old.central_values
    _nd = cd_new.central_values
    rat = np.abs(_od / _nd)
    if not np.allclose(rat, 1.0, rtol=1e-3):
        if not np.allclose(rat, 1.0, rtol=1e-2):
            print("Relative differences are above 1e-2! Panic!")
            df = pd.concat([_od, _nd, rat], axis=1)
            breakpoint()
            return False
        else:
            print("Relative differences between 1e-3 and 1e-2... acceptable...")
    else:
        print("Relative differences under 1e-3, continuing comparison...")


def check_theory(dcontainer):
    """Returns the old and new predictions"""
    new_pred = central_predictions(dcontainer.ds_new, pdf_load)
    old_pred = central_predictions(dcontainer.ds_old, pdf_load)
    return old_pred, new_pred


def check_chi2(dcontainer, strict=DEFAULT_STRICT, rtol=1e-5):
    """Checks whether the chi2 is the same
    A failure in the comparison of the chi2 can come from either:
    1. Data
    2. Covmat
    3. Theory
    Try to give as much information as possible before failure
    """
    chi2_old, chi2_new = dcontainer.api_call("central_chi2")
    if np.allclose(chi2_old, chi2_new, rtol=rtol):
        return True

    if strict:
        raise CheckFailed(f"Different chi2: {chi2_old:.4} vs {chi2_new:.4}")

    print(f"# Differences in the computation of chi2 {chi2_old:.5} vs {chi2_new:.5}")
    # Check the predictions first
    old_pred, new_pred = check_theory(dcontainer)
    if not np.allclose(new_pred, old_pred):
        print("... but the predictions were already different")

    old_covmat, new_covmat = dcontainer.api_call("covmat_from_systematics")
    if not np.allclose(old_covmat, new_covmat):
        print(" The covmats are different", end="")
        if not np.allclose(np.diag(old_covmat), np.diag(new_covmat)):
            print(" ...even the diagonal!")
        else:
            print(" ...but the diagonal is the same!")


def run_comparison(dcontainer):
    """
    Given an old and a new dataset in a container, perform a full comparison
    """
    # Check central data
    check_central_values(dcontainer)

    if dcontainer.is_lagrange:
        pred_old, pred_new = check_theory(dcontainer)
        if np.allclose(pred_old, pred_new):
            print(f" > Comparison ok for positivity dataset {dcontainer}")
        return

    # chi2!
    # Computing the chi2 is checking:
    # 1. That the theory is loaded in the same way
    # 2. That the (experimental) covmat is created equal
    # Note that any problem in the data check above might break this
    check_chi2(dcontainer)

    # Check the chi2... with t0!
    # This checks that the ADD and MULT systematics are loaded in the same way
    chi2_old_t0, chi2_new_t0 = dcontainer.api_call(
        "central_chi2", extra_config={"use_t0": True, "t0pdfset": pdf}
    )
    if not np.allclose(chi2_old_t0, chi2_new_t0, rtol=1e-5):
        raise CheckFailed(f"The t0 chi2 is different: {chi2_old_t0:.5} vs {chi2_new_t0:.5}")

    print(f"> Comparison ok! {dcontainer}")


def check_40(old_name):
    if runcard40_data is not None and old_name not in runcard40_data:
        print(f"\033[92m Dataset {old_name} was not part of 4.0\033[0m")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "-s", "--stop", help="Stop on failure by raising the exception", action="store_true"
    )
    parser.add_argument(
        "-l", "--only-legacy", help="Check only those with variant: legacy", action="store_true"
    )
    parser.add_argument(
        "-v", "--verbose", help="Print the whole information on the failure", action="store_true"
    )
    parser.add_argument(
        "-f",
        "--filter",
        help="Simple filter to select a subset of datasets, applied on the new data",
        type=str,
        nargs='+',
    )
    parser.add_argument("-t", "--tid", help="Theory id, default 717", type=int, default=717)
    parser.add_argument("--old_tid", help="Old theory id, default 717", type=int, default=717)
    args = parser.parse_args()

    all_ds_names = yaml.safe_load(dataset_names_path.read_text())

    for old_name, new_ds in all_ds_names.items():
        if isinstance(new_ds, str):
            new_name = new_ds
            variant = None
        else:
            new_name = new_ds["dataset"]
            variant = new_ds.get("variant")

        if args.only_legacy and variant != "legacy":
            continue

        if args.filter is not None:
            if not any(filter_word in new_name for filter_word in args.filter):
                continue

        if args.verbose:
            print("###########\n")

        try:
            # Create the DToCompare container class
            # which knows how to call validphys for the various pieces of information it needs
            # and eases the printing
            dcontainer = DToCompare(
                old_name, new_name, variant=variant, theory_id=args.tid, old_theory_id=args.old_tid
            )
            run_comparison(dcontainer)
        except CheckFailed as e:
            print(f"> Failure for \033[91m\033[1m{old_name}: {new_name}\033[0m\033[0m (check)")
            # Regardless of the failure mode, tell the user whether this is a 4.0 dataset
            # But only in case of failure, otherwise why should we care
            check_40(old_name)
            if args.stop:
                raise e
            if args.verbose:
                print(e)
        except Exception as e:
            print(f"> Failure for \033[91m\033[1m{old_name}: {new_name}\033[0m\033[0m")
            check_40(old_name)
            if args.stop:
                raise e
            if args.verbose:
                print(e)
        except BaseException as e:
            print(f"> Failure for \033[91m\033[1m{old_name}: {new_name}\033[0m\033[0m")
            # Before raising the exception, check whether this is just a question of not having the right theory names
            theory_info = Loader().check_theoryID(args.tid)
            fk_path = dcontainer.ds_new.commondata.metadata.theory.fktables_to_paths(
                theory_info.path
            )
            if not fk_path[0][0].exists():
                print("Seems like the theory dictionary is not pointing to actual theories")
            if args.stop:
                raise e
            if args.verbose:
                print(e)
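The dispatch at the top of the main loop above can be isolated as a small, self-contained sketch. All dataset names below are invented placeholders; the real names live in `dataset_names.yml`:

```python
# Minimal sketch (invented placeholder names) of how the main loop of
# old_new_check.py interprets each value of dataset_names.yml: a value is
# either a plain string (the new dataset name) or a mapping with a
# "dataset" key and an optional "variant".
def resolve_entry(entry):
    """Return (new_name, variant) for one dataset_names.yml value."""
    if isinstance(entry, str):
        return entry, None
    return entry["dataset"], entry.get("variant")

# Hypothetical mapping mimicking the two formats found in the real file
example_mapping = {
    "OLDEXP_OBS": "NEWEXP_OBS",
    "OLDEXP_OTHER": {"dataset": "NEWEXP_OTHER", "variant": "legacy"},
}

for old_name, value in example_mapping.items():
    new_name, variant = resolve_entry(value)
    print(old_name, "->", new_name, variant)
```

Note that `entry.get("variant")` yields `None` when the key is absent, so string entries and variant-less mappings are treated identically downstream.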