-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Test implementation of new commondata format #1500
Conversation
One thing missing from the list (I only realised now that I started with the parser) would be an automatic installation. I guess to first approximation these should be part of the validphys package, right? Since the fit needs validphys anyway. (it's not important for the parser, I can hardcode the folder for now) |
@enocera dataset_input:
- dataset: "NMCPD"
- variant: "dw" and if no variant is selected it defauls to the standard one. |
@scarlehoff Do you mean in a vp runcard? |
Yes. Instead of having to write NMCPD_dw |
Then yes, I think that we want to have something similar in spirit to, but more general than, the current |
Or, as you say, the |
Great, I like this idea. I'll implement it in this spirit. The idea is then that "sys" will disappear? |
Yes, definitely. The problem with |
Do we want to have a |
And for @enocera, this is a perfect and total substitution, right? So I can just scrap all the "old" commondata stuff as I find it in the code? No need to keep legacy wrappers like I have to do for the fktables. |
I'd answer yes because the idea (my idea) is to rework all the datasets in the new format. |
buildmaster/NMCPD/kinematics.json
Outdated
@@ -0,0 +1 @@ | |||
{"kinematics_avg":[{"x":[0.0015,0.0015,0.0015,0.0015,0.0015,0.003,0.003,0.003,0.003,0.003,0.003,0.003,0.005,0.005,0.005,0.005,0.005,0.005,0.005,0.005,0.005,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.008,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0125,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.0175,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.025,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.035,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.07,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.09,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.11,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.18,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.225,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.275,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.35,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.45,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.675,0.675,0.675,0.675,0.675,0.675,0.675,0.675,0.675,0.675 ]},{"Q2":[0.16,0.25,0.35,0.45,0.6,0.17,0.25,0.35,0.45,0.63,0.88,1.12,0.16,0.25,0.35,0.45,0.61,0.88,1.13,1.38,1.71,0.16,0.25,0.35,0.45,0.64,0.86,1.12,1.37,1.75,2.24,2.73,3.46,0.16,0.26,0.35,0.45,0.62,0.88,1.12,1.37,1.74,2.23,2.74,3.46,4.47,5.41,0.25,0.35,0.45,0.62,0.88,1.12,1.37,1.75,2.24,2.73,3.48,4.47,5.49,6.83,0.26,0.35,0.45,0.62,0.86,1.13,1.37,1.74,2.24,2.74,3.45,4.47,5.48,6.92,8.92,0.36,0.45,0.64,0.86,1.13,1.38,1.74,2.24,2.74,3.46,4.45,5.47,6.92,8.96,11.45,14.36,0.46,0.61,0.88,1.13,1.37,1.74,2.25,2.74,3.46,4.46,5.46,6.9,8.93,11.44,14.82,19.19,0.68,0.86,1.11,1.38,1.74,2.24,2.75,3.47,4.47,5.47,6.91,8.91,11.4,14.89,19.63,26.07,0.9,1.11,1.38,1.76,2.24,2.75,3.49,4.47,5.46,6.91,8.92,11.37,14.87,19.74,26.36,34.74,1.13,1.38,1.75,2.24,2.74,3.49,4.47,5.46,6.9,8.92,11.37,14.85,19.74,26.52,35.32,44.94,1.4,1.75,2.24,2.74,3.47,4.48,5.47,6.9,8.92,11.37,14.84,19.76,26.55,35.28,46.95,1.82,2.24,2.75,3.47,4.47,5.49,6.92,8.93,11.37,14.85,19.75,26.61,35.37,47.01,63.04,2.28,2.74,3.48,4.47,5.47,6.97,8.93,11.38,14.85,19.74,26.64,35.42,46.95,63.23,2.78,3.46,4.47,5.47,6.89,8.94,11.37,14.87,19.74,26.6,35.43,46.98,63.48,90.68,3.57,4.51,5.48,6.9,8.91,11.43,14.85,19.74,26.63,35.45,47.12,63.52,96.35,4.54,5.47,6.93,8.92,11.34,14.89,19.77,26.64,35.5,47.26,63.65,98.05,5.53,6.88,8.91,11.34,14.85,19.74,26.64,35.57,47.16,63.56,98.82,7.04,8.88,11.36,14.85,19.79,26.49,35.4,47.03,63.53,99.03]},{"y":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}]"kinematics_min":[{"x":null},{"Q2":null},{"y":null}]"kinematics_max":[{"x":null},{"Q2":null},{"y":null}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these null
expected? What should be returned if an user asks for the "minimums array", just the same thing as the avg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I placed a null
because this is a case in which bin edges are not provided by the experimentalists; I'd say that if null
, then min=avg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks.
I also I think there might have a bug in the script that outputs the json because the kinematics_max
and kinematics_min
are missing a comma in front.
Also, I think [{"x":null},{"Q2":null},{"y":null}]
could just be a single dictionary: {"x":null, "Q2":null, "y":null}
(I'm using the null
as example because the arrays are big, but the same applies.
buildmaster/NMCPD/metadata.yaml
Outdated
arXiv: "hep-ex/9611022" | ||
iNSPIRE: "https://inspirehep.net/literature/424154" | ||
hepdata: "https://www.hepdata.net/record/ins426595" | ||
tables: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tables: | |
tables: |
I like the separation between header and object and also would like the parsing stage of vp to be as fast as possible, so rather prefer the arrangement where we have two classes. |
buildmaster/NMCPD/metadata.yaml
Outdated
variants: | ||
- nuclear_uncertainty_choice: | ||
deweighted: uncertainties_dw.json | ||
shifted: uncertainties_dw.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variants: | |
- nuclear_uncertainty_choice: | |
deweighted: uncertainties_dw.json | |
shifted: uncertainties_dw.json | |
variants: | |
deweighted: uncertainties_dw.json | |
shifted: uncertainties_dw.json |
I think it is better to have just one level of variance here, otherwise the fit runcards might become too complicated.
Then we might want to have metadata in the variance type of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can variants change things other than the uncertainties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good question. The answer is yes, in the future. Example: datasets for which EW corrns have not been subtracted, in the case in which these are instead in the theory. So maybe we should list both the data.json, uncertainty.json and kinematics.json files for each variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having a default selection and then allowing the variant to change a number of things.
So:
variants
deweighted:
uncertainties: uncertainties_dw.json
ew:
data: ew_subtracted.json
etc
This can be done with one single class as well. In any case, I already started putting all in one for the toy model, separating it in two should be easy. Instead what I'm separating is |
The one class we need to have is the one representing precisely whatever the user wrote in the runcard. As @RoyStegeman can tell, not having that e.g. for positivity is a source of of pain. |
That has nothing to do with this, the kinematics or uncertainties are not written in the runcard but are part of the metadata of the commondata and that needs to be held by the CommonDataSpec. |
One big annoyance with json for scientific applications is that it cannot handle NaN or infinity. I wonder how relevant is that in here. For one we could parse the same files with yaml and write e.g. |
@Zaharid Are you wondering how relevant |
I guess both. |
So, I don't think that NaN or inf can appear as sensible values of data central values or uncertainties or kinematic bins. Moving to |
Hi @enocera, I don't understand the format of the uncertainties:
What should I parse from here. Are the numbers following Or even directly a list of dictionaries
and then the number of systematic is just the number of dictionaries in the list (so no need to keep track with the |
So:
|
Ah! Ok. Then I guess we should have some kind of metadata (thinking about yaml in this example):
|
Yes, definitely. This is what I was thinking about after Zahari suggested to abandon |
There was also the suggestion to add the information on what is being fitted here. This might be worth adding already? |
We have discussed doing this several times and it would indeed be nice if we could have it in a pull request form. A key requirement is the ability to check that the current version of the tables matches the latest version in hepdata. |
I agree that this is definitely something that we want. This could be easily implemented and something I can take care of. |
I now have a module that reads the hepdata URL from This can be used to perform what you, @Zaharid, mentioned above. But I just want to make sure that I understand exactly what you have in mind. What I have been playing with is a github action that--using the module--checks every now and then hepdata if a new version of a dataset is available. And if so, it opens a PR with the new version and keeps pushing commits whenever new versions are available until the PR is merged. Is this somehow in line with what you think? |
…ary files for the reader (no removals or rawdata)
…ary files for the reader (no removals or rawdata)
…dtaed metadata.yaml
eea53bc
to
37b4aab
Compare
Hi @Radonirinaunimi, in this branch there are some utilities that might be useful (while the dataset implementation is now outdated). Do you want to keep these tools? Otherwise we can close this PR. |
I think it'd be better to close this PR, and if needed I'd implement these utilities in another PR. Especially since after having re-implemented datasets, I'd approach these utilities in a different way now. |
ok, thanks! |
This PR addresses #1436.
The proposed layout is as follows. For each data set, there is a folder called
DATASET_NAME
. Each folder contains:metadata.yaml
file;filter.py
file.The
filter.py
script reads themetadata.yaml
file and generates no less than three.json
files:data.json
, for the central values of the data;kinematics.json
for the kinematics of the data;uncertainties.json
for the uncertainties of the data.There can be as many variants as needed for each of these three files. Variants are specified in the
metdata.yaml
.TO DO
filter.py
.json
files in lieu of the 'DATAand
SYSTYPE` files. (Add vp parser for new commondata format #1511)