-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New metadata format #1684
New metadata format #1684
Conversation
Thanks. If I'm not mistaken the only change with respect to the previous version of the commondata is the |
Yes |
5290ba1
to
9849418
Compare
I have a few comments about the files in this PR, and some more (at the bottom) that are more specific and that should be answered I guess by @enocera Please don't leave version empty. It should be 1 if it is the initial version. For instance version: 1
version_comment: "initial version" The Rather than having an empty FK_tables:
- - CMSTOPDIFF8TEVTTRAPNORM-TOPDIFF8TEVTTRAP
- - CMSTOPDIFF8TEVTTRAPNORM-TOPDIFF8TEVTOT
operation: RATIO The For the following I would like to have also @enocera input: There's "a problem" with the kinematics. In the past we always had three kinematic variables, however in the observables in this PR there is only two. It is of course reasonable that different datasets have a different number of kinematic variables (then I will fill in one with 0s or something since there are several points in vp in which three are assumed). However, looking at the "old commondata" version of from validphys.api import API
cd = API.dataset(dataset_input={"dataset": "CMSTOPDIFF8TEVTTRAPNORM"}, use_cuts="nocuts", theoryid=200).load_commondata()
cd.commondata_table RE the uncertainties, in the previous versions (FTDY or NMC) it was called Also, inspecting the old version of So I guess I have two requests: |
The number of kinematic variables in the new format should never be smaller than the number of kinematic variables in the old format. For plotting purposes, the three variables are all needed. Let me fix this with @t7phy .
@scarlehoff Let me check all these issues with Tanishq (I'm going to talk to him in 2 mins) and we will come back to you. |
@scarlehoff The old commondata format is as follows {idat, kin1, kin2, kin3, data, stat, {systematics}}, but {systematics} is twice the number of systematic uncertainties in the (old) metadata. The reason being that the first is the absolute value of the sys uncertainty, and the second is the percentage value, see https://docs.nnpdf.science/data/exp-data-files.html?highlight=commondata#commondata-file-format. Does this answer your question?
Let me check carefully the implementation and I will tell you when you can proceed with the testing further. |
This is done
I will do in next commits
I would prefer to leave them empty, because the goal is to generate all the theories for which data is there and implemented. This emptiness will be a reminder to generate them, and eventually have this field populated, as opposed to throwing away the
I will modify this in the next commits
@scarlehoff The final convention is that we use |
Yes, and no. It's still not clear to me how the convention on the old commondatas are transported to the new ones since now we have only the "real (used) one". Does that mean upon loading a variant of the uncertainties how each uncertainty is to be treated is fixed forever? (now that I'm writing it down I'm getting flashes, I guess this is @Zaharid's point in #1691 but I need to fully remember all this if I'm to document it...)
I would insist on throwing away the field itself. That way I have freedom deciding which fields are or not optional in the theory (plus the format itself) and the people implementing the data has the freedom to add new datasets without knowing what the theory format is.
I'll update to follow the convention used here (also for the other fields if they change). Don't worry too much about NMC since I guess that one is going to be redone (btw, if you want to cherry-pick the CMS dataset from this branch and rebase it on top of master or utils, please do so). |
a6c63d9
to
56fe007
Compare
6410bca
to
26948a2
Compare
26948a2
to
8725a81
Compare
b117b34
to
639e6c6
Compare
closing this as we now have the agreed upon format for metadata with implementation examples in #1813 |
This PR implements CMS_ttBar_8TeV dataset with the new metadata as discussed in Ams code meeting. @scarlehoff this builds upon the commondata reader branch, however, could you please tweak the reader script in this branch and test out if it works on the new format?
backward_compatbility.yaml in the buildmaster directory is the dictionary as you had proposed, please have a look at that too.