-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New CommonData Reader #1678
New CommonData Reader #1678
Conversation
@enocera how can I check that I'm reading all data and uncertainties correctly? The number of systematics in this commondata is not the same as in the old one, isn't it? Or the old one contains all the variations in the same table? vs so, data and statistics seem correct but the systematic part not really? Also, the process name is different. Is this fine? I'm using the |
I think you may have merged something in a funny way: There are a bunch of duplicated classes such as |
Continuing from here, #1679 (comment), Now, at the moment one needs to define some thoery metadata always. The easiest way to satisfy the above would be to make that optional, but I think one can make an argument to also offload it to a separate file like we do for the uncertainties. |
It isn't indeed. I realise that all the data set variants are actually the same (they shouldn't) and that they correspond to the variant of the data set with no nuclear uncertainties. So the extra systematics in the old data set are for the extra nuclear uncertainties that have not been implemented (yet?) in any of the variants of the new data set.
That is just a mistake. |
Having the theory be optional (so that one can deal with the data in some ways before having a prediction for it) I think it is a good idea.
is too lenient. Of the current set of required options (I've taken out theory) setname: str
ndata: int
observable: dict
kinematics: ValidKinematics
kinematic_coverage: dict
data_central: ValidPath
data_uncertainties: list[ValidPath]
dataset_label: str
plot_x: str
figure_by: list[str]
nnpdf_metadata: dict
version: int I think only |
I meant that out of the possible external per-point files (fk-{stuff}, uncertainties, experimental values, kinematics), only the kinematics are intrinsic to what the dataset it, with the rest being optional and subject to being overwritten by variants. I don't have strong opinions on what should be required wrt the top level metadata but, if we start more strict, we can always add defaults later on. |
I guess it follows that the current We could also use a bunch of boolean methods for checks like |
Would you like to move the notebook to some example directory or similar? |
The notebook is just there for people to use it while they are implementing new cd. It won't be merged to master.
I was planning to check on construction. But if we allow for them to be empty I guess yes, the parser will need to do that.
sure
And will be the only one for the time being I guess. It is just the CommonMetaData that will be allowed to have missing pieces (so that if in the future there's an use case for a half-constructed CommonData it can be done) but I consider the current* *understanding current as "whatever is in master". I'm happy changing that and as I said at the beginning I'll keep rebasing this branch so that it is able to track whatever the CommonData needs to have in master. |
I guess there are a few possible ways to go about it, which have different trade offs in terms of how much code we break and how elegant they are. For example, we have said that we want positivity predictions to be basically commondata files without experimental central values (which is why the central values are in a separate, optional, file, different from the uncertainties). We could make that be a "normal" commondata file that just returns zero when asked about central values,. That is close what we have now, but less nice. We could make the commondata object itself have only optional central values, but that would break a lot of stuff unless we walk over the whole code and add checks as needed. Or we could have a bunch of different "final realisations" that depend on what capabilities an action needs (e.g. the inputs would be things like |
Actually would be kind of nice to have some example notebook. People do seem to like those. |
But then the example notebooks need to be maintained. And so they would need to be tested, etc, etc.
This I like. It was also my original idea (two branches ago)
If you mean, "this is done in master and then this is modified to follow that master as required" I completely agree. I'm not planning to merge this until all the new commondata is ready. I'm currently pruning the dangling methods, attributes and functions that have been left from the python commondata and closure test. That will allow us to take out the C++. Then I'll try to unify the interfaces (so we don't have things like the |
5290ba1
to
9849418
Compare
The reader is now updated for the new format. The only thing I'm not happy with how I've solved (@Zaharid) is the way I'm keeping track of the parent since with validphys I'm loading the SetMetaData which can include many ObservableMetaData. Then I select a particular observable. But I would like this Observable to still know everything about its parent Set. My easy solution is to say, at loading time (since there is only one way to load it and it is the parent doing it), For the rest, nothing changes with respect to the previous loader. @t7phy if you update the top data to use the new naming format (maybe you already did, I don't remember right now) and @enocera adds a few DIS datasets (with variants, which is something not well tested yet!) I'll update the notebook to do a test to ensure that the data/uncertainties/kinematics have not changed (I would like to do such a test before merging this into master). I believe it might be interesting to merge this into master in a way in which both commondata formats are allowed at the same time since it will surely help people with debugging and porting (but this is a discussion for another moment ofc) |
#1684 is now updated for testing with the reader based on all newly agreed conventions. Have a go at the reader test, in case I have missed something, please let me know. @scarlehoff |
At this point it would really help to have #1691 , because it is starting to confuse me. Do I remember correctly that "all the things about the parent" are the metadata? I don't think there is anything wrong with passing the metadata as self, but I think that instead of adding hacks to the dataclass that is supposed to more or less represent the input verbatim, I would make a new one. |
I think we are close to be able to document it.
No. The metadata we are interested in within vp is the |
@scarlehoff could you please test the reader on this specific dataset/folder: https://github.com/NNPDF/nnpdf/tree/gluon_pdf_ncd/buildmaster/ATLAS_TTBAR_8TEV_LJ_DIF As you know, for the other PR i still need to delete |
You have to remove the But I'll try it tomorrow |
Ok, you can test this and also #1684 is now fixed. Besides these, the following two can also be tested: so I expect the 4 of these so far to be fully compatible with the reader (I hope). |
@t7phy Although these are for new data sets (not in NNPDF4.0), aren't they? So perhaps @scarlehoff wants to try first the old data sets. |
https://github.com/NNPDF/nnpdf/tree/gluon_pdf_ncd/buildmaster/ATLAS_TTBAR_8TEV_LJ_DIF and #1684 are old, the other 2 are new. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I see you're implementing comments now, here are some more
What is the status of #1708? |
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
Not bad, but not fantastic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very tiny one (again) that I've just stumbled upon when skimming through.
Okay I think that would deserve relatively high priority, especially once this is merged. |
Co-authored-by: Tanjona Rabemananjara <rrabeman@nikhef.nl>
Add a library of process dependent options
Co-authored-by: Roy Stegeman <roystegeman@live.nl>
The previous fitbot failed because of a problem in the server, but the fit was uploaded. |
Greetings from your nice fit 🤖 !
Check the report carefully, and please buy me a ☕ , or better, a GPU 😉! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, you put a lot of work in this
This branch contains not only the final reader, but also the new commondata and all changes from master.
Some checks performed
The fitbot there act as a check that old theories can still be used (the fitbot uses theory 200... it will soon be changed to theory 700). There are small changes in the fitbot (while the regression tests have not changed) because there are some dataset for which the precision of the data has changed (e.g., we had 4 digits before and now have 5), but it is only a few LHCB and maybe maybe jets, so it doesn't show up in any regression.
I will submit another fitbot just before the merge to update the reference bot.
This is a comparison to the last baseline (NNPDF40_nnlo_as_01180_qcd) where I've used exactly the same runcard (i.e., I haven't changed the names of the datasets that have been translated automatically by vp) https://vp.nnpdf.science/ClK5YFI-TjCBkzTeewuFow== (since the report is also done with new data, it might be informative to compare to a report done with old data 1)
And this one is the same fit but this time the names of the runcard have also been updated: https://vp.nnpdf.science/QaBlf8XvSmSe8UWMvzIy3g==
In both cases they are <60 replica fits.
The fits have been done with this commit 1a8bf48 which corresponds to this one 7e6599a before the rebase on top of the last batch of comments in the other branch.
Checks for sets of data separated by experiment
https://vp.nnpdf.science/chwFM_lJR025vREJ1I9VPQ==
https://vp.nnpdf.science/Toy_r6uFRm-h1oUFEZF6hw==
https://vp.nnpdf.science/-2VyNN3CTHWnb26fUQkUJQ==
https://vp.nnpdf.science/YPAQHvMtTeyBuNq12nl6RQ==
https://vp.nnpdf.science/UHU-TYLJQCuE-8lRHe8Aog==
https://vp.nnpdf.science/oSZLrPg3Tyyf-i2mE33G-Q==
https://vp.nnpdf.science/wCFRKBNsSA2U2O_6Sa75Zw==
https://vp.nnpdf.science/E0IDIgWFRF6tvdFk3pD0mA==
https://vp.nnpdf.science/0YL4eaPTT7eR3wM9IQwj5w==
You should be able to use it with
You can even install this in a separated isolated virtualenvironment in your computer without having to clone or checkout the repository with:
python -m venv nnpdf_venv source nnpdf_venv/bin/activate python -m pip install git+https://github.com/NNPDF/nnpdf.git@final_reader_for_new_commondata_mk2
Leaving the original msg here:
This branch / PR is the final reader for the new commondata files.
As you can see, this is a bit less ambitious than the previous PR. I've added a reader and only a reader. The changes to other parts of the code are minimal and well contained*. The reader is able to read the new commondata but the output is an object of the old type. This will allow me to keep this branch on track with
master
.@enocera in principle people can branch out of this branch to implement the new dataset so that they can test. As long as they don't change the code itself there will be no conflicts. If people find it difficult I can test the new commondatas myself as they get implemented.
@Zaharid since this branch is only going to add a reader and thus there will be no conflicts with master, feel free to comment or even modify the parsers to your heart's content. This is even almost orthogonal to people's implementation of the new data so it should be fine.
For people implementing new commondata:
The new data should be found together with the old commondata inside the commondata/new folder found in:
validphy2/src/validphys/datafiles/
. Note that the "new" part will dissapear. It is just to have the new and old commondata files differentiated.So, for instance, if you implement a set of data like
CMS_TTBAR_13TEV_2L_DIF
then it will be inside:validphy2/src/validphys/datafiles/commondata/new/CMS_TTBAR_13TEV_2L_DIF
.If you installed the code with
pip install -e .
then you can update the metadata there, no need to reinstall or anything (like it used to be the case with cmake).Then you can load the new commondata in the same way you would do so with Validphys
The you can load the new commondata with:
Important: take into account that the theory id must be one of the new ones, since the
theory
is read from the commondata itself.TODO
vp
wherepeek_commondata_metadata
is used one can also use the newCommonMetaData
class. It should be a trivial change but will wait until the end.load
andload_commondata
(as now they will both do the same thing)(feel free to edit this comment with other to-do items that might be important)
*note that this is possible only now that there is no longer the need to keep C++ compatible or literal objects in the pipeline.