-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Utils for the new commondata #1693
Conversation
I'm confused. There is no new commondata in master so this for sure should not point to master. If this is for the generation of new commondata (and for the filters) this should go in a PR together with the first "new commondata" that we agreed during the code meeting should be basically the update of #1500 with the new style. Also, as a general comment, please use the same style as it is used in the rest of the code. |
This is pointing to master because multiple datasets that will be in different branches by different people will all be requiring these utils. This PR was agreed upon discussions following the code meeting as to where these should be included. |
Then it should be in the "master" of the new commondata from which all will branch (which is #1500, or #1678 once I have a reader for the new version, but that's a bit of a catch 22, I'll do it quickly as soon as I have something to test upon) |
In that case could you at least rebase #1678 (as this is where dataset braches are pointing to, as you asked) from #1693? So implementations can go on even if the final reader will need some time |
Once there is a new template I can test and compare, I'll update the reader and everyone can rebase on top of the changed #1678 If you want this utilities to be usable for everyone then you can rebase this on top of #1678 as well so that there is master --- reader --- utilities ---- {all new commondata} I'm also happy to have master --- utilities ---- {all new commondata} is you feel that tracking the reader is too much of a hassle. I can take care of updating the reader as new implemented data create new corner cases. In any case, the reader should not include any utilities to create new data. In any of the two cases, since all new commondata, reader and utilities should be orthogonal changes rebasing, cherry-picking and merging these commits should be painless (actually, if it isn't, that means that something went wrong!) |
It was my understanding that, that was the point of #1684, but anyway...
Fine, then I think it should be master --- utils --- all new commondata. This as, utils will not be modified (unless someone asks for some specific functions they might need) whereas reader will be so I will prefer to keep this standalone. Once the reader is ready, we can decide what to base on what (and hopefully it will be simple) |
As it was discussed on Wednesday I'd like to have something I can compare the previous version to, otherwise it is hard for me to know what is a bug and what is a feature :P
Perfect.
Actually, I think this is a good reason to have it stand-alone, that way as people add new functions they can be included here and be propagated upon merge to the ones that are not using it these new functions. And once everything is implemented (and therefore the utilities are final) then it can be reviewed/modified (for instance, I think we should leave the top level commondata for the one that is used by vp during the fits/reports, and I would separate utilities to generate data... but that's a discussion for the future) |
A few general notes: These functions should decide if they are dealing with python lists or numpy arrays. It should be most likely be arrays, and so we should not cast them to python lists in various places. It seems to me in practice it is easier to get well formatted arrays out of various raw data parsers (and if not, we can have utils for that). I don't believe we should have formatting conventions that drop the information of the shape such as in the return value of triMat_to_fullMat. Especially if we are also going to have other conventions in other functions such as in the return value of Functions with "mode" flags should be likely broken into separate functions, for each of the modes. It is easier to understand what Numpy has a lot of functionality that removes the need to write explicit loops. Examples of things that may be useful include The naming of functions and variables should follow the usual convention and be |
These are of course points to be finalized, however, it should be noted that the choices made here are due to practical considerations made while datasets were being implemented. The following are the explanations:
|
These suggest that we should have utilities to convert to and from numpy at the edges, when reading and writing to a file. But the list representation (in its various flavours) does make things needlessly fiddly when computing with it. For example the corrmat to covmat could be the one liner |
What is the status of this PR? Should the people who implement commondata implement their own utils (which I think is the case now)? In other words, why weren't the so-far implemented commondata not relying on the |
It is my understanding that its review is on To Do list of @enocera , for all the top, jets and dis+jets datasets, they rely on this file being in the validphys folder, i.e. all these datasets do not have their own utilities. I agree, the idea was to base all the new datasets off this branch, but I don't know what happened to that idea 🤔 |
This is not the case for Jets for instance. And for the datasets that rely on the utility files to be on I (and presumably the people who are implementing the Collider DY) am/are confused now on what should be done. |
The new jet datasets do use these (because I implemented them). The old ones by Mark do not. I agree, it would be ideal to branch out from this, and this indeed what was agreed upon. |
I was indeed referring to the re-implementation of the old jets. @enocera, @scarlehoff What should be done about this? |
If the question regards the branching out, branch out of this one if it is convenient. When it is done just point to #1813 in the PR and I will deal with the merging. At a first stage I will only merge the data itself. |
My questions basically touched two main points:
Let me just add that, in the same way as in old implementation, the utils should be at the heart of data implementation. In any case, cc-ing @cschwan, @peterkrack, and @niclaurenti who are also implementing the Collider DY as this might be useful information. |
What is the status of this? Is it still intended to be merged? |
Not sure, last we discussed, @enocera had said he would like to review it, but I am not sure if he had the chance to do so..??
Ideally, yes, because otherwise a lot of new implementations and reimplementations would not work (i.e. their filters would not work) because they rely on these utils. |
Okay, in that case, could you rebase instead of merging master into this? |
Ok sure, I will undo the merge commits and then rebase |
No need to undo the merging. If you just rebase it's fine. Or rather, while rebasing you get that for free anyway. |
If @enocera is busy I could have a look as well some point this week. |
7ed0f6d
to
e8feb46
Compare
that would be helpful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the filter.py
files have only received the bare minimum of scrutiny (many of them are broken now) and we're long past the stage where I can ask for significant changes anyway, I won't closely review these utils. Reviewing this only made sense if there was the intention of a larger effort to review and fix all the filter.py
files, but there isn't.
I would however like to ask you to move the commondata_utils.py
inside the new_commondata
folder (which @comane is turning into it's own package in #1965), since it more suitable there than inside validphys. I think the tests should be scrapped.
e8feb46
to
d810096
Compare
done! |
Thanks! On second thought, do any of the dataset implementations that are currently in a PR and have not yet been merged use these utils? |
I am not sure, but I don't think so. The most extensive use of the utils was in new (di)jets impl., old and new ttb impl. and all dis+j impl. All of these are merged but their filters would not work without these utils. |
Let's do the following: The function below def covmat_to_artunc(ndata, covmat_list, no_of_norm_mat=0): can go to And the others go to one of the datasets that use them (and the rest can import from that) |
The PR adds utils for the new commondata in validphys such that they can be used without the need to duplicate the utils functions in every filter file.