Dataset variants #494

Closed
Zaharid opened this issue Jul 1, 2019 · 4 comments · Fixed by #1678

@Zaharid
Contributor

Zaharid commented Jul 1, 2019

We have recently been taking the approach of never modifying existing datasets (which I think is a good thing) and instead adding fixed versions of them under different names. For example, we have the so-called *_SF sets, where SF stands for "symmetrization fix".

I find this approach to also be problematic. For one, I seemed to be the only person at the meeting last week who knew what these sets are and that you should use them instead of the default ones. All of that might be documented somewhere, but I couldn't find it (admittedly I only spent 30 seconds searching, but still).

I think a better approach would be to teach the code that datasets have variants (the variant used in 3.1, the variant with a given bug fixed, and so on), and then to have ways to warn you if you are using a deprecated variant and to tell you what the default variants are.
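To make the idea concrete, here is a minimal sketch of what a variant-aware dataset description could look like. Everything in it is an assumption for illustration: the `Variant` and `DatasetVariants` classes, the `EXAMPLE_SET` dataset name and the `legacy`/`sf` variant names are hypothetical and are not existing validphys or buildmaster API.

```python
import warnings
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Variant:
    """One named version of a dataset (hypothetical structure)."""
    name: str
    description: str
    deprecated: bool = False


@dataclass
class DatasetVariants:
    """All known variants of one dataset, plus which one is the default."""
    dataset: str
    default: str
    variants: dict = field(default_factory=dict)

    def resolve(self, requested=None):
        """Return the requested variant, or the default if none is given.

        Emits a DeprecationWarning when a deprecated variant is selected.
        """
        name = self.default if requested is None else requested
        if name not in self.variants:
            raise KeyError(f"{self.dataset} has no variant named {name!r}")
        variant = self.variants[name]
        if variant.deprecated:
            warnings.warn(
                f"Variant {name!r} of {self.dataset} is deprecated; "
                f"the current default is {self.default!r}.",
                DeprecationWarning,
                stacklevel=2,
            )
        return variant


# Hypothetical dataset with an old variant and a fixed one.
EXAMPLE = DatasetVariants(
    dataset="EXAMPLE_SET",
    default="sf",
    variants={
        "legacy": Variant("legacy", "as used in the 3.1 fits", deprecated=True),
        "sf": Variant("sf", "symmetrization fix applied"),
    },
)

print(EXAMPLE.resolve().name)
```

With something along these lines, asking for no variant gives the current default, while asking explicitly for `legacy` still works but produces a warning that points to the default.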

This ties with:

@enocera
Contributor

enocera commented Jul 1, 2019

Two comments.

  • Even if you had spent more than 30 seconds, you wouldn't have found any documentation, as it is restricted to a couple of lines in the old and new .cc files in buildmaster/filter. This just reinforces your statement "I find this approach to be problematic".
  • Default variants will in principle depend on what you're doing. For instance, we consciously decided not to switch to the "new variants" for the theory uncertainty papers. My understanding is that the "new" variants will become "default" variants only when we seriously start to work on NNPDF4.0.

@Zaharid
Contributor Author

Zaharid commented Jul 1, 2019

> Default variants will in principle depend on what you're doing. For instance, we consciously decided not to switch to the "new variants" for the theory uncertainty papers. My understanding is that the "new" variants will become "default" variants only when we seriously start to work on NNPDF4.0.

There is some discussion on this at #226. Since then I have reached the conclusion that we really want two kinds of files:

  • The validphys runcard, where anything you don't specify is filled in with defaults that make sense for the 90% use case. These defaults can change with time, and they can be overridden in the runcard if needed. You can also be warned if you override them with strange values.

  • A "lock file" that is generated from the runcard, with all the defaults explicitly resolved. The lock file is itself a valid runcard and you can give it to validphys (or nnfit or whatever). That would always give the same result (in terms of physics), bugs and all.
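The runcard/lock-file split above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `resolve_runcard` function, the `DEFAULTS` table, the `dataset_inputs` layout and the dataset and variant names are hypothetical, and nothing here is the actual validphys or reportengine implementation.

```python
import copy

# Hypothetical per-dataset defaults; in practice these would change over time.
DEFAULTS = {"variant": "sf"}


def resolve_runcard(runcard, defaults):
    """Return (locked, overrides) for a runcard.

    `locked` is a copy of the runcard with every missing per-dataset key
    filled in from `defaults`. `overrides` lists the (dataset, key, value)
    triples where the runcard explicitly chose a non-default value, so the
    caller can warn about them.
    """
    locked = copy.deepcopy(runcard)  # never modify the user's runcard
    overrides = []
    for entry in locked["dataset_inputs"]:
        for key, default in defaults.items():
            if key not in entry:
                entry[key] = default
            elif entry[key] != default:
                overrides.append((entry["dataset"], key, entry[key]))
    return locked, overrides


# Hypothetical runcard: the first dataset relies on the default variant,
# the second explicitly asks for an older one.
runcard = {
    "dataset_inputs": [
        {"dataset": "EXAMPLE_A"},
        {"dataset": "EXAMPLE_B", "variant": "legacy"},
    ]
}

locked, overrides = resolve_runcard(runcard, DEFAULTS)
for dataset, key, value in overrides:
    print(f"warning: {dataset} overrides default {key} with {value!r}")
```

Because the locked copy has every key written out, feeding it back through `resolve_runcard` leaves it unchanged even if `DEFAULTS` has moved on in the meantime, which is the property that makes the lock file reproducible.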

@Zaharid
Contributor Author

Zaharid commented Jul 1, 2019

@wilsonmr I guess the above should be added as an objective for reportengine 1.0.

@Zaharid
Contributor Author

Zaharid commented Jul 1, 2019

See #496.
