Lock files #496

Zaharid · 2019-07-01T16:15:44Z

The current validphys (or nnfit or whatever) runcards serve two purposes:

To specify the user input in an easy and quick way, that is hopefully obvious and obviously correct.
To provide a way to reproduce the result by running validphys on the same runcard.

These two are in tension with each other. See discussion in #226. And also e.g. #402 or #495 or #494.

For example if you make all the settings very explicit without any defaults whatsoever (a bit like nnfit and the corresponding settings in vp) it becomes easy to keep backward compatibility but then the user needs to know about a lot of settings (that are mostly completely undocumented at the moment) and they typically get it wrong. Granted wrong in a way that can be checked by someone who knows about the right settings, which is still much better than e.g. plots produced by secret codes. We had a rather spectacular instance of this failure mode recently.

On the other hand, if we make it too magical with default being assigned in a complicated context dependent way, the 90% use case will work fine, but it becomes vert hard to change anything in a way that allows the runcard to keep being a source of truth. Either we never change the defaults (which results in more complicated runcards eventually, when we find they are bugged) or we find a different mechanism for archiving what exactly the runcard means.

The way other similar systems work, such as conda, cargo (the rust package manager) or nodejs is to have a simple to enter input file that once executed produces another file (called lock file) which actually resolves and locks the dependencies to concrete values.

I think we should teach reportengine to do that (I have the issue here for better? visibility). In addition to writing a runcad.yaml file to the input folder, it should write something like runcard.lock.yaml (or a better name, maybe lock.yaml?) which contains everything resolved and is fully reproducible in principle. So if yo write something like:

dataspecs:
   - theoryid: 52
     pdf: NNPDF31_nlo_as_0118

   - theoryid: 53
     pdf: NNPDF31_nnlo_as_0118

data:
   - ATLAS1JET11

actions_:
   - plot_fancy_dataspecs

the lock file should read something like:

dataspecs:
   - theoryid: 52
     pdf: NNPDF31_nlo_as_0118


   - theoryid: 53
     pdf: NNPDF31_nnlo_as_0118


data:
   - ATLAS1JET11


actions_:
   - plot_fancy_dataspecs

#copied from some metadata file; this key overrides the loockup of files containing defaults
data_defaults: 
   - ATLAS1JET11:
        variant: SF
        systype: DEFAULT
        #TODO: define semantics for this
        theory_based_settings:
           - NNLO:
                 cfac: [QCD]

The funny thing is that I have no idea how to do this with reportengine. Good thing we are thinking on rewriting it.

The text was updated successfully, but these errors were encountered:

Zaharid · 2019-07-01T16:25:51Z

Probably it would be easier to copy to the lock file the relevant defaultts files (or keys therein) than to try to reconstruct namespaces that might be generated dynamically in complicated ways. Updated it above.

Zaharid · 2019-07-01T16:36:45Z

@siranipour feel "free" to have a look at this. I guess we want similar semantics for the filters.

Zaharid · 2019-07-02T10:30:11Z

At the end, this discussion is very similar to

https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html

wilsonmr · 2019-07-02T13:35:55Z

the example makes sense, the point here is the lock file could be taken as a full input by validphys right?

I feel like this also ties into when users use vp-comparefits since the runcard output of this is pretty opaque - would be nice to see the inputs actually filled in with these files.

Also probably this plays into the proposed direction change with config parser where it becomes a simpler object which takes some input and returns a namespace - at some point the full namespace will be resolved and if dumped to a yaml like file pretty much fulfills the criteria here right?

Zaharid · 2019-07-02T14:45:33Z

the example makes sense, the point here is the lock file could be taken as a full input by validphys right?

Yes, indeed.

I feel like this also ties into when users use vp-comparefits since the runcard output of this is pretty opaque - would be nice to see the inputs actually filled in with these files.

Yeah, that's #402. Of course nothing stops somebody from adding support for vp-comparefits to write what it is doing right now.

Also probably this plays into the proposed direction change with config parser where it becomes a simpler object which takes some input and returns a namespace - at some point the full namespace will be resolved and if dumped to a yaml like file pretty much fulfills the criteria here right?

I am not sure. One thing is that the configparser should care less about say production rules and more about tracking which line of the input caused an error or which is the set of unused keys in a namespace (unused keys typically indicate an error in the runcard). However we want to be able to generate complicated namespaces based on production rules (like he thing that is done to compute intersection of cuts). Note this is not one namespace in total, but potentially many namespaces per action depending on how you use collect and so on.

At the moment I don't imagine how an interface to write these complicated namespaces to a file would work. That is not to say it can't be made, just that I don't envision it (e.g. note that in the example above you would have to write data in each dataspec because the cfactors are different). Instead what I was suggesting was to have a mechanism to resolve defaults that is such that it reads them from some file if some key is not present in the input and otherwise it reads them from input. So in the example instead of trying to rewrite dataspecs we just append the relevant defaults in the end. I can see various problems with that as well such as hoe much of the default specification we should write, or if we can have the concept of versioned or deprecated defaults (like in #224).

siranipour · 2019-11-04T12:40:42Z

Is there a PR for this? A working implementation of this would be really cool/super useful.

Zaharid · 2019-11-04T12:48:41Z

No there isn't. Also, I'd say part of this should go in reportengine.

…

On Mon, 4 Nov 2019, 12:40 siranipour, ***@***.***> wrote: Is there a PR for this? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#496?email_source=notifications&email_token=ABLJWUSAATS6JTDFTHYJEOLQSAJ4XA5CNFSM4H4T6LZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEC7DFTA#issuecomment-549335756>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABLJWUSVSRXJA44GE5G6G43QSAJ4XANCNFSM4H4T6LZA> .

wilsonmr · 2019-11-04T12:53:30Z

Yeah I was supposed to be working on this but got massively sidetracked, I should start doing something

Zaharid · 2019-11-20T14:44:33Z

A possible design is the following; The lock file starts off being just the YAML object we load, essentially a dictionary with the input. The extra information we put in it always originates from a simple key value pair, which does not depend on namespaces or anything like that. So something like:

filter_defaults: "4.0"

which is processed by something innocent looking like

parse_filter_defaults(self, filter_defaults: str): return filter_defaults

The idea is that the config parser (for the moment, but I hate that piece of code even without the extra functionality so it should be reworked at some point) gains understanding of methods of the form

record_filter_defaults(self, processed_filter_defaults):
    #return a dictionary of rules loaded from harcoded path computed based on filter_defautls

The effect that this would have is to create (in the lock file) a mapping called e.g. filter_defaults_recorded_spec_ if it doesn't exists and add a key to it called "4.0" with the actual rules, i.e.

# lock.yaml
# [Content of runcard.yaml]
filter_defaults_recorded_spec_: {
   "4.0":  ... #all the rules in the filter file

}

Then we have something like:

produce_default_filter_rules(self, filter_defaults, filter_defaults_recorded_spec_:(dict, type(None))):
    if filter_defaults_recorded_spec_ is None or filter_defaults not in filter_defaults_recorded_spec_:
        #load rules from hardcoded path
    else:
        #load rules from mapping

I think this should be general enough.

wilsonmr · 2019-11-20T14:53:13Z

I actually opened a branch for this today, will take a look soon

wilsonmr · 2019-11-21T11:40:50Z

should all of these changes happen at the level of report engine? they seem quite generic

Zaharid · 2019-11-21T11:51:31Z

I guess that as far as what I described in the last comment goes, yes.

…

On Thu, 21 Nov 2019, 11:40 wilsonmr, ***@***.***> wrote: should all of these changes happen at the level of report engine? they seem quite generic — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#496?email_source=notifications&email_token=ABLJWURJLLYPVIB3PNNOZOLQUZXUFA5CNFSM4H4T6LZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEZ57UI#issuecomment-557047761>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABLJWUUYRRHNM3FSV5K2HOTQUZXUFANCNFSM4H4T6LZA> .

Zaharid · 2019-11-21T15:20:31Z

Ok, to be factually correct, we want the functionality to interpret things of the form def record_XXX(self, computed_value_of_XXX) as things that need to be added to the lock file to be in reportengine, but we don't want reportengine to know about filters and the like. That is, the code samples in my comment should go in vp in some form.

Zaharid assigned wilsonmr Jul 1, 2019

Zaharid mentioned this issue Jul 1, 2019

Dataset variants #494

Closed

wilsonmr mentioned this issue Jul 18, 2019

Roadmap for experiment API #483

Closed

5 tasks

wilsonmr mentioned this issue Aug 27, 2019

[WIP] Data Keyword #515

Closed

scarlehoff mentioned this issue Aug 29, 2019

Fit runcards should not have innecesary information #536

Closed

This was referenced Sep 20, 2019

Declarative filter rules #439

Merged

Render datasets in documentation #576

Open

siranipour self-assigned this Nov 4, 2019

Zaharid mentioned this issue Nov 19, 2019

Correcting D0EASY rule to include absolute value of central value #613

Merged

wilsonmr mentioned this issue Jan 8, 2020

Lockfiles NNPDF/reportengine#24

Merged

scarlehoff closed this as completed Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lock files #496

Lock files #496

Zaharid commented Jul 1, 2019 •

edited

Loading

Zaharid commented Jul 1, 2019 •

edited

Loading

Zaharid commented Jul 1, 2019

Zaharid commented Jul 2, 2019

wilsonmr commented Jul 2, 2019

Zaharid commented Jul 2, 2019

siranipour commented Nov 4, 2019 •

edited

Loading

Zaharid commented Nov 4, 2019 via email

wilsonmr commented Nov 4, 2019

Zaharid commented Nov 20, 2019

wilsonmr commented Nov 20, 2019

wilsonmr commented Nov 21, 2019

Zaharid commented Nov 21, 2019 via email

Zaharid commented Nov 21, 2019

Lock files #496

Lock files #496

Comments

Zaharid commented Jul 1, 2019 • edited Loading

Zaharid commented Jul 1, 2019 • edited Loading

Zaharid commented Jul 1, 2019

Zaharid commented Jul 2, 2019

wilsonmr commented Jul 2, 2019

Zaharid commented Jul 2, 2019

siranipour commented Nov 4, 2019 • edited Loading

Zaharid commented Nov 4, 2019 via email

wilsonmr commented Nov 4, 2019

Zaharid commented Nov 20, 2019

wilsonmr commented Nov 20, 2019

wilsonmr commented Nov 21, 2019

Zaharid commented Nov 21, 2019 via email

Zaharid commented Nov 21, 2019

Zaharid commented Jul 1, 2019 •

edited

Loading

Zaharid commented Jul 1, 2019 •

edited

Loading

siranipour commented Nov 4, 2019 •

edited

Loading