Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock files #496

Closed
Zaharid opened this issue Jul 1, 2019 · 13 comments
Closed

Lock files #496

Zaharid opened this issue Jul 1, 2019 · 13 comments
Assignees

Comments

@Zaharid
Copy link
Contributor

Zaharid commented Jul 1, 2019

The current validphys (or nnfit or whatever) runcards serve two purposes:

  • To specify the user input in an easy and quick way, that is hopefully obvious and obviously correct.

  • To provide a way to reproduce the result by running validphys on the same runcard.

These two are in tension with each other. See discussion in #226. And also e.g. #402 or #495 or #494.

For example if you make all the settings very explicit without any defaults whatsoever (a bit like nnfit and the corresponding settings in vp) it becomes easy to keep backward compatibility but then the user needs to know about a lot of settings (that are mostly completely undocumented at the moment) and they typically get it wrong. Granted wrong in a way that can be checked by someone who knows about the right settings, which is still much better than e.g. plots produced by secret codes. We had a rather spectacular instance of this failure mode recently.

On the other hand, if we make it too magical with default being assigned in a complicated context dependent way, the 90% use case will work fine, but it becomes vert hard to change anything in a way that allows the runcard to keep being a source of truth. Either we never change the defaults (which results in more complicated runcards eventually, when we find they are bugged) or we find a different mechanism for archiving what exactly the runcard means.

The way other similar systems work, such as conda, cargo (the rust package manager) or nodejs is to have a simple to enter input file that once executed produces another file (called lock file) which actually resolves and locks the dependencies to concrete values.

I think we should teach reportengine to do that (I have the issue here for better? visibility). In addition to writing a runcad.yaml file to the input folder, it should write something like runcard.lock.yaml (or a better name, maybe lock.yaml?) which contains everything resolved and is fully reproducible in principle. So if yo write something like:

dataspecs:
   - theoryid: 52
     pdf: NNPDF31_nlo_as_0118

   - theoryid: 53
     pdf: NNPDF31_nnlo_as_0118

data:
   - ATLAS1JET11

actions_:
   - plot_fancy_dataspecs

the lock file should read something like:

dataspecs:
   - theoryid: 52
     pdf: NNPDF31_nlo_as_0118


   - theoryid: 53
     pdf: NNPDF31_nnlo_as_0118


data:
   - ATLAS1JET11


actions_:
   - plot_fancy_dataspecs

#copied from some metadata file; this key overrides the loockup of files containing defaults
data_defaults: 
   - ATLAS1JET11:
        variant: SF
        systype: DEFAULT
        #TODO: define semantics for this
        theory_based_settings:
           - NNLO:
                 cfac: [QCD]

The funny thing is that I have no idea how to do this with reportengine. Good thing we are thinking on rewriting it.

@Zaharid
Copy link
Contributor Author

Zaharid commented Jul 1, 2019

Probably it would be easier to copy to the lock file the relevant defaultts files (or keys therein) than to try to reconstruct namespaces that might be generated dynamically in complicated ways. Updated it above.

@Zaharid
Copy link
Contributor Author

Zaharid commented Jul 1, 2019

@siranipour feel "free" to have a look at this. I guess we want similar semantics for the filters.

@Zaharid
Copy link
Contributor Author

Zaharid commented Jul 2, 2019

At the end, this discussion is very similar to

https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html

@wilsonmr
Copy link
Contributor

wilsonmr commented Jul 2, 2019

the example makes sense, the point here is the lock file could be taken as a full input by validphys right?

I feel like this also ties into when users use vp-comparefits since the runcard output of this is pretty opaque - would be nice to see the inputs actually filled in with these files.

Also probably this plays into the proposed direction change with config parser where it becomes a simpler object which takes some input and returns a namespace - at some point the full namespace will be resolved and if dumped to a yaml like file pretty much fulfills the criteria here right?

@Zaharid
Copy link
Contributor Author

Zaharid commented Jul 2, 2019

the example makes sense, the point here is the lock file could be taken as a full input by validphys right?

Yes, indeed.

I feel like this also ties into when users use vp-comparefits since the runcard output of this is pretty opaque - would be nice to see the inputs actually filled in with these files.

Yeah, that's #402. Of course nothing stops somebody from adding support for vp-comparefits to write what it is doing right now.

Also probably this plays into the proposed direction change with config parser where it becomes a simpler object which takes some input and returns a namespace - at some point the full namespace will be resolved and if dumped to a yaml like file pretty much fulfills the criteria here right?

I am not sure. One thing is that the configparser should care less about say production rules and more about tracking which line of the input caused an error or which is the set of unused keys in a namespace (unused keys typically indicate an error in the runcard). However we want to be able to generate complicated namespaces based on production rules (like he thing that is done to compute intersection of cuts). Note this is not one namespace in total, but potentially many namespaces per action depending on how you use collect and so on.

At the moment I don't imagine how an interface to write these complicated namespaces to a file would work. That is not to say it can't be made, just that I don't envision it (e.g. note that in the example above you would have to write data in each dataspec because the cfactors are different). Instead what I was suggesting was to have a mechanism to resolve defaults that is such that it reads them from some file if some key is not present in the input and otherwise it reads them from input. So in the example instead of trying to rewrite dataspecs we just append the relevant defaults in the end. I can see various problems with that as well such as hoe much of the default specification we should write, or if we can have the concept of versioned or deprecated defaults (like in #224).

@siranipour
Copy link
Contributor

siranipour commented Nov 4, 2019

Is there a PR for this? A working implementation of this would be really cool/super useful.

@Zaharid
Copy link
Contributor Author

Zaharid commented Nov 4, 2019 via email

@wilsonmr
Copy link
Contributor

wilsonmr commented Nov 4, 2019

Yeah I was supposed to be working on this but got massively sidetracked, I should start doing something

@Zaharid
Copy link
Contributor Author

Zaharid commented Nov 20, 2019

A possible design is the following; The lock file starts off being just the YAML object we load, essentially a dictionary with the input. The extra information we put in it always originates from a simple key value pair, which does not depend on namespaces or anything like that. So something like:

filter_defaults: "4.0"

which is processed by something innocent looking like

parse_filter_defaults(self, filter_defaults: str): return filter_defaults

The idea is that the config parser (for the moment, but I hate that piece of code even without the extra functionality so it should be reworked at some point) gains understanding of methods of the form

record_filter_defaults(self, processed_filter_defaults):
    #return a dictionary of rules loaded from harcoded path computed based on filter_defautls

The effect that this would have is to create (in the lock file) a mapping called e.g. filter_defaults_recorded_spec_ if it doesn't exists and add a key to it called "4.0" with the actual rules, i.e.

# lock.yaml
# [Content of runcard.yaml]
filter_defaults_recorded_spec_: {
   "4.0":  ... #all the rules in the filter file

}

Then we have something like:

produce_default_filter_rules(self, filter_defaults, filter_defaults_recorded_spec_:(dict, type(None))):
    if filter_defaults_recorded_spec_ is None or filter_defaults not in filter_defaults_recorded_spec_:
        #load rules from hardcoded path
    else:
        #load rules from mapping

I think this should be general enough.

@wilsonmr
Copy link
Contributor

I actually opened a branch for this today, will take a look soon

@wilsonmr
Copy link
Contributor

should all of these changes happen at the level of report engine? they seem quite generic

@Zaharid
Copy link
Contributor Author

Zaharid commented Nov 21, 2019 via email

@Zaharid
Copy link
Contributor Author

Zaharid commented Nov 21, 2019

Ok, to be factually correct, we want the functionality to interpret things of the form def record_XXX(self, computed_value_of_XXX) as things that need to be added to the lock file to be in reportengine, but we don't want reportengine to know about filters and the like. That is, the code samples in my comment should go in vp in some form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants