
Roadmap for experiment API #483

Closed
5 tasks
scarrazza opened this issue Jun 16, 2019 · 6 comments
@scarrazza
Member

Following the discussion in #476 the next steps require at least:

Integration with validphys

  • replace CommonData object calls with the new dataframe
  • implement conversion mapping between experiments and the new data key.

New features

  • implement filtering
  • implement artificial replica generation
  • implement covariance matrix computation
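As a concrete starting point for the replica-generation item, here is a minimal sketch: pseudo-data replicas are drawn by fluctuating the experimental central values according to the experimental covariance matrix. The function name and signature here are illustrative assumptions, not the actual validphys API.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_replica(central_values, covmat):
    """Draw one artificial (pseudo-data) replica from
    a multivariate normal N(central_values, covmat)."""
    return rng.multivariate_normal(central_values, covmat)

# Toy inputs with made-up numbers
central = np.array([1.0, 2.0, 3.0])
cov = np.diag([0.1, 0.2, 0.1])
replica = make_replica(central, cov)  # same shape as central
```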
@Zaharid
Contributor

Zaharid commented Jun 18, 2019

These all seem pretty sensible and uncontroversial. It would be good to have some of this done for the meeting so we can focus on the stuff that actually needs to be discussed, like e.g. a more robust defaults system.

@wilsonmr
Contributor

My understanding is that data can at least be thought of as a flat list of datasets - correct me if I'm wrong

For computing the total chi2 it is clear that sum_group chi2^(group) is not necessarily equal to the total chi2 unless the covariance matrix, arranged by groups, is block diagonal. In the most extreme case, like fits with the theory covariance matrix, the correct total chi2 is only given by taking all of data and calculating the chi2 on it. Can anyone think of a reasonable example where calculating the total chi2 on data would give the wrong answer?

If not, then surely config.py can be modified so that the experiments key is read from the runcard, the user is warned that the experiments key is deprecated, and the essentially two-level runcard list of experiments is flattened and then parsed as data; then we have backwards compatibility.

Then the grouping could be specified manually with a similar syntax to the experiments, but slightly simpler: since the dataset input was already specified in data, only names are required:

data: fromfit
grouping:
  - name: group1
    datasets:
      - datasetname 1
      - datasetname 2

alternatively we have the groupby option which then allows you to groupby a metadata key:

data: fromfit
grouping:
    groupby: nnpdf31_process

the groupby is expanded (and later stored in the lock file, #496), and the grouping is a namespace list of groups (maybe there would need to be a GroupSpec, which probably shares a lot in common with ExperimentSpec?), which in turn are namespace lists of datasets

In this way the experiments key gets nicely deprecated whilst retaining backward compatibility and the arbitrary group of datasets is easier to understand (than the current hack of using ExperimentSpec).
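To illustrate the groupby expansion described above, here is a sketch of turning a `groupby` metadata key into an explicit grouping. The metadata mapping and function name are made up for illustration; in practice the process tag would come from each dataset's plotting metadata.

```python
from collections import defaultdict

# Hypothetical metadata lookup (illustrative names only)
METADATA = {
    "datasetname 1": {"nnpdf31_process": "DIS"},
    "datasetname 2": {"nnpdf31_process": "DIS"},
    "datasetname 3": {"nnpdf31_process": "DY"},
}

def expand_groupby(datasets, key):
    """Expand a `groupby: <key>` spec into an explicit list of named groups."""
    groups = defaultdict(list)
    for ds in datasets:
        groups[METADATA[ds][key]].append(ds)
    return [{"name": name, "datasets": members}
            for name, members in groups.items()]

grouping = expand_groupby(list(METADATA), "nnpdf31_process")
# → [{'name': 'DIS', 'datasets': ['datasetname 1', 'datasetname 2']},
#    {'name': 'DY', 'datasets': ['datasetname 3']}]
```

The expanded form is exactly what the manual `grouping` syntax would contain, which is what makes storing it in the lock file straightforward.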

@Zaharid
Contributor

Zaharid commented Jul 18, 2019

> My understanding is that data can at least be thought of as a flat list of datasets - correct me if I'm wrong
>
> For computing the total chi2 it is clear that sum_group chi2^(group) is not necessarily equal to the total chi2 unless the covariance matrix, arranged by groups, is block diagonal. In the most extreme case, like fits with the theory covariance matrix, the correct total chi2 is only given by taking all of data and calculating the chi2 on it. Can anyone think of a reasonable example where calculating the total chi2 on data would give the wrong answer?

The total chi2 on data is what we typically want. If you neglect the correlations, you can get very different results (and you cannot in general approximate the sum to any reasonable precision). This is discussed a bit in the alpha_s paper.

https://arxiv.org/abs/1802.03398
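To make the point concrete, here is a toy numerical check (with made-up numbers) that the per-group chi2 values do not sum to the total chi2 when the covariance matrix has cross-group correlations:

```python
import numpy as np

# Four data points split into two groups of two, with non-zero
# cross-group correlations in the (made-up) covariance matrix.
residuals = np.array([1.0, -0.5, 0.8, 0.2])
cov = np.array([
    [1.0, 0.1, 0.3, 0.2],
    [0.1, 1.0, 0.2, 0.1],
    [0.3, 0.2, 1.0, 0.1],
    [0.2, 0.1, 0.1, 1.0],
])

def chi2(r, c):
    """chi2 = r^T C^{-1} r"""
    return r @ np.linalg.solve(c, r)

total = chi2(residuals, cov)
# Per-group chi2 discards the off-diagonal cross-group blocks:
by_group = chi2(residuals[:2], cov[:2, :2]) + chi2(residuals[2:], cov[2:, 2:])
# total != by_group: the sum over groups misses the correlations
```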

> If not, then surely config.py can be modified so that the experiments key is read from the runcard, the user is warned that the experiments key is deprecated, and the essentially two-level runcard list of experiments is flattened and then parsed as data; then we have backwards compatibility.

I agree with this, but I don't really see the relation with the previous point?

> Then the grouping could be specified manually with a similar syntax to the experiments, but slightly simpler: since the dataset input was already specified in data, only names are required:

> data: fromfit
> grouping:
>   - name: group1
>     datasets:
>       - datasetname 1
>       - datasetname 2

That is what we had for alpha_s. See
https://vp.nnpdf.science/6YsazxTxRIibUCb24zOFDA==/input/runcard.yaml
It would be good to have this for quick tests and speculative things, but once you get into "production" it really is a pain to write the same grouping each time, and even more of a pain that it has to change slightly and then you have to make sure you are copying from the right place.

> alternatively we have the groupby option which then allows you to groupby a metadata key:

> data: fromfit
> grouping:
>   groupby: nnpdf31_process

That is what we should have done from the start.

> the groupby is expanded (and later stored in the lock file, #496), and the grouping is a namespace list of groups (maybe there would need to be a GroupSpec, which probably shares a lot in common with ExperimentSpec?), which in turn are namespace lists of datasets

> In this way the experiments key gets nicely deprecated whilst retaining backward compatibility and the arbitrary group of datasets is easier to understand (than the current hack of using ExperimentSpec).

@wilsonmr
Contributor

wilsonmr commented Jul 18, 2019

> If not, then surely config.py can be modified so that the experiments key is read from the runcard, the user is warned that the experiments key is deprecated, and the essentially two-level runcard list of experiments is flattened and then parsed as data; then we have backwards compatibility.

> I agree with this, but I don't really see the relation with the previous point?

Well, I was checking I wasn't missing something, because the checklist item:

> implement conversion mapping between experiments and the new data key.

is basically trivial as def parse_experiments(self, exps): -> data
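A minimal sketch of that flattening (the function name is taken from the comment above; the deprecation warning and the runcard structure assumed here are illustrative):

```python
import warnings

def parse_experiments(experiments):
    """Flatten the deprecated two-level `experiments` runcard structure
    into the flat dataset list expected by the new `data` key (sketch)."""
    warnings.warn(
        "The 'experiments' key is deprecated; use 'data' instead.",
        DeprecationWarning,
    )
    return [ds for exp in experiments for ds in exp["datasets"]]

# Hypothetical two-level runcard input
experiments = [
    {"experiment": "exp1", "datasets": [{"dataset": "d1"}, {"dataset": "d2"}]},
    {"experiment": "exp2", "datasets": [{"dataset": "d3"}]},
]
data = parse_experiments(experiments)
# → [{'dataset': 'd1'}, {'dataset': 'd2'}, {'dataset': 'd3'}]
```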

> alternatively we have the groupby option which then allows you to groupby a metadata key:

> data: fromfit
> grouping:
>   groupby: nnpdf31_process

> That is what we should have done from the start.

Of course, both options should be available. I'm just thinking it would be nice to make some progress with this. From what I can tell, the data keyword can be done completely in parallel with #476, since that now just refers to defining what the CommonData object is and how it is loaded, rather than how we group it.

@Zaharid
Contributor

Zaharid commented Jul 18, 2019

Well yes, progress here has been slow and would be welcome. Ideally we should have had this before merging #499 but I already gave up on that.

@Zaharid
Contributor

Zaharid commented Mar 26, 2021

Going to close this as we have more up to date issues for discussing it.

@Zaharid Zaharid closed this as completed Mar 26, 2021