
Make pm.sample return InferenceData by default #4372

Closed
AlexAndorra opened this issue Dec 22, 2020 · 20 comments · Fixed by #4744

@AlexAndorra
Contributor

The return_inferencedata kwarg has been in pm.sample for a while now without big bugs appearing, so we should add a DeprecationWarning in 3.11 saying that the default return format will change to InferenceData in the next major release, i.e. 4.0.0.
Feel free to comment if you wanna take on this issue 🖖

@chandan5362
Contributor

Hi @AlexAndorra.
As far as I know, pm.sample currently returns a MultiTrace by default and raises a FutureWarning if no argument is passed to return_inferencedata.
So, do you mean changing the FutureWarning to a DeprecationWarning?

@AlexAndorra
Contributor Author

Hi @chandan5362! Yes, exactly -- and then actually implementing the default switch in the code and seeing whether our test suite still passes. You wanna take that on?

@chandan5362
Contributor

You wanna take that on?

Yeah, sure, I would love to.
I will only have to change the warning section of the block below.

# Excerpt from pm.sample: current handling of the return_inferencedata default
if return_inferencedata is None:
    v = packaging.version.parse(pm.__version__)
    if v.release[0] > 3 or v.release[1] >= 10:  # type: ignore
        warnings.warn(
            "In an upcoming release, pm.sample will return an `arviz.InferenceData` object instead of a `MultiTrace` by default. "
            "You can pass return_inferencedata=True or return_inferencedata=False to be safe and silence this warning.",
            FutureWarning,
        )
    # set the default
    return_inferencedata = False

Am I on the right track?
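
For reference, a rough sketch of how the switched block might eventually look (hypothetical: the warning text, warning class, and the timing of flipping the default are placeholders, not the final implementation):

# Hypothetical sketch only -- not the final implementation.
if return_inferencedata is None:
    warnings.warn(
        "pm.sample currently returns a `MultiTrace` by default, but the default will "
        "change to `arviz.InferenceData` in the next major release (4.0.0). "
        "Pass return_inferencedata=True or return_inferencedata=False explicitly "
        "to silence this warning.",
        DeprecationWarning,
    )
    # in 4.0.0 this default would flip to True
    return_inferencedata = False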

@AlexAndorra
Contributor Author

Hi @chandan5362,
Yep. Although I was mainly talking about the second part, which is really the crux of the PR: actually implementing the default switch in the code -- and seeing whether our test suite still passes.

@chandan5362
Contributor

Yep. Although I was mainly talking about the second part, which is really the crux of the PR: actually implementing the default switch in the code -- and seeing whether our test suite still passes.

You mean setting the return_inferencedata default to True?

@OriolAbril
Member

I got a little lost in hypothetical changes. TL;DR: I think we should start updating the code to return_inferencedata=True as soon as possible and, once it is done or mostly done, change the default. It would probably be weird to have a deprecation warning and still have plenty of official docs triggering it, even more so to change the default and continue having plenty of docs use MultiTrace.

Yeah, changing the default to True will definitely break the code in several places, maybe not only the tests. It is not clear how much work it will require, but it will probably mean going over the whole library updating calls to pm.sample. My guess would be that the tests will require a significant amount of work, the library itself should not be too time consuming (if any change is needed at all), and the documentation will also be a big job.

I was thinking that maybe we could already start working on it using rcParams or something like that. As I understand it, the final goal is to remove return_inferencedata and always return InferenceData, so changing everything to use return_inferencedata=True would probably be useful, but it's not ideal because it requires a second pass over the library in the long run to remove these arguments. However, using an rcParam could reduce that second pass to a line per file or so (see the sketch below).
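
For illustration, the per-file change could then look something like this (hypothetical attribute and key names; no such rcParam exists in PyMC3 yet):

import pymc3 as pm

# Hypothetical rcParam -- illustrative only, not an existing PyMC3 or ArviZ key.
pm.rcParams["sample.return_inferencedata"] = True

with pm.Model():
    x = pm.Normal("x", 0, 1)
    idata = pm.sample()  # would now return InferenceData without a per-call kwarg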

To give an example, updating https://docs.pymc.io/notebooks/multilevel_modeling.html to use InferenceData instead of MultiTrace required considerable work, and the more specific the notebook, the more work it will be (basically everything that differs from az.plot... or az.summary will have to be rewritten!). But eventually we will still need to go through each occurrence of pm.sample and remove return_inferencedata.

Also, these updates will allow us to write guidance on moving from MultiTrace to InferenceData. For example, something like this:

idata = pm.sample(...)
trace = idata.posterior  # xarray Dataset with separate chain and draw dimensions

or

idata = pm.sample(...)
# flatten chains and draws into a single "sample" dimension
trace = idata.posterior.stack(sample=("chain", "draw"))

can make "MultiTrace code" like trace["variable_name"] still work.
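
For instance, with a hypothetical model containing a variable named "mu" (a sketch of the pattern only):

import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", 0, 1)
    idata = pm.sample(return_inferencedata=True)

# flatten chains and draws into a single "sample" dimension
trace = idata.posterior.stack(sample=("chain", "draw"))
trace["mu"]         # indexes like the old trace["mu"]
trace["mu"].values  # plain NumPy array of all posterior draws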

@chandan5362
Contributor

I totally agree with @OriolAbril.
In fact, many of the test suites have been written assuming MultiTrace attributes.
Just for reference, I am attaching the test logs here.

============================================================= short test summary info ==============================================================
FAILED pymc3/tests/test_sampling.py::TestSample::test_parallel_sample_does_not_reuse_seed - AttributeError: 'InferenceData' object has no attribu...
FAILED pymc3/tests/test_sampling.py::TestSample::test_parallel_start - AttributeError: 'InferenceData' object has no attribute 'get_values'
FAILED pymc3/tests/test_sampling.py::TestSample::test_sample_tune_len - TypeError: object of type 'InferenceData' has no len()
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[True-NUTS] - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[True-Metropolis] - AttributeError: 'InferenceData' object has no attribute 're...
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[True-Slice] - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[False-NUTS] - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[False-Metropolis] - AttributeError: 'InferenceData' object has no attribute 'r...
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report[False-Slice] - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSample::test_trace_report_bart - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSample::test_sampler_stat_tune[1] - AttributeError: 'InferenceData' object has no attribute 'get_sampler...
FAILED pymc3/tests/test_sampling.py::TestSample::test_sampler_stat_tune[2] - AttributeError: 'InferenceData' object has no attribute 'get_sampler...
FAILED pymc3/tests/test_sampling.py::TestSample::test_callback_can_cancel - TypeError: object of type 'InferenceData' has no len()
FAILED pymc3/tests/test_sampling.py::TestSamplePPC::test_normal_scalar - AttributeError: 'InferenceData' object has no attribute 'report'
FAILED pymc3/tests/test_sampling.py::TestSamplePPC::test_normal_vector - AttributeError: 'InferenceData' object has no attribute 'nchains'
FAILED pymc3/tests/test_sampling.py::TestSamplePPC::test_model_shared_variable - TypeError: 'InferenceData' object is not subscriptable
FAILED pymc3/tests/test_sampling.py::TestSamplePPC::test_deterministic_of_observed - TypeError: object of type 'InferenceData' has no len()
FAILED pymc3/tests/test_sampling.py::TestSamplePPC::test_deterministic_of_observed_modified_interface - AttributeError: 'InferenceData' object ha...
FAILED pymc3/tests/test_sampling.py::TestSamplePosteriorPredictive::test_point_list_arg_bug_fspp - TypeError: 'InferenceData' object is not subsc...
FAILED pymc3/tests/test_sampling.py::TestSamplePosteriorPredictive::test_point_list_arg_bug_spp - TypeError: 'InferenceData' object is not subscr...
FAILED pymc3/tests/test_sampling.py::TestSamplePosteriorPredictive::test_sample_from_xarray_prior - AttributeError: 'InferenceData' object has no...
FAILED pymc3/tests/test_sampling.py::TestSamplePosteriorPredictive::test_sample_from_xarray_posterior - AttributeError: 'InferenceData' object ha...
FAILED pymc3/tests/test_sampling.py::TestSamplePosteriorPredictive::test_sample_from_xarray_posterior_fast - AttributeError: 'InferenceData' obje...
============================================= 23 failed, 60 passed, 712 warnings in 489.86s (0:08:09) ==============================================

These failures only account for test_sampling.py.
I am expecting multiple test suite failures if I go through the entire codebase.

@chandan5362
Contributor

What do you say, @AlexAndorra?

@AlexAndorra
Contributor Author

Yeah, I agree with @OriolAbril's roadmap 👌 Especially:

I was thinking that maybe we could already start working on it using rcParams or something like that. As I understand it, the final goal is to remove return_inferencedata and always return InferenceData, so changing everything to use return_inferencedata=True would probably be useful, but it's not ideal because it requires a second pass over the library in the long run to remove these arguments. However, using an rcParam could reduce that second pass to a line per file or so.

If that all sounds clear to you, I think we can start, @chandan5362!

@chandan5362
Contributor

Yeah, sure @AlexAndorra, we can start working.
But I am not very familiar with rcParams or how we would use it for our purpose.
Maybe @OriolAbril could help me out here a little.
Also, where should we start?

@OriolAbril
Member

PyMC3 does not have rcParams (yet?). If we want to use that instead of setting return_inferencedata=True, we can set something up. We can use ArviZ functions to generate the class, so it should not be too much code or too much work, but I am not sure it is worth it.

Are there other params that could benefit from that? If it's only one, it does not make much sense.

@chandan5362
Contributor

If we want to use that instead of setting return_inferencedata=True, we can set something up. We can use ArviZ functions to generate the class, so it should not be too much code or too much work, but I am not sure it is worth it.

@OriolAbril, are you referring to the from_pymc3 function to generate InferenceData?

@OriolAbril
Member

I see 3 main options:

  • Use return_inferencedata=True: allows updating everything effective immediately, with the drawback of having to go back to each pm.sample instance at some indeterminate point in the future to remove the argument (I am assuming there will be 2 steps: first setting the default to True and afterwards deleting the argument).
  • Add a pymc3.return_inferencedata key to arviz.rcParams. ArviZ is a dependency of PyMC3, so ArviZ rcParams are always available to PyMC3 and can be read to set PyMC3 defaults. Allows updating everything starting with the next ArviZ release (which could be a patch release to accelerate releasing this) and would only require one change per file/module (depending on how they are executed). For example, setting it in conf.py will change the default of all ipython/plot directives (not sure there are any though), and setting it in a notebook will set the default for the whole notebook. Downside: it is kind of dirty and probably confusing for users.
  • Add a pymc3.rcParams with a return_inferencedata, mcmc.return_inferencedata, sample.return_inferencedata... key (see the sketch below). Allows updating as soon as the pymc3.rcParams PR is merged and would also require one change per file/module. Downside: even though class constructors and validators can be imported from ArviZ, it is probably overkill to add that feature for a single parameter. It may be worth it if there were a desire to add more parameters (not sure what could be added though: tuning method? default number of draws?).
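
For the third option, a minimal sketch of what a home-grown pymc3.rcParams could look like (key names and validation logic are illustrative only, not an existing API):

# Hypothetical sketch of a minimal pymc3.rcParams -- not an existing API.
_defaults = {"sample.return_inferencedata": False}

class RcParams(dict):
    """Dict that only accepts known keys with boolean values."""

    def __setitem__(self, key, value):
        if key not in _defaults:
            raise KeyError(f"unknown rcParam {key!r}")
        if not isinstance(value, bool):
            raise ValueError(f"{key!r} expects a bool, got {value!r}")
        super().__setitem__(key, value)

rcParams = RcParams(_defaults)

# pm.sample could then resolve its default as:
# if return_inferencedata is None:
#     return_inferencedata = rcParams["sample.return_inferencedata"]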

@chandan5362
Contributor

chandan5362 commented Jan 29, 2021

@AlexAndorra, what should we do?

@AlexAndorra
Contributor Author

I wonder if option 1 is not the way to go, since it should be somewhat easy to do the change in one go with an IDE 🤔
Also, I don't remember if we're planning to just switch the default return type, or also completely deprecate the MultiTrace backend -- do you, @michaelosthege?

@michaelosthege
Member

@AlexAndorra @chandan5362 Sorry I did not notice that this conversation was going on.

There are 99 occurrences of .sample( in our test suite.

  1. This is the option I had in mind so far. 99 occurrences is fine to modify in a single commit, and taking out the explicit kwarg at some point in the future is also fine.
  2. I don't like this option. Mixing config between libraries is confusing. Also, aesara.config would be an alternative and can change the setting contextually.
  3. We can use an rcParams (or aesara.ConfigParser) anyway, for example to host the Model(check_bounds=...) setting.

I think we can do both -- Option 1 in #4446, while adding pymc3.rcParams or pymc3.config is independent of that.

As much as I would like to get rid of the MultiTrace backend, I don't think we can do this in the near future. Defaulting to return_inferencedata=True comes first and hopefully after re-writing the PyMC3 internals for RandomVariable we can reconsider our trace backend.

@chandan5362
Contributor

@AlexAndorra @michaelosthege So, I am proceeding with option 1. Let's see how much time it takes.
Also, once we modify the tests, we will have to take care of the documentation as well. How are we planning to deal with that?

@chandan5362
Contributor

Just to set the pace, how much time do we have to complete this PR before rolling it out to users?

@michaelosthege
Member

michaelosthege commented Feb 6, 2021

@chandan5362 hold on - in our PyMC meeting yesterday we discussed this topic. We concluded that because PyMC3 4.0 is getting closer, we don't need to switch the default in a PyMC3 3.x version.

However, many tests rely on the current default of return_inferencedata=False and so do some examples in https://github.com/pymc-devs/pymc-examples.

I think that some of them are currently not compatible with return_inferencedata=True, and we'll need to find out which ones are.
In order to avoid git conflicts as much as possible, I think the best way for you to contribute to this issue is to go through the example notebooks in https://github.com/pymc-devs/pymc_examples and try to refactor them to use return_inferencedata=True as much as possible. This will help identify limitations and make sure that all central functionality is fully compatible.
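
As a rough illustration of the kind of change such a refactor involves (hypothetical notebook snippet, not taken from an actual example):

import arviz as az
import pymc3 as pm

with pm.Model():
    mu = pm.Normal("mu", 0, 1)

    # before: MultiTrace-based code
    # trace = pm.sample(2000, tune=1000)
    # pm.traceplot(trace)

    # after: InferenceData-based code
    idata = pm.sample(2000, tune=1000, return_inferencedata=True)

az.plot_trace(idata)
az.summary(idata)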

Sorry about this back and forth. The timeline right now is somewhat delicate ;)

@chandan5362
Contributor

It's okay @michaelosthege, no problem at all.
Though the above links are not working, I will try refactoring the notebooks stored in https://github.com/pymc-devs/pymc-examples.

michaelosthege added a commit to michaelosthege/pymc that referenced this issue Jun 6, 2021
Also removes some unnecessary XFAIL marks.

Closes pymc-devs#4372, pymc-devs#4740
michaelosthege added a commit that referenced this issue Jun 7, 2021
Also removes some unnecessary XFAIL marks.

Closes #4372, #4740

Co-authored-by: Oriol Abril <oriol.abril.pla@gmail.com>