Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diagnostics + Stats to use xarray #501

Closed
ahartikainen opened this issue Jan 5, 2019 · 6 comments
Closed

Diagnostics + Stats to use xarray #501

ahartikainen opened this issue Jan 5, 2019 · 6 comments

Comments

@ahartikainen
Copy link
Contributor

Use InferenceData as a default data-structure for diagnostics and stats. Use dim names to do calculations correctly. (chain,draw).

@ahartikainen
Copy link
Contributor Author

I think most of the functions could eat InferenceData and then use some default (bfmi, etc). We do this already in loo.

What we could change, is that user can define those default values. --> InferenceData, value_name(, group)

@OriolAbril
Copy link
Member

OriolAbril commented Mar 26, 2019

I would like to start working on that @ahartikainen . But I have some questions.

For the loo function it is quite straightforward to use xarray in some more places as the input data is an InferenceData object, whereas many of the others are functions for numpy ndarrays which are applied to xarray objects via apply_ufunc. How should that scheme change? (in case it should change)

I will start working on the loo function and submit a PR to show it is a work in progress. For example:

log_likelihood = inference_data.sample_stats.log_likelihood
n_samples = log_likelihood.chain.size * log_likelihood.draw.size
new_shape = (n_samples,) + log_likelihood.shape[2:]
log_likelihood = log_likelihood.values.reshape(*new_shape)

can be modified to:

log_likelihood = inference_data.sample_stats.log_likelihood.stack(n_samples=('chain','draw')) 

Which poses the question of whether or not to maintain the xarray datatype and modify the psislw to use xarray datatypes instead of numpy arrays, or leave it as it is. I would modify both functions, but I think this affects the library quite deeply, and probably some conventions will need to be chosen.

@ahartikainen
Copy link
Contributor Author

I have done most of the rewrite already for stats+diagnostics, so I think maybe we can come back to this after the PR is ready.

@OriolAbril
Copy link
Member

I checked this a little further, and even though the code may look cleaner, the xarray version runs around 10 times slower in my computer.So I don't think it is worth it. I leave the xarray version (the previous post is already outdated) here just in case:

log_likelihood = inference_data.sample_stats.log_likelihood
log_likelihood = log_likelihood.stack(n_samples=('chain','draw'), data_points=log_likelihood.dims[2:])
n_samples, n_data_points = log_likelihood.shape

Also, fun fact, this will only work on python>=3.6, because from 3.6 on, kwargs maintain their order.

@ahartikainen
Copy link
Contributor Author

Interesting insight.

I do think maybe the cleanest approach is to implement "unit" functions that take in (chain, draw)-shape or (chain*draw)-shape objects, do the function with numpy and then call xarray with ufunc wrapper.

@OriolAbril
Copy link
Member

I gave some more though to this, because I am using the coords information in pointwise elpd plots, thus I need them to be dataarray objects. Therefore there are 2 options, either leave the calculation with arrays like it is done now and once finished convert it back to dataarray, or do the calculations with dataarrays the whole time (either via apply_ufunc or via xarray internals, preferably apply_ufunc to be compatible with arrays).

I started with psislw, and I managed to convert it to a ufunc and apply it to the radon example. I only modified psislw and added the argument check_shape to _multiple_ufunc in make_ufunc to make it compatible with the output shape of psislw. Here is the link to the example. It does not add much overhead to the calculation.

I will try it with logsumexp too. If you have any hints on how to start, they will be really welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants