Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add contours of highest posterior density for 2D KDE plot #1665

Merged
merged 33 commits into from
May 13, 2021

Conversation

wm1995
Copy link
Contributor

@wm1995 wm1995 commented Apr 15, 2021

Description

This pull request is designed to add the option to plot contours of highest posterior density, as discussed in #1651 with @aloctavodia. I think I've got the basic code working, but I'd deeply appreciate any comments. I'll start working my way through the checklist for a complete pull request now.

Update (22/04/21): I think I'm nearly there with the pull request. I've attached a couple of sample plots

HDI contour plot with 1, 2 and 3 sigma contours

This is an example KDE plot of samples from a 2D multivariate normal distribution, with the 1-, 2-, and 3-sigma contours shown - the code is here.

HDI contour pair plot with 30%, 60% and 90% HDI contours

This is an example pair plot of the centered_eight example with 30%, 60% and 90% HDI contours - the code is here.

I tried to test the new feature with the Bokeh backend as well but because of #1673 the test fails.

Update (23/04/21): I've added some example Bokeh plots, they're the same as the matplotlib ones just with the different backend
HDI
bokeh_pair
(These are just the HDI examples, I added a couple more general contour ones as well to mirror those available for matplotlib)

Update (24/04/21): Noticed an error in the Bokeh pair plot with the HDI, so I've quickly fixed that and updated the picture above to reflect it.

Checklist

  • Follows official PR format
  • Includes a sample plot to visually illustrate the changes (only for plot-related functions)
  • New features are properly documented (with an example if appropriate)?
  • Includes new or updated tests to cover the new feature
  • Code style correct (follows pylint and black guidelines)
  • Changes are listed in changelog
  • Ensure pair_plot compatibility with an example
  • Add Bokeh tests and examples

This commit adds a function `_find_hpd_contours` that is designed to
calculate the contour values corresponding to given HPD confidence
levels for `_fast_kde_2d`.
@codecov
Copy link

codecov bot commented Apr 15, 2021

Codecov Report

Merging #1665 (1c496ac) into main (f5c0cf8) will increase coverage by 0.04%.
The diff coverage is 93.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1665      +/-   ##
==========================================
+ Coverage   90.84%   90.89%   +0.04%     
==========================================
  Files         108      108              
  Lines       11730    11760      +30     
==========================================
+ Hits        10656    10689      +33     
+ Misses       1074     1071       -3     
Impacted Files Coverage Δ
arviz/stats/density_utils.py 76.98% <81.81%> (+0.15%) ⬆️
arviz/plots/backends/bokeh/pairplot.py 87.50% <100.00%> (+0.06%) ⬆️
arviz/plots/backends/matplotlib/pairplot.py 92.14% <100.00%> (+0.04%) ⬆️
arviz/plots/kdeplot.py 100.00% <100.00%> (ø)
arviz/plots/backends/bokeh/kdeplot.py 96.73% <0.00%> (+3.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5c0cf8...1c496ac. Read the comment docs.

@@ -20,6 +20,7 @@ def plot_kde(
quantiles=None,
rotated=False,
contour=True,
hpd_levels=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All hpd should be changed to hdi, or maybe even to levels directly. Is there any guarantee that those areas are the minimal area with probability X?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed all hpd_levels to levels in plot_kde, and hdi_levels in _find_hdi_contours (which I've also renamed). I was wondering whether hdi_probs might be more appropriate than either levels or hdi_levels, given that it'd be similar to the argument for arviz.hdi?

I think the areas given are as close to minimal area as we can get with a discretised KDE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on using hdi_probs for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to hdi_probs in cf10583

Comment on lines 274 to 278
# Need to include 0 & 1 if not included so that contours plot as expected
if 0 not in hpd_levels:
hpd_levels.append(0)
if 1 not in hpd_levels:
hpd_levels.append(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I included 0 and 1 was to ensure that all the contours are filled in correctly (plt.contourf will not fill the highest contour otherwise, and using the fill_last argument to ensure the lowest contour is filled introduces an extraneous contour on the plot that I think outlines all non-zero values)

I've rewritten this section so that it's hopefully clearer that we're including 0 and density.max(), but please let me know if it's not clear and I can rewrite it again (or put some comments explaining the reasoning)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is clear

Comment on lines 1069 to 1072
if hpd_level == 0:
contour_inds[idx] = cum_density.shape[0] - 1
else:
contour_inds[idx] = np.argmax(cum_density >= (1 - hpd_level) * norm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of prefer the mask approach used in corner.py because I find it easier to understand, I select all the values with cumsum lower than the level then get the last one as level. With argmax I never remember what happens with ties, if the first occurrence or the last one is selected.

Side notes, if we use this approach instead of try except, we shouldn't check for == but use np.isclose. Also, can't cum_density.shape[0] - 1 be directly -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the approach to reflect the one used in corner.py

arviz/stats/density_utils.py Outdated Show resolved Hide resolved
This commit refactors the argument `hpd_levels` to `levels` in
`plot_kde`. Additionally, it changes all references to "highest
posterior density" in comments and error messages to "highest density
interval".
This commit refactors `hpd_levels` to `hdi_levels` in the function
`_find_hpd_contours`. It additionally replaces all mentions of "highest
probability density" with "highest density interval".
As with the last two commits, this commit removes the remaining
references to "highest posterior density", replacing
`_find_hpd_contours` with `_find_hdi_contours`.
This commit updates `_find_hdi_contours` to use the method that is
employed in the package `corner.py`.
arviz/plots/kdeplot.py Outdated Show resolved Hide resolved
],
)
@pytest.mark.parametrize("contour_sigma", [np.array([1, 2, 3])])
def test_find_hdi_contours(mean, cov, contour_sigma):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that this unit test is overly complicated, but I wasn't sure how best to approach it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is ok.

This commit will refactor the `levels` argument of `plot_kde()` and the
`hdi_levels` argument of `_find_hdi_contours()` to `hdi_probs` for both,
with the hope that this provides an interface consistent with that given
for `arviz.hdi()`.
This commit passes a copy of the `kde_kwargs` dict when calling the
`plot_kde` method multiple times. If the dictionary is not copied before
being passed, the dictionary is effectively passed by value, which can
lead to unexpected behaviour when multiple calls to `kde_kwargs` are
made.
@wm1995
Copy link
Contributor Author

wm1995 commented Apr 22, 2021

I think I've made all the requested changes and followed the proper format - I'll mark the code as ready for review again

@wm1995 wm1995 marked this pull request as ready for review April 22, 2021 14:34
This commit merges bug fixes to Bokeh's `plot_kde` to facilitate
testing of the new HDI contours feature with the Bokeh backend.
@wm1995 wm1995 marked this pull request as draft April 22, 2021 16:31
@wm1995
Copy link
Contributor Author

wm1995 commented Apr 22, 2021

I'm going to add some tests for the Bokeh backend, given that #1673 has been resolved

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, adding bokeh and bokeh example counterparts would be great too

@@ -74,6 +75,8 @@ def plot_kde(
contour : bool
If True plot the 2D KDE using contours, otherwise plot a smooth 2D KDE.
Defaults to True.
hdi_probs : list
Confidence levels for highest density (2-dimensional) interval contours of a 2D KDE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confidence feels strange, in this context, maybe something like "Plots highest density credibility regions for the provided probabilities. Defaults to matplotlib chosen levels with no fixed probability associated"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this in 6f2d912

This commit changes the title of `examples/bokeh/bokeh_plot_kde_2d.py`
to be in line with the corresponding matplotlib example. (With the
inconsistent titles, the title for both the matplotlib and bokeh
examples showed as "2d KDE Bokeh" on the example gallery main page,
which is misleading).
@wm1995 wm1995 marked this pull request as ready for review April 23, 2021 12:00
@wm1995
Copy link
Contributor Author

wm1995 commented Apr 23, 2021

Right, I've tested that it works with Bokeh and added a few Bokeh examples to complement the matplotlib ones, so hopefully it's all good to go now!

This commit will update `plot_pair` to deepcopy `kde_kwargs`, fixing a
bug that caused unexpected behaviour with the kde formatting.
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great and very thorough, thanks!

I left two final comments as kind of cherry on top

arviz/plots/kdeplot.py Outdated Show resolved Hide resolved
examples/bokeh/bokeh_plot_kde_2d.py Outdated Show resolved Hide resolved
This commit updates `plot_kde` to use `hdi_probs` over `levels` if both
are specified, and warn the user that both have been specified.
This commit adds a couple of new unit tests to verify that the user is
warned when both `levels` and `hdi_probs` have been specified to
`plot_kde`.
@wm1995
Copy link
Contributor Author

wm1995 commented Apr 25, 2021

I don't know whether it would be more appropriate to file a bug report or just add a commit here to fix it but the CI checks are failing because the newly-updated Pylint doesn't like lines 404:405 of arviz/stats/density_utils.py (they're not lines I've touched, but it throws an R1731 consider-using-max-builtin). I'll put in a commit here to fix it, but I think it'll cause checks on other builds to fail too until fixed, so perhaps a bug report would be a better course of action?

This was linked to issues Apr 25, 2021
@wm1995
Copy link
Contributor Author

wm1995 commented Apr 27, 2021

Is there anything else you need me to do for the PR?

@OriolAbril
Copy link
Member

It looks like you now have to merge/rebase to fix the conflicts with main, other than that I think everything is ready. I'll wait for someone else to review and merge.

@wm1995
Copy link
Contributor Author

wm1995 commented Apr 27, 2021

I've resolved the conflict, hopefully the checks will pass without the need for further intervention from me. Let me know if there's anything more you need!

@OriolAbril OriolAbril merged commit 7be29cf into arviz-devs:main May 13, 2021
@OriolAbril
Copy link
Member

Thanks @wm1995 !

@wm1995 wm1995 deleted the hpd-contours branch May 13, 2021 17:31
@wm1995
Copy link
Contributor Author

wm1995 commented May 13, 2021

Thank you!

utkarsh-maheshwari pushed a commit to utkarsh-maheshwari/arviz that referenced this pull request May 27, 2021
…#1665)

* Add a function to calculate HPD regions on 2D KDEs

This commit adds a function `_find_hpd_contours` that is designed to
calculate the contour values corresponding to given HPD confidence
levels for `_fast_kde_2d`.

* Add option to plot HPD regions to `plot_kde`

* Change `hpd_levels` to `levels` in `plot_kde`

This commit refactors the argument `hpd_levels` to `levels` in
`plot_kde`. Additionally, it changes all references to "highest
posterior density" in comments and error messages to "highest density
interval".

* Refactor `hpd_levels` to `hdi_levels` in helper fn

This commit refactors `hpd_levels` to `hdi_levels` in the function
`_find_hpd_contours`. It additionally replaces all mentions of "highest
probability density" with "highest density interval".

* Refactor HDI contour fn to `_find_hdi_contours`

As with the last two commits, this commit removes the remaining
references to "highest posterior density", replacing
`_find_hpd_contours` with `_find_hdi_contours`.

* Update contour-finding to use corner.py algorithm

This commit updates `_find_hdi_contours` to use the method that is
employed in the package `corner.py`.

* Increase clarity in `plot_kde` contour-finding

* Add unit test for `_find_hdi_contours`

* Add unit tests for `levels` keyword of `plot_kde`

* Make minor stylistic changes for pylint compliance

* Update `_find_hdi_contours` docstring

* Update `plot_kde` docstring

* Update new unit tests for mypy compliance

* Refactor `levels` and `hdi_levels` to `hdi_probs`

This commit will refactor the `levels` argument of `plot_kde()` and the
`hdi_levels` argument of `_find_hdi_contours()` to `hdi_probs` for both,
with the hope that this provides an interface consistent with that given
for `arviz.hdi()`.

* Add an example of the HDI contour plot interface

* Add example to `plot_kde` docstring

* Update CHANGELOG.md to add HDI contour feature

* Update `plot_pair` to pass `kde_kwargs` by value

This commit passes a copy of the `kde_kwargs` dict when calling the
`plot_kde` method multiple times. If the dictionary is not copied before
being passed, the dictionary is effectively passed by value, which can
lead to unexpected behaviour when multiple calls to `kde_kwargs` are
made.

* Add example of KDE pair plot with HDI contours

* Update `kde_plot` HDI contour example docstring

* Update docstring for `plot_kde`

* Add tests to catch bug in `plot_kde` with Bokeh

* Update docstring for `hdi_probs` arg in `plot_kde`

* Add unit tests for HDI contours with Bokeh backend

* Update title of existing Bokeh 2d KDE example

This commit changes the title of `examples/bokeh/bokeh_plot_kde_2d.py`
to be in line with the corresponding matplotlib example. (With the
inconsistent titles, the title for both the matplotlib and bokeh
examples showed as "2d KDE Bokeh" on the example gallery main page,
which is misleading).

* Add new examples for 2d KDE plots with Bokeh

* Update Bokeh `plot_pair` to deepcopy `kde_kwargs`

This commit will update `plot_pair` to deepcopy `kde_kwargs`, fixing a
bug that caused unexpected behaviour with the kde formatting.

* Update example 2d KDE plot titles for clarity

* Update `plot_kde` to use `hdi_probs` over `levels`

This commit updates `plot_kde` to use `hdi_probs` over `levels` if both
are specified, and warn the user that both have been specified.

* Add unit tests to test `plot_kde` warnings

This commit adds a couple of new unit tests to verify that the user is
warned when both `levels` and `hdi_probs` have been specified to
`plot_kde`.

* Update code to be Pylint-2.8.1 compliant
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pylint 2.8.1 fails arviz/stats/density_utils.py contour plot
3 participants