-
-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add contours of highest posterior density for 2D KDE plot #1665
Conversation
This commit adds a function `_find_hpd_contours` that is designed to calculate the contour values corresponding to given HPD confidence levels for `_fast_kde_2d`.
Codecov Report
@@ Coverage Diff @@
## main #1665 +/- ##
==========================================
+ Coverage 90.84% 90.89% +0.04%
==========================================
Files 108 108
Lines 11730 11760 +30
==========================================
+ Hits 10656 10689 +33
+ Misses 1074 1071 -3
Continue to review full report at Codecov.
|
arviz/plots/kdeplot.py
Outdated
@@ -20,6 +20,7 @@ def plot_kde( | |||
quantiles=None, | |||
rotated=False, | |||
contour=True, | |||
hpd_levels=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All hpd
should be changed to hdi
, or maybe even to levels
directly. Is there any guarantee that those areas are the minimal area with probability X?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed all hpd_levels
to levels
in plot_kde
, and hdi_levels
in _find_hdi_contours
(which I've also renamed). I was wondering whether hdi_probs
might be more appropriate than either levels
or hdi_levels
, given that it'd be similar to the argument for arviz.hdi
?
I think the areas given are as close to minimal area as we can get with a discretised KDE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on using hdi_probs for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to hdi_probs
in cf10583
arviz/plots/kdeplot.py
Outdated
# Need to include 0 & 1 if not included so that contours plot as expected | ||
if 0 not in hpd_levels: | ||
hpd_levels.append(0) | ||
if 1 not in hpd_levels: | ||
hpd_levels.append(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I included 0 and 1 was to ensure that all the contours are filled in correctly (plt.contourf
will not fill the highest contour otherwise, and using the fill_last
argument to ensure the lowest contour is filled introduces an extraneous contour on the plot that I think outlines all non-zero values)
I've rewritten this section so that it's hopefully clearer that we're including 0 and density.max()
, but please let me know if it's not clear and I can rewrite it again (or put some comments explaining the reasoning)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is clear
arviz/stats/density_utils.py
Outdated
if hpd_level == 0: | ||
contour_inds[idx] = cum_density.shape[0] - 1 | ||
else: | ||
contour_inds[idx] = np.argmax(cum_density >= (1 - hpd_level) * norm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of prefer the mask approach used in corner.py because I find it easier to understand, I select all the values with cumsum lower than the level then get the last one as level. With argmax I never remember what happens with ties, if the first occurrence or the last one is selected.
Side notes, if we use this approach instead of try except, we shouldn't check for ==
but use np.isclose
. Also, can't cum_density.shape[0] - 1
be directly -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the approach to reflect the one used in corner.py
This commit refactors the argument `hpd_levels` to `levels` in `plot_kde`. Additionally, it changes all references to "highest posterior density" in comments and error messages to "highest density interval".
This commit refactors `hpd_levels` to `hdi_levels` in the function `_find_hpd_contours`. It additionally replaces all mentions of "highest probability density" with "highest density interval".
As with the last two commits, this commit removes the remaining references to "highest posterior density", replacing `_find_hpd_contours` with `_find_hdi_contours`.
This commit updates `_find_hdi_contours` to use the method that is employed in the package `corner.py`.
], | ||
) | ||
@pytest.mark.parametrize("contour_sigma", [np.array([1, 2, 3])]) | ||
def test_find_hdi_contours(mean, cov, contour_sigma): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried that this unit test is overly complicated, but I wasn't sure how best to approach it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is ok.
This commit will refactor the `levels` argument of `plot_kde()` and the `hdi_levels` argument of `_find_hdi_contours()` to `hdi_probs` for both, with the hope that this provides an interface consistent with that given for `arviz.hdi()`.
This commit passes a copy of the `kde_kwargs` dict when calling the `plot_kde` method multiple times. If the dictionary is not copied before being passed, the dictionary is effectively passed by value, which can lead to unexpected behaviour when multiple calls to `kde_kwargs` are made.
I think I've made all the requested changes and followed the proper format - I'll mark the code as ready for review again |
This commit merges bug fixes to Bokeh's `plot_kde` to facilitate testing of the new HDI contours feature with the Bokeh backend.
I'm going to add some tests for the Bokeh backend, given that #1673 has been resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far, adding bokeh and bokeh example counterparts would be great too
arviz/plots/kdeplot.py
Outdated
@@ -74,6 +75,8 @@ def plot_kde( | |||
contour : bool | |||
If True plot the 2D KDE using contours, otherwise plot a smooth 2D KDE. | |||
Defaults to True. | |||
hdi_probs : list | |||
Confidence levels for highest density (2-dimensional) interval contours of a 2D KDE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confidence feels strange, in this context, maybe something like "Plots highest density credibility regions for the provided probabilities. Defaults to matplotlib chosen levels with no fixed probability associated"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated this in 6f2d912
This commit changes the title of `examples/bokeh/bokeh_plot_kde_2d.py` to be in line with the corresponding matplotlib example. (With the inconsistent titles, the title for both the matplotlib and bokeh examples showed as "2d KDE Bokeh" on the example gallery main page, which is misleading).
Right, I've tested that it works with Bokeh and added a few Bokeh examples to complement the matplotlib ones, so hopefully it's all good to go now! |
This commit will update `plot_pair` to deepcopy `kde_kwargs`, fixing a bug that caused unexpected behaviour with the kde formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great and very thorough, thanks!
I left two final comments as kind of cherry on top
This commit updates `plot_kde` to use `hdi_probs` over `levels` if both are specified, and warn the user that both have been specified.
This commit adds a couple of new unit tests to verify that the user is warned when both `levels` and `hdi_probs` have been specified to `plot_kde`.
I don't know whether it would be more appropriate to file a bug report or just add a commit here to fix it but the CI checks are failing because the newly-updated Pylint doesn't like lines 404:405 of |
Is there anything else you need me to do for the PR? |
It looks like you now have to merge/rebase to fix the conflicts with |
I've resolved the conflict, hopefully the checks will pass without the need for further intervention from me. Let me know if there's anything more you need! |
Thanks @wm1995 ! |
Thank you! |
…#1665) * Add a function to calculate HPD regions on 2D KDEs This commit adds a function `_find_hpd_contours` that is designed to calculate the contour values corresponding to given HPD confidence levels for `_fast_kde_2d`. * Add option to plot HPD regions to `plot_kde` * Change `hpd_levels` to `levels` in `plot_kde` This commit refactors the argument `hpd_levels` to `levels` in `plot_kde`. Additionally, it changes all references to "highest posterior density" in comments and error messages to "highest density interval". * Refactor `hpd_levels` to `hdi_levels` in helper fn This commit refactors `hpd_levels` to `hdi_levels` in the function `_find_hpd_contours`. It additionally replaces all mentions of "highest probability density" with "highest density interval". * Refactor HDI contour fn to `_find_hdi_contours` As with the last two commits, this commit removes the remaining references to "highest posterior density", replacing `_find_hpd_contours` with `_find_hdi_contours`. * Update contour-finding to use corner.py algorithm This commit updates `_find_hdi_contours` to use the method that is employed in the package `corner.py`. * Increase clarity in `plot_kde` contour-finding * Add unit test for `_find_hdi_contours` * Add unit tests for `levels` keyword of `plot_kde` * Make minor stylistic changes for pylint compliance * Update `_find_hdi_contours` docstring * Update `plot_kde` docstring * Update new unit tests for mypy compliance * Refactor `levels` and `hdi_levels` to `hdi_probs` This commit will refactor the `levels` argument of `plot_kde()` and the `hdi_levels` argument of `_find_hdi_contours()` to `hdi_probs` for both, with the hope that this provides an interface consistent with that given for `arviz.hdi()`. * Add an example of the HDI contour plot interface * Add example to `plot_kde` docstring * Update CHANGELOG.md to add HDI contour feature * Update `plot_pair` to pass `kde_kwargs` by value This commit passes a copy of the `kde_kwargs` dict when calling the `plot_kde` method multiple times. If the dictionary is not copied before being passed, the dictionary is effectively passed by value, which can lead to unexpected behaviour when multiple calls to `kde_kwargs` are made. * Add example of KDE pair plot with HDI contours * Update `kde_plot` HDI contour example docstring * Update docstring for `plot_kde` * Add tests to catch bug in `plot_kde` with Bokeh * Update docstring for `hdi_probs` arg in `plot_kde` * Add unit tests for HDI contours with Bokeh backend * Update title of existing Bokeh 2d KDE example This commit changes the title of `examples/bokeh/bokeh_plot_kde_2d.py` to be in line with the corresponding matplotlib example. (With the inconsistent titles, the title for both the matplotlib and bokeh examples showed as "2d KDE Bokeh" on the example gallery main page, which is misleading). * Add new examples for 2d KDE plots with Bokeh * Update Bokeh `plot_pair` to deepcopy `kde_kwargs` This commit will update `plot_pair` to deepcopy `kde_kwargs`, fixing a bug that caused unexpected behaviour with the kde formatting. * Update example 2d KDE plot titles for clarity * Update `plot_kde` to use `hdi_probs` over `levels` This commit updates `plot_kde` to use `hdi_probs` over `levels` if both are specified, and warn the user that both have been specified. * Add unit tests to test `plot_kde` warnings This commit adds a couple of new unit tests to verify that the user is warned when both `levels` and `hdi_probs` have been specified to `plot_kde`. * Update code to be Pylint-2.8.1 compliant
Description
This pull request is designed to add the option to plot contours of highest posterior density, as discussed in #1651 with @aloctavodia. I think I've got the basic code working, but I'd deeply appreciate any comments. I'll start working my way through the checklist for a complete pull request now.
Update (22/04/21): I think I'm nearly there with the pull request. I've attached a couple of sample plots
This is an example KDE plot of samples from a 2D multivariate normal distribution, with the 1-, 2-, and 3-sigma contours shown - the code is here.
This is an example pair plot of the
centered_eight
example with 30%, 60% and 90% HDI contours - the code is here.I tried to test the new feature with the Bokeh backend as well but because of #1673 the test fails.Update (23/04/21): I've added some example Bokeh plots, they're the same as the matplotlib ones just with the different backend
(These are just the HDI examples, I added a couple more general contour ones as well to mirror those available for matplotlib)
Update (24/04/21): Noticed an error in the Bokeh pair plot with the HDI, so I've quickly fixed that and updated the picture above to reflect it.
Checklist
pair_plot
compatibility with an example