Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle degenerate Beta distribution cases #1881

Closed
wants to merge 2 commits into from

Conversation

quildtide
Copy link
Contributor

@quildtide quildtide commented Aug 8, 2024

Combined with JuliaStats/StatsFuns.jl#167 , this pull request will significantly improve handling of degenerate beta distributions.

These degenerate beta distributions are the following:

  • Beta(Inf, b) is Dirac(1)
  • Beta(a, Inf) is Dirac(0)
  • Beta(Inf, Inf) is Dirac(0.5)

There are two other degenerate beta distributions that can occur when alpha or beta are 0, but Distributions.jl does not currently allow these to be constructed normally. I have not added checks for these cases. My priority on this pull request is to fix inaccurate behavior for things that are allegedly supported by Distributions.jl, not to add even more allegedly-supported edge cases.

This pull request is not directly dependent on JuliaStats/StatsFuns.jl#167; all tests in this pull request should pass regardless of whether StatsFuns.jl is accepted. However, functions like cdf, pdf, quantile will continue to be inaccurate until then. median will continue to hang on median(Beta(Inf, 1) because it is dependent on quantile, but it will work (via generic fallback) fine once the changes to StatsFuns are made.

Between these two pull requests, degenerate Beta distribution cases will behave more or less identically to corresponding Dirac distributions, EXCEPT with one notable difference; pdf(Dirac(1), 1) == 1 because it is a pmf (as we treat Dirac as discrete) while pdf(Beta(Inf, 1), 1) == NaN because it is, by definition, a value which integrates to 1 over a 0-length range.

Exceptions

I don't work with KL divergence enough to properly comprehend implementing support for degenerate beta distributions. Someone else can figure this one out.

Performance Regressions

Between this and JuliaStats/StatsFuns.jl#167, I expect almost every single method involving the Beta distribution to experience performance regressions. I believe this is worth it, however, because the status quo often returns inaccurate results or causes hangs (e.g. median(Beta(Inf, 1)).

I have tried to minimize the performance regressions, but we're going from things like minimum(::Beta) = 0 (which gets compiled out to just a constant if the type is known) to multiple branches (as there are 3 distinct degenerate cases). These are necessary sacrifices for accurate behavior.

Tests

I have placed new tests in a new file, degenerate.jl. There are many other degenerate distributions which are allegedly supported by Distributions.jl currently, but currently have many issues (e.g. Bernoulli(0), Bernoulli(1), Binomial(n, 0), Binomial(n, 1). I fully intend to add more tests to that file in the future.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 93.10345% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.09%. Comparing base (47c040b) to head (8404b33).

Files Patch % Lines
src/univariate/continuous/beta.jl 93.10% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1881      +/-   ##
==========================================
+ Coverage   85.99%   86.09%   +0.10%     
==========================================
  Files         144      144              
  Lines        8666     8712      +46     
==========================================
+ Hits         7452     7501      +49     
+ Misses       1214     1211       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling is that it would be much simpler to ensure in the constructor that alpha and beta are finite positive numbers.

@quildtide
Copy link
Contributor Author

quildtide commented Aug 8, 2024

My gut feeling is that it would be much simpler to ensure in the constructor that alpha and beta are finite positive numbers.

This is true. On the other hand, that would also be a breaking change.

Additionally, an argument might emerge that Beta(0,b) and Beta(a, 0) are worth supporting, as they can occasionally show up in Bayesian inference. It happens that StatsFuns.jl already supports these two cases (your work, actually). There's also Beta(0,0), or Haldane's prior, which apparently behaves like Bernoulli(0.5).

If we do eventually support Beta(0,b) and Beta(a, 0) and alter methods like minimum/maximum to handle them correctly, then the overhead for supporting Beta(Inf, b) , Beta(a, Inf), and Beta(Inf, Inf) will not be that significant. I suppose that also means that performance of StatsFuns functions like quantile and cdf are not impacted that much by this pull req; it's more things like minimum and maximum, which are no longer constant type-wide.

On the StatsFuns.jl side, it has been pointed out by @andreasnoack that Beta(Inf, Inf) is only Dirac(0.5) if the constraint that α = β is observed. However, R does treat Beta(Inf, Inf) as Dirac(0.5). R handles the following edge cases:

  • Beta(0, b) = Dirac(0)
  • Beta(a, 0) = Dirac(1)
  • Beta(Inf, b) = Dirac(1)
  • Beta(a, Inf) = Dirac(0)
  • Beta(Inf, Inf) = Dirac(0.5)
  • Beta(0, 0) = Bernoulli(0,0)

There is one major disagreement between the R implementation and my implementation. I believe that degenerate distributions should have support at only one point (i.e. support(Beta(Inf, 1)) = [1, 1]) while R maintains the normal support. If we go with the R route, it'll actually make the performance implications mostly negligible.

But I personally think that minimum(Bernoulli(1)) == 0 and minimum(Normal(0,0)) == -Inf are incorrect and should be amended, even if it causes performance regressions.

@devmotion
Copy link
Member

On the other hand, that would also be a breaking change.

It might be technically breaking but IMO it can be argued that it's not actually breaking since these cases weren't handled correctly anyway.

@quildtide quildtide closed this Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants