Add beta distribution #391
self.log_a = params[0]
self.log_b = params[1]
self.a = np.exp(params[0])  # since params[0] is log(a)
self.b = np.exp(params[1])  # since params[1] is log(b)
Might need to introduce clipping here, because for very negative parameter values np.exp underflows and sets a or b to 0.
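One way to read the clipping suggestion is to bound the log-space parameters before exponentiating. The sketch below is only an illustration: the helper name `shape_params_from_log` and the bounds `LOG_PARAM_MIN` / `LOG_PARAM_MAX` are hypothetical and not taken from the PR.

```python
import numpy as np

# Hypothetical bounds (an assumption, not from the PR):
# exp(-30) is about 9.4e-14 and exp(30) is about 1.1e13.
LOG_PARAM_MIN, LOG_PARAM_MAX = -30.0, 30.0


def shape_params_from_log(params):
    """Map log-space params of shape (2, n_samples) to strictly positive (a, b)."""
    clipped = np.clip(np.asarray(params, dtype=float), LOG_PARAM_MIN, LOG_PARAM_MAX)
    a = np.exp(clipped[0])  # params[0] is log(a)
    b = np.exp(clipped[1])  # params[1] is log(b)
    return a, b


# Without clipping, np.exp(-800.0) underflows to exactly 0.0
# and np.exp(800.0) overflows to inf.
a, b = shape_params_from_log(np.array([[-800.0, 0.0], [0.0, 800.0]]))
```

Clipping in log space keeps both shape parameters finite and positive, at the cost of a flat gradient once a parameter hits a bound.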
@BaerVervergaert can you merge master into your PR when you have time? That way we can test 3.13 as well
Implements the Beta distribution for NGBoost.

The Beta distribution has two parameters, a and b.
The scipy loc and scale parameters are held constant for this implementation.
LogScore is supported for the Beta distribution.
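For reference, the log score of a Beta distribution in the log(a), log(b) parameterization can be sketched as below. This is a standalone illustration built on `scipy.stats.beta` and `scipy.special.digamma`, not the PR's `BetaLogScore` class; the function names are hypothetical. The gradient uses the chain rule through `a = exp(log_a)` and `b = exp(log_b)`.

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import beta


def beta_log_score(log_a, log_b, y):
    """Negative log-likelihood of y under Beta(a, b), with loc=0 and scale=1."""
    a, b = np.exp(log_a), np.exp(log_b)
    return -beta.logpdf(y, a, b)


def beta_log_score_grad(log_a, log_b, y):
    """Gradient of the score with respect to (log_a, log_b); last axis has size 2."""
    a, b = np.exp(log_a), np.exp(log_b)
    # d(logpdf)/da = log(y) - digamma(a) + digamma(a + b); multiply by a for d/d(log_a).
    d_log_a = -a * (np.log(y) - digamma(a) + digamma(a + b))
    # d(logpdf)/db = log(1 - y) - digamma(b) + digamma(a + b); multiply by b for d/d(log_b).
    d_log_b = -b * (np.log1p(-y) - digamma(b) + digamma(a + b))
    return np.stack([d_log_a, d_log_b], axis=-1)
```

Observations must lie strictly inside (0, 1); at the endpoints the log terms diverge.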
"""
Implements the Beta distribution for NGBoost.

The Beta distribution is defined on the interval [0, 1] and is parameterized
by two shape parameters a > 0 and b > 0. The distribution is useful for
modeling bounded continuous data, such as proportions, probabilities, or
normalized measurements.

Parameters
----------
params : array-like, shape (2, n_samples)
    Array containing the distribution parameters in log space:
    - params[0]: log(a) - first shape parameter
    - params[1]: log(b) - second shape parameter

Attributes
----------
a : array-like, shape (n_samples,)
    First shape parameter (a > 0), obtained by exponentiating log_a
b : array-like, shape (n_samples,)
    Second shape parameter (b > 0), obtained by exponentiating log_b
dist : scipy.stats.beta
    Scipy beta distribution object for sampling and PDF calculations
log_a : array-like, shape (n_samples,)
    Log of the first shape parameter
log_b : array-like, shape (n_samples,)
    Log of the second shape parameter
"""

n_params = 2
scores = [BetaLogScore]  # will implement this later
Can we add CRPSScore to be consistent?
I'll take a look at it
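As a starting point for the CRPS discussion, the score can be evaluated numerically from its definition, the integral of (F(x) - 1{x >= y})^2 over the support. The sketch below does this with `scipy.integrate.quad` for a single observation. It is an assumption-laden illustration rather than an NGBoost `CRPSScore` implementation: the function name is hypothetical, and a real score class would also need the gradient.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta


def beta_crps_numeric(a, b, y):
    """CRPS of Beta(a, b) at one observation y in [0, 1], by numerical integration."""
    # Below y the indicator 1{x >= y} is 0, so the integrand is F(x)^2.
    below, _ = quad(lambda x: beta.cdf(x, a, b) ** 2, 0.0, y)
    # From y upward the indicator is 1, so the integrand is (F(x) - 1)^2.
    above, _ = quad(lambda x: (beta.cdf(x, a, b) - 1.0) ** 2, y, 1.0)
    return below + above
```

For the uniform case a = b = 1 the integral reduces to y^3/3 + (1 - y)^3/3, which gives a quick sanity check for the routine.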
def __init__(self, params):
    self._params = params

    # create other objects that will be useful later
    self.log_a = params[0]
    self.log_b = params[1]
    self.a = np.exp(params[0])  # since params[0] is log(a)
    self.b = np.exp(params[1])  # since params[1] is log(b)
    self.dist = dist(a=self.a, b=self.b)
Something like this might help here
"""
Initialize Beta distribution with parameters.

Parameters
----------
params : array-like, shape (2, n_samples)
    Array containing log(a) and log(b) parameters

Raises
------
ValueError
    If params has wrong shape, contains NaN/Inf values, or results in
    non-positive shape parameters
"""
# Validate input shape
if len(params) != 2:
    raise ValueError(
        f"Beta distribution requires exactly 2 parameters, got {len(params)}"
    )
# Validate parameter values
if np.any(np.isnan(params)) or np.any(np.isinf(params)):
    raise ValueError(
        "Invalid parameters: NaN or Inf values detected. "
        "Parameters must be finite numbers."
    )
# Store parameters
self._params = params
self.log_a = params[0]
self.log_b = params[1]
# Convert to shape parameters
self.a = np.exp(params[0])
self.b = np.exp(params[1])
# Validate resulting shape parameters
if np.any(self.a <= 0) or np.any(self.b <= 0):
    raise ValueError(
        "Beta distribution requires positive shape parameters. "
        f"Got a={self.a}, b={self.b}"
    )
# Create scipy distribution object
self.dist = dist(a=self.a, b=self.b)
Thanks for the feedback :)
It's much appreciated.
Added the Beta distribution (scipy.stats.beta) with loc fixed at zero and scale fixed at one.
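To illustrate what fixing loc at zero and scale at one means in scipy terms, the sketch below estimates marginal shape parameters with `scipy.stats.beta.fit` using `floc=0` and `fscale=1`, then returns them in log space. The helper name `initial_log_params` is hypothetical, and whether the PR's class computes its initial parameters this way is an assumption.

```python
import numpy as np
from scipy.stats import beta


def initial_log_params(y):
    """Fit Beta(a, b) to y with loc=0 and scale=1 held fixed; return [log(a), log(b)]."""
    # y must lie strictly inside (0, 1) when loc and scale are fixed like this.
    a, b, _loc, _scale = beta.fit(y, floc=0, fscale=1)
    return np.array([np.log(a), np.log(b)])


# Synthetic data drawn from Beta(2, 5), used only to exercise the helper.
rng = np.random.default_rng(0)
sample = beta.rvs(2.0, 5.0, size=5000, random_state=rng)
log_a, log_b = initial_log_params(sample)
```

Exponentiating the returned values should recover shape parameters close to the ones used to generate the sample.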