BaerVervergaert

Added the Beta distribution (scipy.stats.beta) with loc fixed at zero and scale fixed at one.

  • The Beta distribution supports LogScore
  • Added a test, test_beta, for the Beta LogScore
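For readers following the thread, the log score of a Beta(a, b) prediction is its negative log-likelihood. Because the parameters are stored as log(a) and log(b), the gradient picks up a factor of a or b from the chain rule. Below is a minimal sketch of that calculation; the names beta_nll and beta_nll_grad are hypothetical and are not taken from the PR's BetaLogScore.

```python
import numpy as np
from scipy.special import betaln, digamma


def beta_nll(log_a, log_b, y):
    """Negative log-likelihood of y under Beta(exp(log_a), exp(log_b))."""
    a, b = np.exp(log_a), np.exp(log_b)
    # -log pdf = log B(a, b) - (a - 1) log(y) - (b - 1) log(1 - y)
    return betaln(a, b) - (a - 1.0) * np.log(y) - (b - 1.0) * np.log1p(-y)


def beta_nll_grad(log_a, log_b, y):
    """Gradient of beta_nll with respect to (log_a, log_b)."""
    a, b = np.exp(log_a), np.exp(log_b)
    # d/da log B(a, b) = digamma(a) - digamma(a + b); multiplying by
    # da/dlog_a = a applies the chain rule. The b component is symmetric.
    d_log_a = a * (digamma(a) - digamma(a + b) - np.log(y))
    d_log_b = b * (digamma(b) - digamma(a + b) - np.log1p(-y))
    return np.stack([d_log_a, d_log_b], axis=-1)
```

A finite-difference comparison against beta_nll is a cheap way to check a gradient like this before wiring it into a score class.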

self.log_a = params[0]
self.log_b = params[1]
self.a = np.exp(params[0]) # since params[0] is log(a)
self.b = np.exp(params[1]) # since params[1] is log(b)
Author

@BaerVervergaert BaerVervergaert Jul 6, 2025

Might need to introduce clipping here, because the log-parameters sometimes become so negative that np.exp underflows and sets a or b to 0.
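One way the clipping could look is to bound the log-parameters before exponentiating, so that a and b stay strictly positive and finite. This is only a sketch: the helper name shape_params_from_log and the bound of 20 are assumptions made for illustration, not values from the PR.

```python
import numpy as np

# Hypothetical bound chosen for illustration; exp(-20) is about 2e-9
# and exp(20) is about 4.9e8.
LOG_PARAM_BOUND = 20.0


def shape_params_from_log(params, bound=LOG_PARAM_BOUND):
    """Clip log(a) and log(b) to [-bound, bound], then exponentiate.

    params is indexed as in the PR: params[0] is log(a), params[1] is log(b).
    """
    log_params = np.clip(np.asarray(params, dtype=float), -bound, bound)
    a = np.exp(log_params[0])
    b = np.exp(log_params[1])
    return a, b
```

Without the clip, np.exp(-800.0) evaluates to exactly 0.0 in double precision, which is outside the valid range a > 0, b > 0. The trade-off is that the gradient with respect to a clipped parameter no longer reflects the raw value, so the bound should be loose enough that it rarely binds.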

@alejandroschuler alejandroschuler self-requested a review July 7, 2025 16:49
@ryan-wolbeck ryan-wolbeck self-requested a review July 14, 2025 18:34
@ryan-wolbeck
Collaborator

@BaerVervergaert can you merge master into your PR when you have time? That way we can test 3.13 as well

Comment on lines +39 to +43
Implements the Beta distribution for NGBoost.

The Beta distribution has two parameters, a and b.
The scipy loc and scale parameters are held constant for this implementation.
LogScore is supported for the Beta distribution.
Collaborator

    """
    Implements the Beta distribution for NGBoost.
    
    The Beta distribution is defined on the interval [0, 1] and is parameterized
    by two shape parameters a > 0 and b > 0. The distribution is useful for
    modeling bounded continuous data, such as proportions, probabilities, or
    normalized measurements.
    
    Parameters
    ----------
    params : array-like, shape (n_samples, 2)
        Array containing the distribution parameters in log space:
        - params[:, 0]: log(a) - first shape parameter
        - params[:, 1]: log(b) - second shape parameter
    
    Attributes
    ----------
    a : array-like, shape (n_samples,)
        First shape parameter (a > 0), obtained by exponentiating log_a
    b : array-like, shape (n_samples,)
        Second shape parameter (b > 0), obtained by exponentiating log_b
    dist : scipy.stats.beta
        Scipy beta distribution object for sampling and PDF calculations
    log_a : array-like, shape (n_samples,)
        Log of the first shape parameter
    log_b : array-like, shape (n_samples,)
        Log of the second shape parameter

    """

n_params = 2
scores = [BetaLogScore] # will implement this later
Collaborator

Can we add CRPSScore to be consistent?

Author

I'll take a look at it
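As a starting point for the CRPS request, the score value for a Beta(a, b) forecast has a known closed form in terms of the Beta CDF and the beta function. The sketch below should be treated as an assumption to verify rather than as the PR's implementation: the function names are hypothetical, it covers only the score value (not the gradient or metric a full NGBoost score would also need), and it includes a numerical-integration reference for cross-checking.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betaln
from scipy.stats import beta as beta_dist


def beta_crps(a, b, y):
    """Closed-form CRPS of a Beta(a, b) forecast at y in [0, 1]."""
    # B(2a, 2b) / (a * B(a, b)^2), computed in log space for stability.
    ratio = np.exp(betaln(2 * a, 2 * b) - 2 * betaln(a, b)) / a
    return y * (2 * beta_dist.cdf(y, a, b) - 1) + a / (a + b) * (
        1 - 2 * beta_dist.cdf(y, a + 1, b) - 2 * ratio
    )


def beta_crps_numeric(a, b, y):
    """Reference value: integrate (F(x) - 1{x >= y})^2 over [0, 1]."""
    lower, _ = quad(lambda x: beta_dist.cdf(x, a, b) ** 2, 0.0, y)
    upper, _ = quad(lambda x: (1 - beta_dist.cdf(x, a, b)) ** 2, y, 1.0)
    return lower + upper
```

For a = b = 1 the forecast is uniform on [0, 1] and the closed form reduces to y^2 - y + 1/3, which gives a quick sanity check independent of the integrator.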

Comment on lines +50 to +58
def __init__(self, params):
    self._params = params

    # create other objects that will be useful later
    self.log_a = params[0]
    self.log_b = params[1]
    self.a = np.exp(params[0]) # since params[0] is log(a)
    self.b = np.exp(params[1]) # since params[1] is log(b)
    self.dist = dist(a=self.a, b=self.b)
Collaborator

Something like this might help here

    """
    Initialize Beta distribution with parameters.
    
    Parameters
    ----------
    params : array-like, shape (n_samples, 2)
        Array containing log(a) and log(b) parameters
        
    Raises
    ------
    ValueError
        If params has wrong shape, contains NaN/Inf values, or results in
        non-positive shape parameters
    """
    # Validate input shape
    if len(params) != 2:
        raise ValueError(
            f"Beta distribution requires exactly 2 parameters, got {len(params)}"
        )
    
    # Validate parameter values
    if np.any(np.isnan(params)) or np.any(np.isinf(params)):
        raise ValueError(
            "Invalid parameters: NaN or Inf values detected. "
            "Parameters must be finite numbers."
        )
    
    # Store parameters
    self._params = params
    self.log_a = params[0]
    self.log_b = params[1]
    
    # Convert to shape parameters
    self.a = np.exp(params[0])
    self.b = np.exp(params[1])
    
    # Validate resulting shape parameters
    if np.any(self.a <= 0) or np.any(self.b <= 0):
        raise ValueError(
            "Beta distribution requires positive shape parameters. "
            f"Got a={self.a}, b={self.b}"
        )
    
    # Create scipy distribution object
    self.dist = dist(a=self.a, b=self.b)

Author

Thanks for the feedback :)

It's much appreciated.
