
[API] fit API design - separating data from model specification #23

Closed
fkiraly opened this issue Sep 13, 2024 · 9 comments
Comments

fkiraly commented Sep 13, 2024

Thanks a lot for the interesting presentation in the sktime meetup today!

Following on from the presentation of the interface, I believe you have model.fit(y, x1, x2, ...), where y is the target matrix and the xi are regressor matrices, one for each parameter of a parametric distribution, so their number varies with the distribution.

I have a firm opinion that this is not a good interface design: it mixes model specification into the fit call, namely the selection of the regressors, which is clearly a modelling choice.

In sklearn-like framework interfaces, it is a strict, near-absolute API design rule that model specification must happen in __init__, and this would also be a major friction point with an skpro interface.

Instead, I would design the interface as follows:

  • fit has two args, just X and y, in this sequence (not y first).
  • the same for update(X, y) and predict(X).
  • there is an additional arg in __init__ (not fit etc), regressors or vars or similar: a dict with keys being distribution parameter names and values being iterables specifying columns within X. The default is None, meaning the X used internally for fitting is, for all parameters, the same as the X passed to the methods.
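A minimal sketch of what this could look like; the class name (`OnlineGamlss`) and the internals are illustrative assumptions, not the actual implementation:

```python
import numpy as np

class OnlineGamlss:
    """Hypothetical sketch of the proposed interface."""

    def __init__(self, regressors=None):
        # regressors: dict mapping distribution parameter name ->
        # iterable of column indices within X; None means "use all
        # of X" for every distribution parameter.
        self.regressors = regressors

    def fit(self, X, y):
        params = self.regressors or {"mu": range(X.shape[1])}
        # subset X per distribution parameter inside fit, not outside
        self._X_per_param = {
            name: X[:, list(cols)] for name, cols in params.items()
        }
        return self

est = OnlineGamlss(regressors={"mu": [0, 1], "sigma": [2]})
X = np.random.default_rng(0).normal(size=(10, 3))
y = np.zeros(10)
est.fit(X, y)
```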

I am very, very certain that the current design should not be used, but less (yet still quite) certain about the "optimal" interface here.

If you are worried about specifying cross-effects or inter-dependencies, imo this does not affect the above: those are modelling choices too, so in sklearn-like API designs they should also go in __init__ and not in fit.

Minor added details, I feel less strongly about those:

  • I would use predict(X) for the predictive mean; then the estimator can be used as an sklearn estimator.
  • and rename the current predict to predict_proba_params or similar.
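As a sketch of this naming convention (the class name, the stand-in least-squares model, and the constant `sigma` placeholder are all illustrative assumptions):

```python
import numpy as np

class Estimator:
    """Hypothetical sketch: point prediction vs. distribution parameters."""

    def __init__(self):
        self._coef = None

    def fit(self, X, y):
        # least-squares fit for the mean, as a stand-in model
        self._coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        # point prediction (predictive mean) -> sklearn-compatible
        return X @ self._coef

    def predict_proba_params(self, X):
        # full set of predictive distribution parameters;
        # here a dummy constant scale alongside the mean
        return {"mu": X @ self._coef, "sigma": np.ones(len(X))}

X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 + 3.0 * np.arange(5.0)
est = Estimator().fit(X, y)
```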
@simon-hirsch (Owner)

Thanks for the extensive comment!

I generally like the idea, and I see the additional benefit that this generalizes quite well to supporting pandas/polars: we can then pass a dict of integer positions or column names and use df.to_numpy().

To a certain extent it will also reduce the memory overhead, as we (externally) don't need multiple X matrices anymore. Internally, however, we'll need to continue working with different Gramian matrices for the different distribution parameters.

@BerriJ what do you think?

fkiraly commented Sep 13, 2024

> Internally however we'll need to continue to work with different Gramian matrices for different distribution parameters.

If you have a single numpy array and take two different column subsets in the same scope via slicing (without explicitly copying), this will not occupy twice the memory, as slicing creates a so-called view rather than a memory copy. Thus, handling this internally is doubly advisable.

See here: https://numpy.org/doc/stable/user/basics.copies.html
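A quick check with `np.shares_memory` illustrates this, with one caveat: basic slices are views, while integer-array ("fancy") indexing with a list of columns does produce a copy:

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)

# Basic slicing returns a view: no extra memory for the column subset.
view = X[:, 0:2]
assert np.shares_memory(X, view)

# Integer-array ("fancy") indexing, e.g. with a list of column
# indices, returns a copy instead:
subset = X[:, [0, 2]]
assert not np.shares_memory(X, subset)
```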

@simon-hirsch (Owner)

> If you have a single numpy array and create two different column subsets in the same scope (without explicitly copying), this will not occupy twice the memory, as it creates a so-called view rather than a memory copy. Thus, handling this internally is doubly advised.

Yes, but for each distribution parameter we need to store the Gramians G and H.

gram_param = X[:, subset].T @ X[:, subset]

where subset comes from the dict. The Gram will occupy memory (though much less, since it is $J \times J$ and not $N \times J$).
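For concreteness, the shapes involved (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
X = rng.normal(size=(N, 10))   # full design matrix, N x 10
subset = [0, 2, 5]             # hypothetical columns for one parameter

# Gram for this parameter is J x J (J = 3 here), much smaller
# than the N x J column-subset of the design matrix
gram_param = X[:, subset].T @ X[:, subset]
```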

fkiraly commented Sep 14, 2024

> Yes, but for each distribution parameter we need to store the Gramians G and H.

Very silly remark perhaps, but isn't the gram_param corresponding to subset just a submatrix of the full Gramian? Assuming every column is used somewhere, would it not suffice to compute the full Gramian once and take the right submatrices for each distribution parameter?
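In the unweighted case this identity holds exactly; a small check with `np.ix_` (illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
subset = [1, 3]  # hypothetical columns for one distribution parameter

gram_full = X.T @ X                            # full unweighted Gramian
gram_sub = gram_full[np.ix_(subset, subset)]   # submatrix for this parameter
gram_direct = X[:, subset].T @ X[:, subset]    # Gram computed from the subset

assert np.allclose(gram_sub, gram_direct)
```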

@simon-hirsch (Owner)

Yes, without weights that's true, but we have the weights needed for the iteratively reweighted least squares, and they're different for each distribution parameter.

So the full specification of the gram is

$G = X^T W \Gamma X$

where $W$ is a diagonal matrix of (estimation) weights and $\Gamma$ is a diagonal matrix of discounting weights. The estimation weights depend on the derivatives of the log-likelihood with respect to the parameter (see the first few pages of the paper for the exact definition). Since at least the weights in $W$ differ for each parameter, we have different Grams.

We could also introduce different forgetting for different distribution parameters, e.g. allowing the volatility to adapt faster than the mean estimation - helpful if we use this for volatility tracking with an intercept only.
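A sketch of the weighted case; the weight values and the scalar forgetting factor here are hypothetical, chosen only to illustrate why per-parameter weights break the shared-submatrix shortcut:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
X = rng.normal(size=(N, 4))
subset = [0, 2]  # hypothetical columns for the "mu" parameter

# hypothetical per-parameter estimation weights (diagonal of W) and a
# discount diagonal Gamma built from a scalar forgetting factor
w_mu = rng.uniform(0.5, 1.5, size=N)
gamma = 0.99 ** np.arange(N)[::-1]

# G = X^T W Gamma X for one distribution parameter; with diagonal
# weights this reduces to an elementwise row scaling of X
G_mu = (X[:, subset] * (w_mu * gamma)[:, None]).T @ X[:, subset]

# Because W differs per parameter, this weighted Gram is generally
# not a submatrix of any single shared weighted full Gramian.
```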

@simon-hirsch simon-hirsch changed the title fit API design - separating data from model specification [API] fit API design - separating data from model specification Sep 15, 2024
fkiraly commented Sep 15, 2024

Ok, I see - though also in the weighted case I do not see an advantage to passing multiple column-subset copies of X, compared with passing just one X and subsetting or computing the Gramians internally.

@simon-hirsch (Owner)

Yes, that's still a valid point.

@simon-hirsch (Owner)

@fkiraly your feedback is much appreciated at #25. Thanks!

@simon-hirsch (Owner)

closed with #25 :)
