
[API] fit API design - separating data from model specification #23

Closed
fkiraly opened this issue Sep 13, 2024 · 9 comments
Comments

fkiraly commented Sep 13, 2024

Thanks a lot for the interesting presentation in the sktime meetup today!

Following on from the presentation of the interface, I believe you have model.fit(y, x1, x2, ...), where y is the target matrix and the xi are regressor matrices, one for each parameter of a parametric distribution, so their number varies with the distribution.

I have a firm opinion that this is not a good interface design: it mixes model specification into the fit call, namely the selection of the regressors, which is clearly a modelling choice.

In sklearn-like framework interfaces, it is a strict, near-absolute API design rule that model specification must happen in __init__, and this would also be a major friction point with an skpro interface.

Instead, I would design the interface as follows:

  • fit has two args, just X and y, in this sequence (not y first).
  • the same for update(X, y) and predict(X).
  • there is an additional arg in __init__ (not fit etc), regressors or vars or similar: a dict with keys being distribution parameter names and values being iterables specifying columns within X. The default is None, meaning the X used internally for fitting is, for all parameters, the same as the X passed to the methods.
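A minimal sketch of what this could look like; the class name (`OnlineGamlss`) and the internals are illustrative assumptions, not the actual implementation:

```python
import numpy as np

class OnlineGamlss:
    """Hypothetical sketch of the proposed interface."""

    def __init__(self, regressors=None):
        # regressors: dict mapping distribution parameter name ->
        # iterable of column indices within X; None means "use all
        # of X" for every distribution parameter.
        self.regressors = regressors

    def fit(self, X, y):
        params = self.regressors or {"mu": range(X.shape[1])}
        # subset X per distribution parameter inside fit, not outside
        self._X_per_param = {
            name: X[:, list(cols)] for name, cols in params.items()
        }
        return self

est = OnlineGamlss(regressors={"mu": [0, 1], "sigma": [2]})
X = np.random.default_rng(0).normal(size=(10, 3))
y = np.zeros(10)
est.fit(X, y)
```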

I am very, very certain that the current design should not be used, but less (yet still quite) certain about the "optimal" interface here.

If you are worried about specifying cross-effects or inter-dependencies, imo this does not affect the above: those are modelling choices too, so in sklearn-like API designs they should also go in __init__ and not in fit.

Minor added details, I feel less strongly about those:

  • I would use predict(X) for the predictive mean; then the estimator can be used as an sklearn estimator.
  • and rename the current predict to predict_proba_params or similar.
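As a sketch of this naming convention (the class name, the stand-in least-squares model, and the constant `sigma` placeholder are all illustrative assumptions):

```python
import numpy as np

class Estimator:
    """Hypothetical sketch: point prediction vs. distribution parameters."""

    def __init__(self):
        self._coef = None

    def fit(self, X, y):
        # least-squares fit for the mean, as a stand-in model
        self._coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        # point prediction (predictive mean) -> sklearn-compatible
        return X @ self._coef

    def predict_proba_params(self, X):
        # full set of predictive distribution parameters;
        # here a dummy constant scale alongside the mean
        return {"mu": X @ self._coef, "sigma": np.ones(len(X))}

X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 + 3.0 * np.arange(5.0)
est = Estimator().fit(X, y)
```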
@simon-hirsch (Owner)

Thanks for the extensive comment!

I generally like the idea, and I see the additional benefit that this generalizes quite well to supporting pandas/polars: we can then pass a dict of integer positions or column names and use df.to_numpy().

To a certain extent it will also reduce the memory overhead, as we (externally) don't need multiple X matrices anymore. Internally, however, we'll need to continue working with different Gramian matrices for the different distribution parameters.

@BerriJ what do you think?

fkiraly commented Sep 13, 2024

> Internally however we'll need to continue to work with different Gramian matrices for different distribution parameters.

If you have a single numpy array and take two different column subsets in the same scope via slicing (without explicitly copying), this will not occupy twice the memory, as slicing creates a so-called view rather than a memory copy. Thus, handling this internally is doubly advisable.

See here: https://numpy.org/doc/stable/user/basics.copies.html
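A quick check with `np.shares_memory` illustrates this, with one caveat: basic slices are views, while integer-array ("fancy") indexing with a list of columns does produce a copy:

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)

# Basic slicing returns a view: no extra memory for the column subset.
view = X[:, 0:2]
assert np.shares_memory(X, view)

# Integer-array ("fancy") indexing, e.g. with a list of column
# indices, returns a copy instead:
subset = X[:, [0, 2]]
assert not np.shares_memory(X, subset)
```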

@simon-hirsch (Owner)

> If you have a single numpy array and create two different column subsets in the same scope (without explicitly copying), this will not occupy twice the memory, as it creates a so-called view rather than a memory copy. Thus, handling this internally is doubly advised.

Yes, but for each distribution parameter we need to store the Gramians G and H.

gram_param = X[:, subset].T @ X[:, subset]

where subset comes from the dict. The Gram will occupy memory (though much less, since it is $J \times J$ and not $N \times J$).
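For concreteness, the shapes involved (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
X = rng.normal(size=(N, 10))   # full design matrix, N x 10
subset = [0, 2, 5]             # hypothetical columns for one parameter

# Gram for this parameter is J x J (J = 3 here), much smaller
# than the N x J column-subset of the design matrix
gram_param = X[:, subset].T @ X[:, subset]
```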

fkiraly commented Sep 14, 2024

> Yes, but for each distribution parameter we need to store the Gramians G and H.

Very silly remark perhaps, but isn't the gram_param corresponding to subset just a submatrix of the full Gramian? Assuming every column is used somewhere, would it not suffice to compute the full Gramian once and take the right submatrices for each distribution parameter?
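In the unweighted case this identity holds exactly; a small check with `np.ix_` (illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
subset = [1, 3]  # hypothetical columns for one distribution parameter

gram_full = X.T @ X                            # full unweighted Gramian
gram_sub = gram_full[np.ix_(subset, subset)]   # submatrix for this parameter
gram_direct = X[:, subset].T @ X[:, subset]    # Gram computed from the subset

assert np.allclose(gram_sub, gram_direct)
```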

@simon-hirsch (Owner)

Yes, without weights that's true, but we have the weights needed for the iteratively reweighted least squares, and they're different for each distribution parameter.

So the full specification of the gram is

$G = X^T W \Gamma X$

where $W$ is a diagonal matrix of (estimation) weights and $\Gamma$ is a diagonal matrix of discounting weights. The estimation weights depend on the derivatives of the log-likelihood with respect to the parameter (see the first few pages of the paper for the exact definition). Since at least the weights in $W$ differ for each parameter, we have different Grams.

We could also introduce different forgetting for different distribution parameters, e.g. allowing the volatility to adapt faster than the mean estimation - helpful if we use this for volatility tracking with an intercept only.
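A sketch of the weighted case; the weight values and the scalar forgetting factor here are hypothetical, chosen only to illustrate why per-parameter weights break the shared-submatrix shortcut:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
X = rng.normal(size=(N, 4))
subset = [0, 2]  # hypothetical columns for the "mu" parameter

# hypothetical per-parameter estimation weights (diagonal of W) and a
# discount diagonal Gamma built from a scalar forgetting factor
w_mu = rng.uniform(0.5, 1.5, size=N)
gamma = 0.99 ** np.arange(N)[::-1]

# G = X^T W Gamma X for one distribution parameter; with diagonal
# weights this reduces to an elementwise row scaling of X
G_mu = (X[:, subset] * (w_mu * gamma)[:, None]).T @ X[:, subset]

# Because W differs per parameter, this weighted Gram is generally
# not a submatrix of any single shared weighted full Gramian.
```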

@simon-hirsch simon-hirsch changed the title fit API design - separating data from model specification [API] fit API design - separating data from model specification Sep 15, 2024
fkiraly commented Sep 15, 2024

Ok, I see - though also in the weighted case I do not see an advantage to passing multiple column-subset copies of X, compared with passing just one X and subsetting or computing the Gramians internally.

@simon-hirsch (Owner)

Yes, that's still a valid point.

@simon-hirsch (Owner)

@fkiraly your feedback is much appreciated at #25. Thanks!

@simon-hirsch (Owner)

closed with #25 :)
