Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT Sklearn1.6 compatibility #447

Merged
merged 46 commits into from
Dec 2, 2024

Conversation

TamaraAtanasoska
Copy link
Contributor

@TamaraAtanasoska TamaraAtanasoska commented Nov 4, 2024

Reference Issues/PRs

Fixes #443. Not fully working yet.

What does this implement/fix? Explain your changes.

A few of the compatibility errors, including the one explained in #443 are now fixed. There are some external failures(errors happening outside of skops) that I am not sure how to fix. @adrinjalali any tips? Below are the remaining failures:

FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[GraphicalLassoCV(cv=3,max_iter=5)] - FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this ...
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[PassiveAggressiveClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDClassifier(max_iter=5)] - AssertionError
FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[SGDOneClassSVM(max_iter=5)] - AssertionError

Any other comments?

First PR to the project, the fixes might be off.
I understood #443 as I can remove all SGD stuff from the _sklearn.py file when working with sklearn 1.6+, tests pass like that, I am just not sure if the change is supposed to be so severe?

@TamaraAtanasoska TamaraAtanasoska changed the title Sklearn1.6 compatibility MNT Sklearn1.6 compatibility Nov 4, 2024
pyproject.toml Outdated Show resolved Hide resolved
skops/io/_sklearn.py Outdated Show resolved Hide resolved
skops/io/_sklearn.py Outdated Show resolved Hide resolved
skops/io/_sklearn.py Outdated Show resolved Hide resolved
skops/io/tests/test_persist.py Outdated Show resolved Hide resolved
TamaraAtanasoska and others added 5 commits November 11, 2024 12:20
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@TamaraAtanasoska
Copy link
Contributor Author

TamaraAtanasoska commented Nov 11, 2024

Issue in quantile_forest made: zillow/quantile-forest#103

@TamaraAtanasoska

This comment was marked as outdated.

@adrinjalali
Copy link
Member

Can you get the output with pytest -l to see all the local variables on the stack trace?

skops/io/_sklearn.py Outdated Show resolved Hide resolved
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
@TamaraAtanasoska
Copy link
Contributor Author

Can you get the output with pytest -l to see all the local variables on the stack trace?

The issue is about versioning again, taking care of separating the SGD stuff by sklearn version would most likely solve these last errors. I will look into it today, seems a bit more complex in this case then the rest, at least at first glance. I will ping with questions.

@TamaraAtanasoska TamaraAtanasoska marked this pull request as draft November 18, 2024 14:26
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would pass tests once scikit-learn/scikit-learn#30356 is merged in sklearn

@@ -46,8 +46,7 @@ def check_file_size() -> None:
set_random_state(estimator, random_state=0)

X, y = get_input(estimator)
tags = _safe_tags(estimator)
if tags.get("requires_fit", True):
if get_tags(estimator).requires_fit:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scikit-learn now uses dataclasses for tags, which are emulated here for old sklearn as well

Comment on lines -18 to -20
Huber,
Log,
LossFunction,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scikit-learn has moved to a more central place for loss functions / classes, and therefore the ones which are used between estimators are removed from the sgd specific file.

Comment on lines +54 to +67
from sklearn._loss._loss import (
CyAbsoluteError,
CyExponentialLoss,
CyHalfBinomialLoss,
CyHalfGammaLoss,
CyHalfMultinomialLoss,
CyHalfPoissonLoss,
CyHalfSquaredError,
CyHalfTweedieLoss,
CyHalfTweedieLossIdentity,
CyHuberLoss,
CyPinballLoss,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the new loss classes

@@ -44,36 +44,38 @@ def _is_steps_like(obj):
return True


def _assert_generic_objects_equal(val1, val2):
def _assert_generic_objects_equal(val1, val2, path=""):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the path argument to give the devs a better verbose idea of where things fail, to debug things easier


# Default settings for X
N_SAMPLES = 50
N_SAMPLES = 100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an SGD estimator was ill defined with 50 samples, increasing to 100

@TamaraAtanasoska
Copy link
Contributor Author

@adrinjalali any legwork left that you'd like to or can share to make this faster or it is good to go as it is when the changes are merged in sklearn?

@adrinjalali
Copy link
Member

@TamaraAtanasoska what's left is to take all estimators generated by _construct_instances instead of only the first one. We'd need to restructure the test to allow for a for loop over the values yielded by the _construct_instances function.

@TamaraAtanasoska
Copy link
Contributor Author

TamaraAtanasoska commented Nov 28, 2024

@TamaraAtanasoska what's left is to take all estimators generated by _construct_instances instead of only the first one. We'd need to restructure the test to allow for a for loop over the values yielded by the _construct_instances function.

do you mean in _tested_estimators and _unsupported_estimators? aren't both tests already looping through each estimator in all_estimators and then only creating an instance of it? or this exactly is the issue, it would be better to create instances of all estimators at once and then loop through them for sklearn 1.6?

Edit: I see this not about all_estimators, but all estimators produced by _construct_instances from that one estimator retrieved from all_estimators passed to it in the test 😄 it is confusing.

Should I take it on or you are already working on it?

@adrinjalali
Copy link
Member

Should I take it on or you are already working on it?

No I'm not working on that, you can take it on :)

@adrinjalali
Copy link
Member

BTW, we're working with @glemaitre on this: https://github.com/glemaitre/sklearn-compat

@TamaraAtanasoska
Copy link
Contributor Author

BTW, we're working with @glemaitre on this: https://github.com/glemaitre/sklearn-compat

that is a cool effort! This can get stressful for a lot of folks otherwise

@TamaraAtanasoska
Copy link
Contributor Author

@adrinjalali there is a TypeError related to the losses as a result of this new commit 712cb13. I suppose that will be fixed with everything together?

FAILED skops/io/tests/test_persist.py::test_can_persist_fitted[TweedieRegressor(max_iter=5)] - TypeError: __init__() takes exactly 1 positional argument (3 given)

>   ???
E   TypeError: __init__() takes exactly 1 positional argument (3 given)

sklearn/_loss/_loss.pyx:1615: TypeError
`

@adrinjalali
Copy link
Member

Yes, that's exactly the error fixed with the now merged PR. The CI here should be green tomorrow once the nightly build is uploaded / updated.

@TamaraAtanasoska
Copy link
Contributor Author

TamaraAtanasoska commented Nov 29, 2024

@adrinjalali refactor in c3da1b9 with the new changes regarding scikit-learn/scikit-learn#30372 although I guess that will be merged soon. If it gets merged before we merge I'd like to change the code slightly as I had it before as it was nicer :)

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a whole bunch of fixes here. I think I rather merge this and do other fixes in separate PRs.

Also, it turns out we were NOT testing against old sklearn versions, which is fixed now here, but it makes CI fail on those old ones, but at least they're tested. Also, the nightly build should be fixed (at least I tested locally with a latest version).

@adrinjalali adrinjalali merged commit 00f5f07 into skops-dev:main Dec 2, 2024
7 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make skops compatible with scikit-learn 1.6
2 participants