fix: Use estimator whenever possible to detect the ML task #998
Merged: thomass-dev merged 5 commits into probabl-ai:main from glemaitre:_find_ml_task_estimator_base on Jan 3, 2025.
The branch was updated from ad7922d to 469d1e5.
thomass-dev requested changes on Jan 3, 2025:
Thanks for this improvement :)
Co-authored-by: Thomas S. <th.salvatore@gmail.com>
thomass-dev approved these changes on Jan 3, 2025.
thomass-dev pushed a commit that referenced this pull request on Jan 10, 2025:
closes #834

Investigate an API for an `EstimatorReport`.

#### TODO

- [x] Metrics
  - [x] handle string metrics as specified in the accessor
  - [x] handle callable metrics
  - [x] handle scikit-learn scorers
  - [x] use the cache as efficiently as possible
  - [x] add testing for all of those features
  - [x] allow passing a new validation set to functions instead of using the internal validation set
  - [x] add a proper help and rich `__repr__`
- [x] Plots
  - [x] add the ROC curve display
  - [x] add the precision-recall curve display
  - [x] add the prediction error display for regressors
  - [x] add proper testing for those displays
  - [x] add a proper `__repr__` for those displays
- [x] Documentation
  - [x] (done for the checked part) add an example to showcase all the different features
  - [x] find a way to show the accessors' documentation in the page of `EstimatorReport`. It could be a bit tricky because they are only defined once the instance is created.
    - We need to look at the `series.rst` page from pandas to see how they document this sort of pattern.
  - [x] check the autocompletion: typing `report.metrics.` followed by Tab should offer completions. **edit**: having a stub file actually works. I prefer this to type hints directly in the file.
- Open questions
  - [x] we use hashing to retrieve the external set.
    - Should we use caching for the external validation set? To make it work, we need to compute the hash of potentially big arrays, which might be more costly than making the model predict.

#### Notes

This PR builds upon:

- #962 to reuse `skore.console`
- #998 to be able to detect clusterers in a consistent manner.
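The open question about caching predictions for an external validation set can be sketched as follows. This is a minimal illustration, assuming the cache key is a content hash computed with `joblib.hash`; the class `HashedPredictionCache` and its `n_predict_calls` counter are hypothetical and are not skore's implementation. It shows the trade-off raised above: every lookup must hash the whole array, so the key costs time proportional to the size of `X` even on a cache hit.

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


class HashedPredictionCache:
    """Hypothetical cache of predictions keyed by a content hash of `X`."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}
        self.n_predict_calls = 0  # how often the model actually predicted

    def predict(self, X):
        # joblib.hash reads every byte of X, so computing the key costs time
        # proportional to the size of X, even when the result is cached.
        key = joblib.hash(X)
        if key not in self._cache:
            self.n_predict_calls += 1
            self._cache[key] = self.estimator.predict(X)
        return self._cache[key]


X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
cache = HashedPredictionCache(LogisticRegression().fit(X_train, y_train))

X_external = np.array([[0.5], [2.5]])
cache.predict(X_external)
cache.predict(X_external.copy())  # same content, same key: served from the cache
```

For a cheap model such as a linear classifier, hashing a large `X` can easily dominate the cost of simply calling `predict` again, which is the concern the open question points at.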
I think I saw this feedback somewhere, so I tried to consolidate it here.

Using `estimator` to find the type of ML task is more robust than using the target `y`. Here, I change the logic to use `estimator` as much as possible. However, if the estimator is not fitted, I fall back on the target to detect which type of classification we are facing. If only `y` is provided, I use the previous approach, relying only on `y`.

I added a test file.
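A minimal sketch of this estimator-first logic, assuming scikit-learn's public `is_classifier`, `is_regressor`, `check_is_fitted` and `type_of_target` helpers. The function `find_ml_task_sketch`, its task labels (such as `"binary-classification"`) and the `"unknown"` result are hypothetical; this is not the code merged in this PR, and clustering is left out of the sketch.

```python
import numpy as np
from sklearn.base import is_classifier, is_regressor
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.utils.multiclass import type_of_target
from sklearn.utils.validation import check_is_fitted

# Hypothetical task labels used only for this sketch.
_TARGET_TO_TASK = {
    "binary": "binary-classification",
    "multiclass": "multiclass-classification",
    "continuous": "regression",
}


def _task_from_target(y):
    """Previous approach: infer the task from `y` alone."""
    if y is None:
        return "unknown"
    return _TARGET_TO_TASK.get(type_of_target(y), "unknown")


def find_ml_task_sketch(y=None, estimator=None):
    """Hypothetical helper: prefer the estimator, fall back on `y`."""
    if estimator is None:
        return _task_from_target(y)
    if is_regressor(estimator):
        return "regression"
    if is_classifier(estimator):
        try:
            check_is_fitted(estimator)
        except NotFittedError:
            # Unfitted: only the target can tell binary from multiclass.
            task = _task_from_target(y)
            return task if task.endswith("classification") else "unknown"
        # Fitted: the classes seen during `fit` decide, whatever `y` contains.
        n_classes = len(estimator.classes_)
        return "binary-classification" if n_classes == 2 else "multiclass-classification"
    return _task_from_target(y)


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
clf = LogisticRegression().fit(X, [0, 0, 1, 1, 2, 2])
# The evaluation target only contains two labels, but the fitted classifier
# learned three classes, so the estimator-based answer is multiclass.
print(find_ml_task_sketch(y=[0, 1, 1, 0], estimator=clf))
```

The last call illustrates why the estimator is the more robust source: relying on `y` alone would report binary classification for a model that was trained on three classes.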
NB: `is_clusterer` is only available from sklearn 1.6. I vendor a file, `_sklearn_compat.py`, that contains utilities making it easy to keep developer tools working from sklearn 1.2 to 1.6. While some of them are not useful right now, I want to vendor it completely. The package itself is developed and tested here: https://github.com/sklearn-compat/sklearn-compat.
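The kind of shim such a compat module provides can be sketched with feature detection: import the new helper when it exists, otherwise define an equivalent fallback. This is a minimal sketch, assuming that on releases before 1.6 a clusterer can be recognized through the legacy `_estimator_type` attribute set by `ClusterMixin`; it is not the actual content of `_sklearn_compat.py` or of the sklearn-compat package.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

try:
    # scikit-learn >= 1.6 ships `is_clusterer` in `sklearn.base`.
    from sklearn.base import is_clusterer
except ImportError:
    # Older releases: fall back on the legacy `_estimator_type`
    # attribute that `ClusterMixin` sets on clusterers.
    def is_clusterer(estimator):
        """Return True if `estimator` declares itself a clusterer."""
        return getattr(estimator, "_estimator_type", None) == "clusterer"


print(is_clusterer(KMeans(n_clusters=2)))
print(is_clusterer(LinearRegression()))
```

Callers then import `is_clusterer` from the shim module and never need to branch on the installed scikit-learn version themselves.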