fix: Use estimator whenever possible to detect the ML task #998

glemaitre · 2024-12-21T15:30:50Z

I think I saw this feedback somewhere, so I am trying to consolidate it here.
Using the estimator to find the type of ML task is more robust than using the target y.

Here, I change the logic to use the estimator as much as possible. However, if it is not fitted, I fall back on the target to detect which type of classification we are facing. If only y is provided, I use the previous approach, relying only on y.
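The decision order described above can be sketched roughly as follows. This is a minimal illustration, not the code in find_ml_task.py: the function names `guess_ml_task` and `_task_from_target` and the returned labels are hypothetical, and clusterer detection is left out for brevity. It relies only on public scikit-learn helpers (`is_classifier`, `is_regressor`, `check_is_fitted`, `type_of_target`).

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.utils.multiclass import type_of_target
from sklearn.utils.validation import check_is_fitted


def _task_from_target(y):
    """Hypothetical helper: guess the task from the target alone."""
    target_type = type_of_target(y)
    if target_type == "binary":
        return "binary-classification"
    if target_type == "multiclass":
        return "multiclass-classification"
    if target_type == "continuous":
        return "regression"
    return "unknown"


def guess_ml_task(estimator=None, y=None):
    """Hypothetical sketch: prefer the estimator, fall back on y when needed."""
    if estimator is None:
        # Only y is available: use the y-only approach.
        return _task_from_target(y) if y is not None else "unknown"
    if is_regressor(estimator):
        return "regression"
    if is_classifier(estimator):
        try:
            check_is_fitted(estimator)
        except NotFittedError:
            # Not fitted: only y can tell binary from multiclass.
            if y is None:
                return "unknown"
            task = _task_from_target(y)
            return task if task.endswith("classification") else "unknown"
        # Fitted: the learned classes settle the question.
        n_classes = len(estimator.classes_)
        return (
            "binary-classification"
            if n_classes == 2
            else "multiclass-classification"
        )
    return "unknown"


# An integer-valued target looks like classification when seen on its own,
# but the estimator removes the ambiguity.
y_counts = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
print(guess_ml_task(y=y_counts))
print(guess_ml_task(LinearRegression(), y=y_counts))
```

The last two calls show why the estimator is the more robust signal: a regression target that happens to hold a few distinct integers is indistinguishable from class labels when only y is inspected.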

I added a test file.

NB: is_clusterer is only available from sklearn 1.6. I vendor a file, _sklearn_compat.py, that contains utilities to make it easy to have developer tools working from sklearn 1.2 to 1.6. While some of them are not useful right now, I just want to vendor the file completely. The package itself is tested and developed here: https://github.com/sklearn-compat/sklearn-compat.
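For readers unfamiliar with this kind of shim, the idea can be illustrated with a guarded import. This is only a sketch of the general pattern and not the content of the vendored _sklearn_compat.py; the fallback body is an assumption that relies on the `_estimator_type` attribute set by `ClusterMixin` in older scikit-learn releases.

```python
try:
    # Public helper, available from scikit-learn 1.6 onwards.
    from sklearn.base import is_clusterer
except ImportError:

    def is_clusterer(estimator):
        """Assumed fallback for older releases: inspect the mixin attribute."""
        return getattr(estimator, "_estimator_type", None) == "clusterer"


from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

print(is_clusterer(KMeans(n_clusters=2)))
print(is_clusterer(LogisticRegression()))
```

Calling code imports `is_clusterer` from a single place and does not need to know which scikit-learn version is installed.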

thomass-dev

Thanks for this improvement :)

skore/src/skore/sklearn/_sklearn_compat.py

skore/src/skore/sklearn/find_ml_task.py

Co-authored-by: Thomas S. <th.salvatore@gmail.com>

…or_base

closes #834

Investigate an API for a `EstimatorReport`.

#### TODO

- [x] Metrics
  - [x] handle string metrics as specified in the accessor
  - [x] handle callable metrics
  - [x] handle scikit-learn scorers
  - [x] use the cache as efficiently as possible
  - [x] add testing for all of those features
  - [x] allow passing a new validation set to functions instead of using the internal validation set
  - [x] add a proper help and rich `__repr__`
- [x] Plots
  - [x] add the roc curve display
  - [x] add the precision recall curve display
  - [x] add prediction error display for regressor
  - [x] make proper testing for those displays
  - [x] add a proper `__repr__` for those displays
- [x] Documentation
  - [x] (done for the checked part) add an example to showcase all the different features
  - [x] find a way to show the accessors documentation in the page of `EstimatorReport`. It could be a bit tricky because they are only defined once the instance is created.
    - We need to have a look at the `series.rst` page from pandas to see how they document this sort of pattern.
  - [x] check the autocompletion: when typing `report.metrics.->tab` it should provide the autocompletion. **edit**: having a stub file is actually working. I prefer this to type hints directly in the file.
- Open questions
  - [x] we use hashing to retrieve external set.
    - use the caching for the external validation set? To make it work we need to compute the hash of potentially big arrays. This might be more costly than making the model predict.

#### Notes

This PR builds upon:

- #962 to reuse the `skore.console`
- #998 to be able to detect clusterer in a consistent manner.
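The open question about caching predictions for an external validation set can be made concrete with a small sketch. It is not the `EstimatorReport` implementation: the `PredictionCache` class is hypothetical, and `joblib.hash` is used here only as one readily available way to derive a content-based key from an array.

```python
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression


class PredictionCache:
    """Hypothetical cache keyed by a content hash of the input data."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._cache = {}
        self.n_predict_calls = 0

    def predict(self, X):
        # Hashing reads the whole array, so its cost grows with the data size.
        key = joblib.hash(X)
        if key not in self._cache:
            self.n_predict_calls += 1
            self._cache[key] = self.estimator.predict(X)
        return self._cache[key]


X = np.arange(20, dtype=float).reshape(10, 2)
y = X.sum(axis=1)
cache = PredictionCache(LinearRegression().fit(X, y))

cache.predict(X)
cache.predict(X.copy())  # same content, so the cached result is reused
cache.predict(X + 1.0)   # different content, so the model predicts again
print(cache.n_predict_calls)
```

The trade-off raised above is visible in `predict`: every lookup pays for hashing the full array, which for a cheap model may cost more than simply calling the model again.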

glemaitre added 2 commits December 21, 2024 16:25

fix: Use estimator whenever possible to detect the ML task

acb51e0

iter

92b4e6c

glemaitre mentioned this pull request Dec 23, 2024

feat: Design of EstimatorReport #997

Merged

19 tasks

thomass-dev force-pushed the main branch 7 times, most recently from ad7922d to 469d1e5 Compare December 31, 2024 14:42

thomass-dev self-requested a review January 3, 2025 14:19

thomass-dev requested changes Jan 3, 2025

skore/src/skore/sklearn/_sklearn_compat.py (outdated, resolved)

skore/src/skore/sklearn/find_ml_task.py (outdated, resolved)

skore/src/skore/sklearn/find_ml_task.py (resolved)

glemaitre and others added 3 commits January 3, 2025 16:01

Update skore/src/skore/sklearn/find_ml_task.py

5b0eda7

Co-authored-by: Thomas S. <th.salvatore@gmail.com>

address thomas feedback

086eb61

Merge remote-tracking branch 'origin/main' into _find_ml_task_estimat…

2176b0e

…or_base

thomass-dev approved these changes Jan 3, 2025

thomass-dev merged commit b7bf3ea into probabl-ai:main Jan 3, 2025
19 checks passed
