feat: Add ComparisonReport to compare instances of EstimatorReport #1286
base: main
Conversation
Could you elaborate on why it makes sense for the cross-validation report to reuse the comparator? Is it about inheritance or about some internal routines?
The goal is to see if some code can be factorized between the comparison and cross-validation reports, as there is a lot of duplication between their metric accessors.
I believe the ComparisonReport does not appear in the API?
Oh sorry, we forgot! On it.
Some thoughts after checking the example:
I see that we are going to potentially increase the number of examples pretty quickly (it is already a huge issue in scikit-learn).
I think it would be best to incorporate the usage of the comparator into an existing example. We could add a section to the following one: https://skore.probabl.ai/dev/auto_examples/use_cases/plot_employee_salaries.html
The idea would be to train the linear and GBDT models and make the comparison using the comparator in that section, as sketched below.
Later, when the comparator supports the cross-validation report, we can remove that section and directly compare the results from the cross-validation.
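As a rough sketch of what such a section could look like (the dataset here is a stand-in, and the exact constructor signatures are assumptions based on this PR):

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

from skore import ComparisonReport, EstimatorReport

# Stand-in data; the actual example uses the employee salaries dataset.
X, y = make_regression(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One EstimatorReport per model...
linear_report = EstimatorReport(
    Ridge(), X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test
)
gbdt_report = EstimatorReport(
    HistGradientBoostingRegressor(),
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
)

# ...then a single ComparisonReport to compare them side by side.
comp = ComparisonReport([linear_report, gbdt_report])
comp.metrics.report_metrics()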
# Let us display the result of our benchmark:

# %%
df_report_metrics = comp.metrics.report_metrics()
Looking at the rendering, at first glance I find the level_0 integer index weird. I need to think more about it.
display = comp.metrics.plot.roc()
plt.tight_layout()

# %%
# .. note::
#
#    In a near-future version of skore, the legends will correctly display the names
#    of the estimators.
I'll not merge until it is fixed ;P
# Now, let us try to compare these estimator reports:

# %%
try:
I would say that there is no need to show that :)
# Methodologically, we always want to compare estimator reports on the exact same
# dataset. If we do not know where the datasets come from, then we expect skore to
# throw a warning:
For me, this check should be delayed until the metric is computed. We are able to hash the data, or to know that the datasets differ, and thus warn properly.
Otherwise we risk raising too many warnings that turn out to be false positives.
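A minimal sketch of that idea, assuming the reports expose X_test / y_test and using joblib's hash as the data fingerprint (the helper name is hypothetical):

import warnings

import joblib


def _warn_if_test_data_differs(reports):
    # Hypothetical helper, called lazily when a metric is computed rather
    # than eagerly at ComparisonReport construction time.
    fingerprints = {joblib.hash((report.X_test, report.y_test)) for report in reports}
    if len(fingerprints) > 1:
        warnings.warn(
            "The compared estimator reports do not share the same test data; "
            "their metrics may not be directly comparable."
        )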
@glemaitre I have rebased our work with your last changes on the plot API (commit ⬆️ 784c022 + ⬇️ b91cdd0). Can you check it? This shows me once again that we should refactor the
Working on a review.
Here is a partial review. I'll continue after the demo.
from typing import Literal, Optional

from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split


def usecase(
    type: Literal["binary-logistic-regression", "linear-regression"],
    random_state: Optional[int] = 42,
):
    if type == "binary-logistic-regression":
        X, y = make_classification(n_classes=2, random_state=random_state)
        estimator = LogisticRegression()
    elif type == "linear-regression":
        X, y = make_regression(random_state=random_state)
        estimator = LinearRegression()
    else:
        raise ValueError(f"Unknown use case: {type!r}")

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)

    return estimator, X_train, X_test, y_train, y_test
I would create two separate fixtures.
)

with pytest.raises(
    TypeError, match="object of type 'EstimatorReport' has no len()"
The error message tells me that it is not informative enough for the end-user. So we should probably have an "if not isinstance(reports, Iterable)" check
to mention that we want a list of reports.
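Something along these lines, with the helper name and the exact error wording being assumptions:

from collections.abc import Iterable


def _validate_reports(reports):
    # Fail early with an actionable message instead of the bare
    # "object of type 'EstimatorReport' has no len()" TypeError.
    if not isinstance(reports, Iterable):
        raise TypeError(
            "Expected `reports` to be a list (or another iterable) of "
            f"EstimatorReport instances; got {type(reports).__name__!r} instead."
        )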
with pytest.raises(
    TypeError, match="Only instances of EstimatorReport are allowed"
):
    ComparisonReport([estimator_report, None])

with pytest.raises(
    TypeError, match="Only instances of EstimatorReport are allowed"
):
    ComparisonReport([None, estimator_report])
Those two cases look redundant. Why not keep only the latter one?
if (report.X_test is None) or (report.y_test is None):
    raise ValueError("Cannot compare reports without testing data")
Actually, you should be able to use the external data_source. I would expect something like this to work on the end-user's side:
comparator.metrics.report_metrics(data_source="X_y", X=X, y=y)
if len(ml_tasks) > 1:
    raise ValueError("Not all estimators are in the same ML usecase")
Since you are accumulating in a set, I would defer raising the error until after the loop, because we can then provide more information regarding the ML tasks of all estimators:

if len(ml_tasks) > 1:
    raise ValueError(
        "Not all estimators were trained on the same machine-learning task. "
        f"The tasks of the provided reports are: {ml_tasks}."
    )

I would probably store the tasks in a list to raise a better error message and keep a one-report-to-one-task mapping. It helps with debugging.
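For illustration, a sketch where each report keeps its own task so the error can point at the offending reports (the estimator_name_ and _ml_task attributes are assumptions):

def _check_same_ml_task(reports):
    # Hypothetical: map each report to its detected ML task so the error
    # message shows which report has which task.
    ml_tasks = {report.estimator_name_: report._ml_task for report in reports}
    if len(set(ml_tasks.values())) > 1:
        raise ValueError(
            "Expected all estimators to be trained on the same ML task; "
            f"got the following report -> task mapping instead: {ml_tasks}."
        )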
def warn(title, message):
    from rich.panel import Panel

    from skore import console

    console.print(
        Panel(
            title=title,
            renderable=message,
            style="orange1",
            border_style="cyan",
        )
    )
Is it used?
_SCORE_OR_LOSS_ICONS = {
    "accuracy": "(↗︎)",
    "precision": "(↗︎)",
    "recall": "(↗︎)",
    "brier_score": "(↘︎)",
    "roc_auc": "(↗︎)",
    "log_loss": "(↘︎)",
    "r2": "(↗︎)",
    "rmse": "(↘︎)",
    "report_metrics": "",
    "custom_metric": "",
}
I changed this in one of the PRs: it should be called _SCORE_OR_LOSS_ICONS and be a dict.
Also, we should be able to remove report_metrics and custom_metric now.

Parameters
----------
data_source : {"test", "train"}, default="test"
So here, we should be supporting the X_y case. When it comes to implementing this for the CrossValidationReport, this option is indeed not supported.
So it makes me think that we might want to store, in the __init__ of the report, what type of report we get, and raise an error in the different metrics based on that.
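A sketch of that idea, with the attribute name and error wording being hypothetical:

class ComparisonReport:
    def __init__(self, reports):
        # Remember what kind of reports we are comparing so the metric
        # accessors can reject unsupported options early.
        self._reports = reports
        self._reports_type = type(reports[0]).__name__

    def _check_data_source(self, data_source):
        # Called from the metric methods before computing anything.
        if self._reports_type == "CrossValidationReport" and data_source == "X_y":
            raise ValueError(
                "data_source='X_y' is not supported when comparing "
                "cross-validation reports."
            )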
scoring_names=None,
pos_label=None,
scoring_kwargs=None,
aggregate=None,
With the EstimatorReport, the aggregate option is not useful. It is only interesting for computing a statistic over several splits.
results = pd.concat(results, axis=0, ignore_index=True)
results.index = pd.MultiIndex.from_tuples(
    enumerate(self._parent.report_names_),
Let's start simple and avoid adding the enumerate, since the user can override the report_names.
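That is, something like the following, assuming one row per report as in the quoted code (the helper name and the index name are hypothetical):

import pandas as pd


def _concat_results(results, report_names):
    # Index the concatenated frame by the (possibly user-overridden) report
    # names alone, without the enumerate counter.
    out = pd.concat(results, axis=0, ignore_index=True)
    out.index = pd.Index(report_names, name="estimator")
    return out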
Remaining tasks and follow-ups:
- ComparisonReport: decide what happens when report.metrics.accuracy(data_source="train") is called with at least one EstimatorReport that does not have training data.
- Update the plots legend (deferred to a future PR): RocCurveDisplay needs a full refactor to be split by use case: estimator report, cross-validation report, and finally comparison report. In each of these use cases, there are two scenarios, binary classification and multi-class classification. Otherwise, it will be unmaintainable.
- Investigate missing metrics in report_metrics (deferred to a future PR): report_metrics and available_if overlap; it should be merged (ideally everything in available_if?).
- Refactor to make CrossValidationReport depend on it (deferred to a future PR).
- Change the EstimatorReport repr? Issue: EstimatorReport repr method has unintended side-effects #1293.

Closes #1245
Co-authored-by: Auguste auguste@probabl.ai
Co-authored-by: Sylvain sylvain@probabl.ai