feat: Add ComparisonReport to compare instances of EstimatorReport #1286

Draft
wants to merge 33 commits into main
Conversation

@thomass-dev thomass-dev commented Feb 4, 2025

  • Rename to ComparisonReport
  • Rebase on top of docs(future feat): Example driven development #1239 and adapt
  • Raise if report.metrics.accuracy(data_source="train") is called with at least one EstimatorReport that does not have training data
  • Test
  • Docstrings
    • MetricsAccessor
  • Move index column "#0" in front of each metric
  • Pass report names in comparator
  • Update plots legend (deferred to future PR)
    • The actual RocCurveDisplay needs a full refactor to be split by use case: estimator report, cross-validation report and finally comparison report. In each of these use cases, there are two scenarios: binary classification and multi-class classification. Otherwise, it will be unmaintainable.
  • Investigate missing metrics in report_metrics (deferred to future PR)
    • The logic is split between report_metrics and available_if; it should be merged (ideally everything in available_if?)
  • Refactor to make CrossValidationReport depend on it (deferred to future PR)
  • Change EstimatorReport repr? Issue EstimatorReport repr method has unintended side-effects #1293

Closes #1245

Co-authored-by: Auguste <auguste@probabl.ai>
Co-authored-by: Sylvain <sylvain@probabl.ai>

commit b8dc696
Author: Thomas S <thomas@probabl.ai>
Date:   Tue Feb 4 16:33:37 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparator.py

commit 4460077
Author: Auguste Baum <auguste@probabl.ai>
Date:   Tue Feb 4 16:25:46 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparator.py

commit 4744190
Author: Thomas S <thomas@probabl.ai>
Date:   Tue Feb 4 15:59:10 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparator.py

commit 69e7b08
Author: Auguste Baum <auguste@probabl.ai>
Date:   Tue Feb 4 14:00:14 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/comparator/metrics_accessor.py

commit c28b40e
Author: Thomas S <thomas@probabl.ai>
Date:   Tue Feb 4 12:23:13 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/comparator/comparator.py

commit 63c5b5f
Author: Auguste Baum <auguste@probabl.ai>
Date:   Tue Feb 4 11:24:54 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparator.py

commit c038e59
Author: Thomas S <thomas@probabl.ai>
Date:   Tue Feb 4 10:47:28 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/comparator/comparator.py

commit a39f330
Author: Thomas S <thomas@probabl.ai>
Date:   Tue Feb 4 10:35:20 2025 +0100

    mob start [ci-skip] [ci skip] [skip ci]

Co-authored-by: Auguste Baum <auguste@probabl.ai>

github-actions bot commented Feb 5, 2025

Documentation preview @ cdef817

@glemaitre

Could you elaborate on why it makes sense for the cross-validation report to reuse the comparator? Is it about inheritance or about some internal routines?

@thomass-dev

thomass-dev commented Feb 5, 2025

Could you elaborate on why it makes sense for the cross-validation report to reuse the comparator? Is it about inheritance or about some internal routines?

The goal is to see whether some code can be factored out and shared between the comparison and cross-validation reports, as there is a lot of duplication between their metrics accessors.

Co-authored-by: Thomas S <thomas@probabl.ai>
thomass-dev and others added 7 commits February 6, 2025 15:19
commit c18debe
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 14:24:22 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_plot/roc_curve.py

commit 8a8f611
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 14:14:01 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparison.py

commit 278e39a
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 14:05:39 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/tests/unit/sklearn/test_comparison.py

commit 31d7384
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 13:53:56 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit 6913cd1
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 13:20:56 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/report.py

commit 4744202
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 11:46:12 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit fab3a0b
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 11:46:05 2025 +0100

    mob start [ci-skip] [ci skip] [skip ci]

# automatically added all co-authors from WIP commits
# add missing co-authors manually
Co-authored-by: Auguste Baum <auguste@probabl.ai>
commit 0eac394
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 17:06:53 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit 083dd34
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 16:59:49 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit f3bea91
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 16:50:32 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit 1ad7d84
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 16:43:13 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit 235fd18
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 16:35:42 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/report.py

commit 23c8028
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 16:28:18 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit b6004d5
Author: Thomas S <thomas@probabl.ai>
Date:   Thu Feb 6 16:19:43 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

commit 0a6747d
Author: Auguste Baum <auguste@probabl.ai>
Date:   Thu Feb 6 16:00:37 2025 +0100

    mob next [ci-skip] [ci skip] [skip ci]

    lastFile:skore/src/skore/sklearn/_comparison/metrics_accessor.py

# automatically added all co-authors from WIP commits
# add missing co-authors manually
Co-authored-by: Auguste Baum <auguste@probabl.ai>
@thomass-dev thomass-dev changed the title from "feat: Add reports comparator" to "feat: Add ComparisonReport to compare multiple instances of EstimatorReport" on Feb 7, 2025
@thomass-dev thomass-dev changed the title from "feat: Add ComparisonReport to compare multiple instances of EstimatorReport" to "feat: Add ComparisonReport to compare instances of EstimatorReport" on Feb 7, 2025
@auguste-probabl auguste-probabl marked this pull request as ready for review February 7, 2025 11:07
@thomass-dev thomass-dev requested a review from glemaitre February 7, 2025 12:36
@sylvaincom sylvaincom self-requested a review February 7, 2025 15:00

@sylvaincom sylvaincom left a comment

I believe the ComparisonReport does not appear in the API?

@auguste-probabl

I believe the ComparisonReport does not appear in the API?

Oh sorry, we forgot! On it

@glemaitre glemaitre left a comment

Some thoughts after checking the example.

@@ -0,0 +1,234 @@
"""

I see that we are potentially going to increase the number of examples pretty quickly (this is already a huge issue in scikit-learn).

I think it would be best to incorporate the usage of the comparator into an existing example. I think we could add a section to the following one: https://skore.probabl.ai/dev/auto_examples/use_cases/plot_employee_salaries.html

The idea would be to train the linear and gbdt models and make the comparison using the comparator in that section.

Later, when the comparator supports the cross-validation report, we can remove that section and directly compare the results from the cross-validation.
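
Roughly, such a section could look like the sketch below. Everything here is a placeholder (synthetic data standing in for the employee-salaries dataset, arbitrary estimators), assuming ComparisonReport is importable from skore as introduced in this PR:

from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

from skore import ComparisonReport, EstimatorReport

# Placeholder data standing in for the employee-salaries dataset.
X, y = make_regression(n_samples=1_000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One report per candidate model, both evaluated on the same test set.
linear_report = EstimatorReport(
    Ridge(),
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
)
gbdt_report = EstimatorReport(
    HistGradientBoostingRegressor(),
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
)

# Compare the two reports side by side.
comp = ComparisonReport([linear_report, gbdt_report])
comp.metrics.report_metrics(data_source="test")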

# Let us display the result of our benchmark:

# %%
df_report_metrics = comp.metrics.report_metrics()

Looking at the rendering, at first glance I find the level_0 integer index weird. I need to think more about it.

Comment on lines 136 to 143
display = comp.metrics.plot.roc()
plt.tight_layout()

# %%
# .. note::
#
# In a near future version of skore, the legends will be correct with the names of
# the estimators.

I'll not merge until it is fixed ;P

# Now, let us try to compare these estimator reports:

# %%
try:

I would say that there is no need to show that :)

Comment on lines 217 to 219
# Methodologically, we always want to compare estimator reports on the exact same
# dataset. If we do not know where the datasets come from, then we expect skore to
# throw a warning:

@glemaitre glemaitre Feb 7, 2025

For me this check should be delayed until the computation of the metric. We are able to hash the data to know whether they differ, and thus warn properly.

Otherwise we risk raising too many false positives.
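
A minimal sketch of that deferred check, assuming each report exposes X_test / y_test and that joblib is available (the function name is illustrative):

import warnings

import joblib


def _check_same_test_data(reports):
    # Hash each report's test set; only warn when the hashes actually differ,
    # at metric-computation time rather than at ComparisonReport creation.
    hashes = {joblib.hash((report.X_test, report.y_test)) for report in reports}
    if len(hashes) > 1:
        warnings.warn(
            "The compared estimator reports do not use the same test data; "
            "their metrics may not be directly comparable."
        )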

@thomass-dev thomass-dev marked this pull request as draft February 10, 2025 13:41
@thomass-dev

thomass-dev commented Feb 10, 2025

@glemaitre I have rebased our work on top of your latest changes to the plot API (commit ⬆️ 784c022 + ⬇️ b91cdd0). Can you check it?

This shows me once again that we should refactor the metrics_accessors.py files to reduce duplicate code.

@glemaitre

Working on a review.

@glemaitre glemaitre left a comment

Here is a partial review. I'll continue after the demo.

Comment on lines +14 to +29
def usecase(
    type: Literal["binary-logistic-regression", "linear-regression"],
    random_state: Optional[int] = 42,
):
    if type == "binary-logistic-regression":
        X, y = make_classification(n_classes=2, random_state=random_state)
        estimator = LogisticRegression()
    elif type == "linear-regression":
        X, y = make_regression(random_state=random_state)
        estimator = LinearRegression()
    else:
        raise ValueError

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)

    return estimator, X_train, X_test, y_train, y_test

I would create two separate fixtures.
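
For instance, something along these lines (a sketch only; the fixture names are illustrative):

import pytest
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split


@pytest.fixture
def binary_classification_case():
    # Unfitted estimator plus a train/test split for a binary task.
    X, y = make_classification(n_classes=2, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    return LogisticRegression(), X_train, X_test, y_train, y_test


@pytest.fixture
def linear_regression_case():
    # Same return shape, but for a regression task.
    X, y = make_regression(random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
    return LinearRegression(), X_train, X_test, y_train, y_test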

)

with pytest.raises(
    TypeError, match="object of type 'EstimatorReport' has no len()"

The error message tells me that it is not informative enough for the end-user. So we should probably have an if not isinstance(reports, Iterable) check to mention that we want a list of reports.
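
For example, an early check along these lines (illustrative only, not the PR's code) would give a clearer message than the TypeError coming out of len():

from collections.abc import Iterable


def _validate_reports(reports):
    # Reject anything that is not an iterable of reports (a single
    # EstimatorReport, a string, etc.) with an explicit message.
    if not isinstance(reports, Iterable) or isinstance(reports, (str, bytes)):
        raise TypeError(
            "Expected a list of EstimatorReport instances, "
            f"got {type(reports).__name__} instead."
        )
    return list(reports)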

Comment on lines +53 to +61
with pytest.raises(
    TypeError, match="Only instances of EstimatorReport are allowed"
):
    ComparisonReport([estimator_report, None])

with pytest.raises(
    TypeError, match="Only instances of EstimatorReport are allowed"
):
    ComparisonReport([None, estimator_report])

Those two cases look redundant. Why not keep only the latter one?

Comment on lines +121 to +122
if (report.X_test is None) or (report.y_test is None):
    raise ValueError("Cannot compare reports without testing data")

Actually, you should be able to use the external data_source. I would expect something like this to work on the end-user's side.

comparator.metrics.report_metrics(data_source="X_y", X=X, y=y)

Comment on lines +127 to +128
if len(ml_tasks) > 1:
    raise ValueError("Not all estimators are in the same ML usecase")

Since you are accumulating in a set, I would defer raising the error until after the loop, because we can then provide more information regarding the ML tasks of all the estimators:

if len(ml_tasks) > 1:
    raise ValueError(
        "Not all estimators were trained on the same machine-learning task. "
        f"The tasks of the provided reports are: {ml_tasks}."
    )

I would probably store the tasks in a list rather than a set, to raise a better error message and keep one task per report. It helps when debugging.
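
Concretely, something like this sketch (the _ml_task attribute name is an assumption):

def _check_ml_tasks(reports):
    # Keep one task per report so the error message maps reports to tasks.
    ml_tasks = [report._ml_task for report in reports]
    if len(set(ml_tasks)) > 1:
        raise ValueError(
            "Expected all estimators to be trained on the same machine-learning "
            f"task; got the following tasks instead: {ml_tasks}."
        )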

Comment on lines +15 to +27
def warn(title, message):
    from rich.panel import Panel

    from skore import console

    console.print(
        Panel(
            title=title,
            renderable=message,
            style="orange1",
            border_style="cyan",
        )
    )

Is it used?

Comment on lines +23 to +34
_SCORE_OR_LOSS_ICONS = {
    "accuracy": "(↗︎)",
    "precision": "(↗︎)",
    "recall": "(↗︎)",
    "brier_score": "(↘︎)",
    "roc_auc": "(↗︎)",
    "log_loss": "(↘︎)",
    "r2": "(↗︎)",
    "rmse": "(↘︎)",
    "report_metrics": "",
    "custom_metric": "",
}

I changed this in one of the PRs; it should be called _SCORE_OR_LOSS_ICONS and be a dict.


Also, we should be able to remove report_metrics and custom_metric now.


Parameters
----------
data_source : {"test", "train"}, default="test"

So here, we should be supporting the X_y case. When it comes to implementing this for the CrossValidationReport, this option will indeed not be supported.

So it makes me think that we might want to store, in the __init__ of the report, what type of reports we get, and raise an error in the different metrics based on that.
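
A possible shape for that, as a stripped-down sketch (attribute and method names are hypothetical, not the PR's implementation):

class ComparisonReport:
    def __init__(self, reports):
        self._reports = list(reports)
        # Remember what kind of reports we are comparing so that metric
        # methods can reject options that only make sense for some of them.
        self._reports_type = type(self._reports[0]).__name__

    def _check_data_source(self, data_source):
        if data_source == "X_y" and self._reports_type != "EstimatorReport":
            raise ValueError(
                'data_source="X_y" is not supported when comparing '
                f"{self._reports_type} instances."
            )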

scoring_names=None,
pos_label=None,
scoring_kwargs=None,
aggregate=None,

With the EstimatorReport, the aggregate option is not useful. It is only interesting for computing a statistic over several splits.


results = pd.concat(results, axis=0, ignore_index=True)
results.index = pd.MultiIndex.from_tuples(
    enumerate(self._parent.report_names_),

Let's start simple and avoid adding the enumerate, since the user can override the report_names.
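
That would boil down to something like the sketch below (it assumes, like the snippet above, one row of results per report; the function name is illustrative):

import pandas as pd


def _index_by_report_name(results, report_names):
    # Label each row directly with its (user-overridable) report name,
    # without the extra positional level from enumerate.
    out = pd.concat(results, axis=0, ignore_index=True)
    out.index = pd.Index(report_names, name="Estimator")
    return out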
