docs: Minor changes of the doc examples for v0.6 (#1212)
Some minor changes
sylvaincom authored Jan 23, 2025
1 parent 3ce6a33 commit 11910dd
Showing 10 changed files with 54 additions and 42 deletions.
3 changes: 2 additions & 1 deletion examples/getting_started/plot_quick_start.py
@@ -74,7 +74,8 @@
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid any conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
3 changes: 2 additions & 1 deletion examples/getting_started/plot_skore_getting_started.py
@@ -308,7 +308,8 @@
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
3 changes: 2 additions & 1 deletion examples/getting_started/plot_tracking_items.py
@@ -119,7 +119,8 @@
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid any conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
3 changes: 2 additions & 1 deletion examples/getting_started/plot_working_with_projects.py
@@ -358,7 +358,8 @@ def my_func(x):
# Cleanup the project
# -------------------
#
# Let's clean the skore project to avoid conflict with other examples.
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
3 changes: 2 additions & 1 deletion examples/model_evaluation/plot_cross_validate.py
@@ -159,7 +159,8 @@
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid any conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
47 changes: 26 additions & 21 deletions examples/model_evaluation/plot_estimator_report.py
@@ -10,6 +10,9 @@
"""

# %%
# Loading our dataset and defining our estimator
# ==============================================
#
# First, we load a dataset from skrub. Our goal is to predict if a company paid a physician. The ultimate goal is to
# detect potential conflict of interest when it comes to the actual problem that we want to solve.

@@ -46,8 +49,8 @@
X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=42)

# %%
# By the way, notice how skore's :func:`~train_test_split` automatically warns us for a
# class imbalance.
# By the way, notice how skore's :func:`~skore.train_test_split` automatically warns us
# for a class imbalance.
#
# Now, we need to define a predictive model. Hopefully, `skrub` provides a convenient
# function (:func:`skrub.tabular_learner`) when it comes to getting strong baseline
@@ -62,9 +65,11 @@
estimator

# %%
# Getting insights from our estimator
# ===================================
#
# Introducing the :class:`skore.EstimatorReport` class
# ----------------------------------------------------
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Now, we would be interested in getting some insights from our predictive model.
# One way is to use the :class:`skore.EstimatorReport` class. This constructor will
@@ -79,15 +84,15 @@
# %%
#
# Once the report is created, we get some information regarding the available tools
# allowing us to get some insights from our specific model on the specific task.
# allowing us to get some insights from our specific model on our specific task.
#
# You can get a similar information if you call the :meth:`~skore.EstimatorReport.help`
# We can get a similar information if we call the :meth:`~skore.EstimatorReport.help`
# method.
report.help()

# %%
#
# Be aware that you can access the help for each individual sub-accessor. For instance:
# Be aware that we can access the help for each individual sub-accessor. For instance:
report.metrics.help()

# %%
@@ -96,7 +101,7 @@
# %%
#
# Metrics computation with aggressive caching
# -------------------------------------------
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# At this point, we might be interested to have a first look at the statistical
# performance of our model on the validation set that we provided. We can access it
@@ -119,7 +124,7 @@
# the caching mechanism. Indeed, when we have a large enough dataset, computing the
# predictions for a model is not cheap anymore. For instance, on our smallish dataset,
# it took a couple of seconds to compute the metrics. The report will cache the
# predictions and if you are interested in computing a metric again or an alternative
# predictions and if we are interested in computing a metric again or an alternative
# metric that requires the same predictions, it will be faster. Let's check by
# requesting the same metrics report again.

@@ -170,7 +175,7 @@

# %%
#
# By default, the metrics are computed on the test set. However, if a training set
# By default, the metrics are computed on the test set only. However, if a training set
# is provided, we can also compute the metrics by specifying the `data_source`
# parameter.
report.metrics.log_loss(data_source="train")
@@ -210,13 +215,13 @@

# %%
#
# .. warning::
# .. note::
# In this last example, we rely on computing the hash of the input data. Therefore,
# there is a trade-off: the computation of the hash is not free and it might be
# faster to compute the predictions instead.
#
# Be aware that you can also benefit from the caching mechanism with your own custom
# metrics. We only expect that you define your own metric function to take `y_true`
# Be aware that we can also benefit from the caching mechanism with our own custom
# metrics. Skore only expects that we define our own metric function to take `y_true`
# and `y_pred` as the first two positional arguments. It can take any other arguments.
# Let's see an example.

@@ -288,7 +293,7 @@ def operational_decision_cost(y_true, y_pred, amount):
# %%
#
# We observe that caching is working as expected. It is really handy because it means
# that you can compute some additional metrics without having to recompute the
# that we can compute some additional metrics without having to recompute the
# the predictions.
report.metrics.report_metrics(
scoring=["precision", "recall", operational_decision_cost],
@@ -302,9 +307,9 @@ def operational_decision_cost(y_true, y_pred, amount):

# %%
#
# It could happen that you are interested in providing several custom metrics which
# does not necessarily share the same parameters. In this more complex case, we will
# require you to provide a scorer using the :func:`sklearn.metrics.make_scorer`
# It could happen that we are interested in providing several custom metrics which
# does not necessarily share the same parameters. In this more complex case, skore will
# require us to provide a scorer using the :func:`sklearn.metrics.make_scorer`
# function.
from sklearn.metrics import make_scorer, f1_score

@@ -322,10 +327,10 @@ def operational_decision_cost(y_true, y_pred, amount):
# %%
#
# Effortless one-liner plotting
# -----------------------------
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# The :class:`skore.EstimatorReport` class also provides a plotting interface that
# allows to plot *defacto* the most common plots. As for the the metrics, we only
# allows to plot *defacto* the most common plots. As for the metrics, we only
# provide the meaningful set of plots for the provided estimator.
report.metrics.plot.help()

@@ -338,9 +343,9 @@ def operational_decision_cost(y_true, y_pred, amount):
# %%
#
# The plot functionality is built upon the scikit-learn display objects. We return
# those display (slightly modified to improve the UI) in case you want to tweak some
# of the plot properties. You can have quick look at the available attributes and
# methods by calling the `help` method or simply by printing the display.
# those display (slightly modified to improve the UI) in case we want to tweak some
# of the plot properties. We can have quick look at the available attributes and
# methods by calling the ``help`` method or simply by printing the display.
display

# %%
3 changes: 2 additions & 1 deletion examples/model_evaluation/plot_train_test_split.py
@@ -246,7 +246,8 @@
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid any conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
my_project.clear()
6 changes: 3 additions & 3 deletions examples/technical_details/plot_cache_mechanism.py
@@ -68,7 +68,7 @@
# Caching the predictions for fast metric computation
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# First, let us focus on :class:`~skore.EstimatorReport`, as the same philosophy will
# First, we focus on :class:`~skore.EstimatorReport`, as the same philosophy will
# apply to :class:`~skore.CrossValidationReport`.
#
# Let's explore how :class:`~skore.EstimatorReport` uses caching to speed up
@@ -238,13 +238,13 @@
# %%
#
# We only use the cache to retrieve the `display` object and not directly the matplotlib
# figure. It means that you can still customize the cached plot before displaying it:
# figure. It means that we can still customize the cached plot before displaying it:
display.plot(roc_curve_kwargs={"color": "tab:orange"})
plt.tight_layout()

# %%
#
# Be aware that you can clear the cache if you want to:
# Be aware that we can clear the cache if we want to:
report.clear_cache()
report._cache
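
The caching idea discussed in these examples can be sketched in plain Python. This is a minimal illustration under stated assumptions, not skore's implementation: `ToyReport`, `n_predict_calls`, `accuracy`, and `decision_cost` are hypothetical names introduced here, and the real `EstimatorReport` may key and store its cache differently. The sketch shows predictions being cached under a hash of the input data, so that a second metric (including a custom one taking `y_true` and `y_pred` as its first two positional arguments) reuses them.

```python
import hashlib
import pickle


class ToyReport:
    """Hypothetical stand-in for a report object (not skore's code)."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._cache = {}
        self.n_predict_calls = 0  # counts how often predictions are recomputed

    def _data_key(self, X):
        # Hashing the input is not free: its cost grows with the size of X,
        # so for cheap models it can be slower than predicting again.
        return hashlib.sha256(pickle.dumps(X)).hexdigest()

    def predictions(self, X):
        key = self._data_key(X)
        if key not in self._cache:
            self.n_predict_calls += 1
            self._cache[key] = self._predict_fn(X)
        return self._cache[key]

    def metric(self, metric_fn, X, y_true, **kwargs):
        # Custom metrics receive y_true and y_pred first, then extra arguments.
        return metric_fn(y_true, self.predictions(X), **kwargs)

    def clear_cache(self):
        self._cache = {}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def decision_cost(y_true, y_pred, amount):
    # Charge `amount` for every false positive.
    return amount * sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)


report = ToyReport(lambda X: [int(x > 0) for x in X])
X = [-1.0, 2.0, 3.0, -4.0]
y = [0, 1, 0, 0]

acc = report.metric(accuracy, X, y)
cost = report.metric(decision_cost, X, y, amount=10)  # reuses cached predictions
report.clear_cache()
```

After both metrics are computed, `report.n_predict_calls` is still 1, because the second call finds the predictions in the cache. Calling `clear_cache()` empties the dictionary, so the next metric would trigger a fresh prediction.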

23 changes: 12 additions & 11 deletions examples/use_cases/plot_employee_salaries.py
@@ -32,7 +32,7 @@

# %%
#
# We use a `skrub` dataset that is non-trivial.
# We use a skrub dataset that is non-trivial.
from skrub.datasets import fetch_employee_salaries

datasets = fetch_employee_salaries()
@@ -161,7 +161,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# ^^^^^^^^^^^
#
# Now, we want to evaluate this complex model via cross-validation (with 5 folds).
# For that, we use skore's :class:`~skore.CrossValidationReport` to investigate th
# For that, we use skore's :class:`~skore.CrossValidationReport` to investigate the
# performance of our model.
from skore import CrossValidationReport

@@ -170,10 +170,10 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):

# %%
# We observe that the cross-validation report detected that we have a regression task
# and provides us only a subset of the metrics and plots that make sense for our
# and provides us with some metrics and plots that make sense for our
# specific problem at hand.
#
# To accelerate any future computation (e.g. a metric), we cache once and for all the
# To accelerate any future computation (e.g. of a metric), we cache once and for all the
# predictions of our model.
# Note that we don't necessarily need to cache the predictions as the report will
# compute them on the fly (if not cached) and cache them for us.
@@ -189,7 +189,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):

# %%
#
# To not lose the this cross-validation report, let's store it in our skore project.
# To not lose this cross-validation report, let's store it in our skore project.
project.put("Linear model report", report)

# %%
@@ -201,7 +201,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# ^^^^^^^^^^^^
#
# Now that we have our first baseline model, we can try an out-of-the-box model:
# the :class:`~skrub.TableVectorizer` that makes the feature engineering for us.
# skrub's :class:`~skrub.TableVectorizer` that makes the feature engineering for us.
# To deal with the high cardinality of the categorical features, we use a
# :class:`~skrub.TextEncoder` that uses a language model and an embedding model to
# encode the categorical features.
@@ -243,8 +243,8 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# Investigating the models
# ^^^^^^^^^^^^^^^^^^^^^^^^
#
# At this stage, I might not been careful and have already overwritten the report and
# model from my first attempt. Hopefully, because we stored the reports in our skore
# At this stage, we might not been careful and have already overwritten the report and
# model from our first attempt. Hopefully, because we stored the reports in our skore
# project, we can easily retrieve them. So let's retrieve the reports.
linear_model_report = project.get("Linear model report")
hgbdt_model_report = project.get("HGBDT model report")
@@ -267,7 +267,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
#
# In addition, if we forgot to compute a specific metric
# (e.g. :func:`~sklearn.metrics.mean_absolute_error`),
# we can easily add it to the the report, without re-training the model and even
# we can easily add it to the report, without re-training the model and even
# without re-computing the predictions since they are cached internally in the report.
# This allows us to save some potentially huge computation time.
from sklearn.metrics import mean_absolute_error
@@ -295,7 +295,7 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):

# %%
#
# Finally, we can even get individual :class:`~skore.EstimatorReport` for each fold
# Finally, we can even get the individual :class:`~skore.EstimatorReport` for each fold
# from the cross-validation to make further analysis.
# Here, we plot the actual vs predicted values for each fold.
from itertools import zip_longest
@@ -317,7 +317,8 @@ def periodic_spline_transformer(period, n_splines=None, degree=3):
# Cleanup the project
# -------------------
#
# Let's clear the skore project (to avoid any conflict with other documentation examples).
# Let's clear the skore project (to avoid any conflict with other documentation
# examples).

# %%
project.clear()
2 changes: 1 addition & 1 deletion skore-ui/src/stores/project.ts
@@ -25,7 +25,7 @@ export const useProjectStore = defineStore("project", () => {
let currentItemUpdateIndex: { [key: string]: number } = {};

/**
* Return true if the the given key is in the list of displayed keys, false otherwise.
* Return true if the given key is in the list of displayed keys, false otherwise.
* @param view the view to check
* @param key the key to look for
* @returns true if the key is displayed, false otherwise