feat: As a user, i want to retrieve a `CrossValidationReporter` from a project #1045

MarieS-WiMLDS · 2025-01-03T15:48:47Z

Is your feature request related to a problem? Please describe.

(I know it's not really a bug, but I wanted to highlight how weird it is for the user).

# %%
from skore import CrossValidationReporter

# %% 
rep = CrossValidationReporter(clf, X, y)
project.put("rep_cv", rep)
rep # outputs screenshot 1

# %%
rep_get = project.get("rep_cv")
rep_get # outputs screenshot 2

# %% 
rep_get_item = project.get_item("rep_cv")
rep_get_item # outputs screenshot 3

Screen 1:

Screen 2:

Screen 3:

This is highly inconsistent and confusing for the user. How do I know how to manipulate each of these?

Describe the solution you'd like

To me, the solution is twofold:

- get and get_item should return a CrossValidationReporter. If I close my notebook, I want to be able to get my cv_reporter back and still use interesting features such as cv_rep.plots.scores.
- Having the display in the notebook is really nice! Let's keep it, but with a different function. It could be something like skore.prettify(reporter).

Describe alternatives you've considered, if relevant

Remove the get_item function completely. This is discussed in #990. Actually, I still think we should do it :).

Additional context

The proposed solution would also have the advantage of not losing information, whereas today's put/get does.


thomass-dev · 2025-01-03T16:37:57Z

It's not a bug (as you said), it's a question of design.

In the current design:

- get returns the original data when possible, or transformed data close to it,
- get_item returns the Item, i.e. the persistable object that contains your (transformed) data.

As things currently stand, once you have put your object in a project, you have zero guarantee that you can programmatically retrieve your original data (for example, you can't retrieve your CrossValidationReporter).

I know it's an open and current question; we should think about it.
I removed the bug label and changed the title, because it's not a bug and involves more thinking.
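
To make the get / get_item distinction above concrete, here is a toy sketch. It is not skore's implementation: `ToyItem` and `ToyProject` are hypothetical names, and JSON merely stands in for whatever transformation an item applies before persisting. It shows how a transformed value can come back "close to" the original without being the original.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyItem:
    """Hypothetical persistable wrapper around a transformed value."""

    payload: str  # JSON text: the persisted form of the value


class ToyProject:
    """Hypothetical in-memory project; not skore's Project class."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The value is transformed so that it can be persisted.
        self._items[key] = ToyItem(payload=json.dumps(value))

    def get_item(self, key):
        # Returns the wrapper that holds the transformed data.
        return self._items[key]

    def get(self, key):
        # Rebuilds data from the persisted form: close to the original,
        # but not guaranteed to be identical.
        return json.loads(self._items[key].payload)


project = ToyProject()
original = (1, 2, 3)
project.put("numbers", original)

restored = project.get("numbers")   # a list: the tuple type was lost
item = project.get_item("numbers")  # a ToyItem, not the data itself
```

In this toy, `put` followed by `get` silently turns a tuple into a list, which is the kind of asymmetry the issue is about.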

glemaitre · 2025-01-03T23:29:48Z

I think it is related to #949

From a user perspective, I would expect to write something like:

# %%
import skore
my_project = skore.create("my_project", working_dir="/tmp")

# %%
from sklearn import datasets, linear_model
X, y = datasets.load_diabetes(return_X_y=True)
lasso = linear_model.Lasso()

# %%
reporter = my_project.CrossValidationReporter(lasso, X, y, cv=3)

My reasoning behind this code, as a user:

- I don't need to use put or get
- I automatically get the output of the reporter
- I expect the UI to have registered (or put) the reporter item in the project for me
- I expect the activity feed to show me progress
- I expect the UI to let me play with the item
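
One way to read the expectations above is as a factory method on the project that builds the reporter and registers it as a side effect. The sketch below is only an illustration under assumptions: `ToyReporter`, `ToyProject`, and the `key` keyword are hypothetical, and no actual cross-validation is run.

```python
class ToyReporter:
    """Stand-in for a reporter; a real one would run cross-validation."""

    def __init__(self, estimator, X, y, cv=5):
        self.estimator = estimator
        self.X = X
        self.y = y
        self.cv = cv


class ToyProject:
    """Hypothetical project exposing a reporter factory."""

    def __init__(self, name):
        self.name = name
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]

    def CrossValidationReporter(self, *args, key="cross_validation", **kwargs):
        # Same arguments as the standalone class, plus a side effect:
        # the new reporter is registered in the project under `key`
        # (the `key` keyword is an assumption of this sketch).
        reporter = ToyReporter(*args, **kwargs)
        self.put(key, reporter)
        return reporter


my_project = ToyProject("my_project")
reporter = my_project.CrossValidationReporter(
    "lasso", [[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0], cv=3
)
```

Here the user never calls put, yet `my_project.get("cross_validation")` returns the very same reporter object.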

tuscland · 2025-01-07T08:18:55Z

@glemaitre if you expect side-effects when you call my_project.CrossValidationReporter, should it be a verb instead? It looks like a factory.

glemaitre · 2025-01-07T08:35:24Z

For me, I'm comfortable having the symmetry:

skore.CrossValidationReporter(...)
my_project.CrossValidationReporter(...)

with the exact same API. I can easily understand that when using the project, everything should get in there magically.

MarieS-WiMLDS · 2025-01-07T14:12:58Z

Rationale

We should, as much as possible, have symmetry between put and get, in particular for the objects we create ourselves.
However, we can't have symmetry for everything. For everything else, we use pickle.
We will extend the supported types along the way where we can.
It's still unclear whether we really have to store everything: is there any point in storing what we can't track and analyze?

Key takeaways

Officially supported types for optimized storage and presentation
What you put is what you get (WYPIWYG)
Adopt Arrow for DataFrame and Series format
Strings can be hinted with display_as for alternative displays
Items are now private

Implementation

Supported types by skore

What you put is what you get
If the object is not a supported type, save as pickle

Pillow image
- internally saved as raster image
- same for display
Plotly
- internally saved as JSON
- same for display
Matplotlib
- internally saved as pickle
- displayed as SVG
JSON
- internally saved as JSON
- supported via a strict mapping to Python types
Case for strings:
- Accept a display_as kwarg to modulate how the string is processed at display time
  display_as="html"
  display_as="markdown"
  display_as="svg"
DataFrames and Series
- internally saved as arrow
- Supported implementations: Pandas, Polars
Numpy Arrays
- internally saved as npy with allow_pickle=False
Scikit-learn "compatible" estimator
- internally saved as skops
- displayed as HTML
Skore types
- a composition of supported types
Not above
- fallback to pickle, and raise a warning
- display as str(object)
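
The dispatch described in the list above could look roughly like the following. This is a minimal sketch using only the standard library, not skore's code: it covers just the JSON case, the display_as hint for strings, and the pickle fallback with a warning, and the names `serialize`, `deserialize`, and `_strict_json` are hypothetical. "Strict" is interpreted here as "the value survives a JSON round trip unchanged", which is an assumption of this sketch.

```python
import json
import pickle
import warnings

ALLOWED_DISPLAY_AS = {"html", "markdown", "svg"}


def _strict_json(value):
    """Return JSON text if `value` round-trips unchanged, else None."""
    try:
        text = json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return None
    # A tuple, for instance, would come back as a list: not strict.
    return text if json.loads(text) == value else None


def serialize(value, *, display_as=None):
    """Return (storage_format, payload_bytes, display_hint)."""
    if display_as is not None:
        if not isinstance(value, str):
            raise TypeError("display_as only applies to strings")
        if display_as not in ALLOWED_DISPLAY_AS:
            raise ValueError(f"unsupported display_as: {display_as!r}")
    text = _strict_json(value)
    if text is not None:
        return "json", text.encode("utf-8"), display_as
    # Not a supported type in this sketch: fall back to pickle and warn.
    warnings.warn(
        f"{type(value).__name__} is not a supported type; falling back to pickle",
        UserWarning,
        stacklevel=2,
    )
    return "pickle", pickle.dumps(value), None


def deserialize(storage_format, payload):
    """Rebuild the value that was given to `serialize`."""
    if storage_format == "json":
        return json.loads(payload.decode("utf-8"))
    return pickle.loads(payload)


fmt, payload, hint = serialize("# Title", display_as="markdown")
restored = deserialize(fmt, payload)
```

With this interpretation, a tuple is not stored as JSON (it would come back as a list) and goes through the pickle fallback instead, so what you put is still what you get.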

TODO

change the way data are stored
make items API private
make views API private

This is for several reasons:
- it is not as explicit as several simple `put` calls: the mechanic is ambiguous with regard to atomicity (if a key-value pair is invalid, does it make the whole operation fail?)
- the mechanic makes it complicated to add options to `put`, e.g. the `note` option proposed in #1041 or the `display_as` option proposed in #1045 (comment).

thomass-dev · 2025-01-15T09:40:47Z

@marie @Sylvain

To track the evolution of an object programmatically, do you need a list of all the versions of an item with their metadata (date), or is a list of values sorted by date sufficient?

In other words:

# A
versions = ['a', 'b', 'c', 'd']
# B
versions = [('a', '2020-01-01'), ('b', '2020-01-02'), ('c', '2020-01-03'), ('d', '2020-01-04')]

(I'm modifying the get_item_versions function to hide the Item class.)

tuscland · 2025-01-15T09:51:09Z

Could we structure this a bit with a dictionary or an object so it is future-proof? There is more metadata one will want to access for a particular item version.

MarieS-WiMLDS · 2025-01-15T09:54:21Z

Agree with @tuscland, we might want to add things such as how and by whom the object was created at some point for team cooperation.

sylvaincom · 2025-01-15T09:54:38Z

Agreed with @tuscland: soon there will be text comments for each version of an item also, see #1041

thomass-dev · 2025-01-15T10:00:48Z

@sylvaincom the backend part is already merged. So you want a dict with

{
    "value": ...,
    "date": ...,
    "note": ...,
}

sylvaincom · 2025-01-15T10:04:42Z

Maybe also add the version number? So that it's more easily accessible

In the future, as @MarieS-WiMLDS pointed out, we will also add the user name
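
A structured record along the lines discussed above (value, date, note, plus the version number) could be sketched as a small dataclass. The name `VersionRecord` and its field names are hypothetical; the point is only that new fields with defaults, such as a user name, can be added later without breaking existing callers.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class VersionRecord:
    """Hypothetical record describing one version of an item."""

    version: int
    value: Any
    date: datetime
    note: Optional[str] = None
    # Future fields (e.g. a user name) can be appended with defaults.


history = [
    VersionRecord(0, "a", datetime(2020, 1, 1, tzinfo=timezone.utc)),
    VersionRecord(1, "b", datetime(2020, 1, 2, tzinfo=timezone.utc), note="retrained"),
]

latest = history[-1]
as_dict = asdict(latest)  # plain dict, if a dict is preferred
```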

thomass-dev · 2025-01-15T10:09:57Z

> In the future, as @MarieS-WiMLDS pointed out, we will also add the user name

You talk about collaboration here, we are far from it ^^.

thomass-dev · 2025-01-15T10:14:46Z

I propose a new API:

def get(self, key: str, *, latest=True, metadata=False): ...

latest is used to return either the latest version or the whole history.
metadata is used to return the metadata in addition to the value.

get("<key>") -> "<object2>"
get("<key>", latest=False) -> ["<object1>", "<object2>"]
get("<key>", latest=False, metadata=True) -> [{"object": "<object1>", "date": "<date>", "note": "<note>"}, ...]
get("<key>", metadata=True) -> {"object": "<object2>", "date": "<date>", "note": "<note>"}

What do you think? Is this okay, or do you want to keep get_item_versions?
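
For illustration, the four call shapes above can be reproduced with a toy in-memory project. This is a sketch of the proposal only, not of skore: `ToyProject` is a hypothetical name, and the `note` keyword on put is assumed from the option mentioned in #1041.

```python
from datetime import datetime, timezone


class ToyProject:
    """Hypothetical in-memory project keeping every version of a key."""

    def __init__(self):
        self._history = {}  # key -> list of records, oldest first

    def put(self, key, value, *, note=None):
        record = {
            "object": value,
            "date": datetime.now(timezone.utc).isoformat(),
            "note": note,
        }
        self._history.setdefault(key, []).append(record)

    def get(self, key, *, latest=True, metadata=False):
        records = self._history[key]  # raises KeyError for unknown keys
        if metadata:
            # Shallow copies, so callers cannot edit the stored records.
            results = [dict(record) for record in records]
        else:
            results = [record["object"] for record in records]
        return results[-1] if latest else results


project = ToyProject()
project.put("key", "object1")
project.put("key", "object2", note="second try")

latest_value = project.get("key")
all_values = project.get("key", latest=False)
latest_record = project.get("key", metadata=True)
all_records = project.get("key", latest=False, metadata=True)
```

Note that the return type depends on both flags (object, list, dict, or list of dicts), which is part of what the following comments push back on.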

MarieS-WiMLDS · 2025-01-15T11:06:14Z

I'm not comfortable with the latest option: it gives the user only two possibilities. What about a version parameter that could take "latest", "all", or an actual version number?

thomass-dev · 2025-01-15T11:10:58Z

Getting a specific version is just project.get("<key>", latest=False)[the_version_number_i_want], no?
What I dislike about your proposal is that it assumes the user knows the number of versions in advance.

augustebaum · 2025-01-15T11:12:55Z

I would rather have a get_versions that returns all versions and all metadata than add a bunch of optional arguments to get.

glemaitre · 2025-01-15T11:35:03Z

version="latest" when it comes to getting the latest version.

Regarding versioning, I'll draw on my experience with OpenML, which versions datasets: there is a version parameter that takes an int. However, when setting it, you can be quite in the dark about what you are going to get.

So I think a utility like the one proposed by @augustebaum would be super useful: it would let me quickly see the metadata (supposing the metadata is the entry point for filtering what I need) and the corresponding versions, so that I know what to pick without needing to get the items themselves. I could even imagine a filtering feature that returns a subset of the versions.

It also simplifies the "job" that get is supposed to do: return an item. Potentially, we could always return a list (possibly with a single object) and allow for things like

# items is always a sequence (or maybe a dict to be able to have a richer indexing)
items = get("key", version="latest")
items = get(version=get_versions("key", filter="5 <= version <= 10"))
items = get(version=get_versions("key", filter="version > 10 and date < @target_date"))

The filter syntax in these strings would be the one used by df.query, since I can easily imagine the output of get_versions having some sort of table representation.

tuscland · 2025-01-15T13:36:01Z

Isn't it weird to query for metadata using an API called get_versions?

What about:

# When `version` is specified, `get` returns a sequence of dictionaries
items = get("key", version=42)
items = get("key", version="latest") # "latest" is a filter specification like the other
items = get("key", version="5 <= version <= 10") # can be implemented later
items = get("key", version="version > 10 and date < @target_date")

glemaitre · 2025-01-15T13:41:54Z

I think I would be happier to access only the info without potentially loading the items. Take the parallel with how conda and pip request metadata: with pip you need(ed) to download the entire artefact and then extract the metadata, while with conda you can get the metadata alone. In the end, it is a huge benefit if I'm not interested in the item.

get_versions is potentially a weird name. get_info(version=..., metadata=True/False) returning dataclasses might be my next thought :). get would have a parameter that takes the output of get_info, or would potentially expose the parameters of get_info as shortcuts (it can become overwhelming, but I don't have the full picture).
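
The "metadata first, items only on demand" idea above could be sketched as follows. Everything here is hypothetical (`VersionInfo`, `ToyStore`, the `created` keyword, the `loads` counter), and a plain Python predicate stands in for the query-string filter discussed earlier in the thread.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical lightweight metadata for one version of an item."""

    key: str
    version: int
    date: datetime.date
    note: str = ""


class ToyStore:
    """Hypothetical store keeping metadata apart from payloads."""

    def __init__(self):
        self._infos = {}     # key -> list of VersionInfo (cheap to read)
        self._payloads = {}  # (key, version) -> value (potentially heavy)
        self.loads = 0       # counts payload loads, for demonstration

    def put(self, key, value, *, created, note=""):
        infos = self._infos.setdefault(key, [])
        info = VersionInfo(key, len(infos), created, note)
        infos.append(info)
        self._payloads[(key, info.version)] = value
        return info

    def get_info(self, key, predicate=None):
        # Only metadata is touched here; no payload is loaded.
        return [
            info
            for info in self._infos[key]
            if predicate is None or predicate(info)
        ]

    def get(self, infos):
        # Load exactly the payloads selected through get_info.
        values = []
        for info in infos:
            self.loads += 1
            values.append(self._payloads[(info.key, info.version)])
        return values


store = ToyStore()
for day, value in enumerate(["a", "b", "c"], start=1):
    store.put("model", value, created=datetime.date(2020, 1, day))

selected = store.get_info("model", lambda info: info.version >= 1)
loads_before = store.loads  # still 0: filtering never touched a payload
values = store.get(selected)
```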

MarieS-WiMLDS added the bug label Jan 3, 2025

thomass-dev removed the bug label Jan 3, 2025

thomass-dev changed the title ~~bug: inconsistance~~ feat: As a user, i want to retrieve a CrossValidationReporter from a project Jan 3, 2025

thomass-dev added the enhancement label Jan 3, 2025

thomass-dev assigned thomass-dev, augustebaum and MarieS-WiMLDS Jan 7, 2025

thomass-dev linked a pull request Jan 7, 2025 that will close this issue

feat: Modify the user API of skore to respect "what you put is what you get" principle #1052

Merged

24 tasks

thomass-dev mentioned this issue Jan 7, 2025

feat: Modify the user API of skore to respect "what you put is what you get" principle #1052

Merged

24 tasks

tuscland mentioned this issue Jan 7, 2025

The default behaviour of put is not sane enough for unknown object types #822

Closed

tuscland added this to the Skore 0.7 milestone Jan 7, 2025

glemaitre mentioned this issue Jan 8, 2025

chore: Hide view functions from project #1013

Closed

augustebaum mentioned this issue Jan 14, 2025

feat(project)!: Remove "put-several" mechanic #1100

Merged

tuscland modified the milestones: Skore 0.6, Test Jan 15, 2025

thomass-dev mentioned this issue Jan 20, 2025

feat: Change the way NumPy array are persisted #1159

Open

auguste-probabl unassigned augustebaum Jan 20, 2025

thomass-dev closed this as completed in 5cb9468 Jan 20, 2025

thomass-dev closed this as completed in #1052 Jan 20, 2025
