Item visibility (private/public) #990

MarieS-WiMLDS · 2024-12-19T11:14:19Z

MarieS-WiMLDS
Dec 19, 2024
Maintainer

Two recent events made me think about the status of items, and whether they should be private or public. The first was reading the documentation (for instance https://skore.probabl.ai/0.5/generated/skore.item.cross_validation_item.CrossValidationItem.html). The second was a comment stating that items should be public (#966 (comment)).

In my opinion, items should be private, for the following reasons:

First, I don't understand what an item adds compared to other objects.
Second, items make the API much more difficult to understand. Everything interesting for the user that is inside a CrossValidationItem should be in CrossValidationReporter.

Making items private wouldn't change anything in the user workflow, except that we would need to change get_item_versions to get_versions.

Answered by MarieS-WiMLDS

Jan 9, 2025

It was decided to make items private in order to have an API easier to understand and to use. Related issue is #1045.

thomass-dev · 2024-12-19T13:23:44Z

thomass-dev
Dec 19, 2024
Maintainer

Item was a good deal to let the user characterize its complex data, such as media.
With a good API, how else can they say whether a given bytes/str object is an image or just a string?

Furthermore, Item is not necessarily aware of the raw data, as the raw data was transformed to be persisted.
An Item represents a transformation and, when possible, the raw data (primitive, dataframe, etc.).
The distinction is IMO important for the user.

1 reply

MarieS-WiMLDS Dec 19, 2024
Maintainer Author

"Item was a good deal to let the user characterize its complex data, such as media."
Do you mean that for complex items, you expect the user to write:

project.put_item("key", MediaItem.factory(my_obj_as_bytes_or_str, media_type="text/html"))

You can't expect a data scientist to write this.

tuscland · 2024-12-19T19:04:58Z

tuscland
Dec 19, 2024
Maintainer

Let's list the requirements.

Offer a storage unit for a set of controlled types, so that the values are as environment independent as possible. How persistence is made is an implementation detail. This is an important requirement.
Give access to an extensible system of categorization (metadata).
The key to store an item should be optional (use case of the artifact log).

For inspiration, we can look at W&B's public API for images.

https://docs.wandb.ai/guides/track/log/media/

They have an Image type. It seems lighter than requiring that everything stored (using the public API) is an Item.

Note that an Item does not "represent a transformation" (ping @thomass-dev). It represents an artifact of your ML development process.

We designed put so that it performs the convenient conversion on a best effort basis. The underlying non-sugared function is put_item.

@MarieS-WiMLDS I believe that you are exploring opportunities to make the user experience less confusing. The source of confusion seems to be the potential asymmetry between put and get, and also that put wraps put_item.

Now that we have a clearer idea of what we would like to store, we could try to see what kind of data needs to be wrapped:

Numbers, primitive data structures: no need to wrap
Series, DataFrames: no need to wrap
Markdown text: need to wrap
Images bytes: need to wrap
Plots: we don't want to store a library-dependent representation, not sure how to deal with them
Rich objects (reports): we control them, so they are de facto supported.

All in all, I am in favor of this effort, but I would like to see examples of code.
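
As a starting point for such examples, the list above could translate into something like the following sketch. Every name in it (`Markdown`, `Image`, `categorize`) is hypothetical and only illustrates the idea that a few ambiguous kinds of data get a light wrapper while primitives are stored as-is. Series, DataFrames, plots and reports are left out to keep the snippet dependency-free.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Markdown:
    """Hypothetical wrapper: marks a string as Markdown text."""

    text: str


@dataclass(frozen=True)
class Image:
    """Hypothetical wrapper: marks raw bytes as an image."""

    data: bytes
    media_type: str = "image/png"


def categorize(value: Any) -> str:
    """Return a label describing how a stored value should be interpreted."""
    if isinstance(value, Markdown):
        return "text/markdown"
    if isinstance(value, Image):
        return value.media_type
    # Numbers and primitive data structures need no wrapper.
    if value is None or isinstance(value, (bool, int, float, str, list, tuple, dict)):
        return "primitive"
    raise TypeError(f"Unsupported type: {type(value).__name__}")


examples = [0.93, "# Title", Markdown("# Title"), Image(b"\x89PNG")]
labels = [categorize(v) for v in examples]
```

Note that the same string is a plain primitive when passed directly and Markdown only when wrapped, which is the distinction the list draws.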

0 replies

rouk1 · 2024-12-20T09:17:26Z

rouk1
Dec 20, 2024
Maintainer

I'll give my take on this. 🙊
I find the current API too complicated and not engaging for the user.

From a user's perspective, I don't care how items are persisted, and I don't care about the underlying classes.

I would like users to write code like:

from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import skore

p = skore.open()  # Each call to open could create a session that helps users track their progress.

# data
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
p.track(X, name="X")  # tracks a dataframe numpy/polars/...
p.track(y)  # name is optional, it may fallback to the given variable name using frame inspection ?

# scikit-learn models
pipeline = make_pipeline(StandardScaler(), Lasso())
p.cross_validate(pipeline, X, y)
p.track(pipeline)

# plots and media
pillow_img = ...
p.track(pillow_img)  # auto media type to png thx to pillow
plot_object = ... 
p.track(plot_object) # as vector if possible, as vega spec if possible fallback to png bytes 

# primitive type
markdown = ...
p.track(markdown, name="summary", media_type="text/markdown")

# later
my_old_pipeline = p.find_model(name="pipeline", session="azertyui")  # the session id is visible in the UI, so one can get back its fitted model

Imo getting stuff back from skore is not relevant. If you get stuff back from skore, you only get the extras skore gave you (cross-validation plots, table insights, ...), which are already visible in the UI. Fitted models are probably a special case, as it sounds like a good idea to store them securely using skops.
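
The name fallback floated in the snippet above ("frame inspection") could work roughly as follows. This is a sketch only: `Tracker` and `track` are hypothetical, and the lookup uses the standard `inspect` module to search the caller's local variables for the tracked object. It falls back to a placeholder when zero or several names match, which is one reason the approach is fragile.

```python
import inspect
from typing import Any, Optional


class Tracker:
    """Hypothetical tracker that guesses a name when none is given."""

    def __init__(self) -> None:
        self.tracked: dict[str, Any] = {}

    def track(self, value: Any, name: Optional[str] = None) -> str:
        if name is None:
            frame = inspect.currentframe()
            try:
                caller = frame.f_back if frame is not None else None
                # Names in the caller's scope bound to this exact object.
                candidates = (
                    [n for n, v in caller.f_locals.items() if v is value]
                    if caller is not None
                    else []
                )
            finally:
                del frame  # avoid keeping a reference cycle alive
            # Only trust the guess when it is unambiguous.
            name = candidates[0] if len(candidates) == 1 else "unnamed"
        self.tracked[name] = value
        return name


def demo() -> tuple[str, str]:
    tracker = Tracker()
    features = [[0.1, 0.2], [0.3, 0.4]]
    inferred = tracker.track(features)  # name guessed from the variable
    explicit = tracker.track("## Summary", name="summary")
    return inferred, explicit
```

An explicit `name` always wins; the guess is only a convenience for the common case of a single variable pointing at the object.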

2 replies

thomass-dev Dec 20, 2024
Maintainer

"Imo getting stuff back from skore is not relevant."

I totally agree, as long as skore is not made for reproducibility but for tracking.

MarieS-WiMLDS Dec 20, 2024
Maintainer Author

We may want to get items back from skore to explore them. Here are a couple of use cases: I might want to share an item with someone else who has to load it. I might want to explore it further the following day, after I closed the session and all local memory was cleared. I might want to explore in one script things created in another script.

This comment has some code for one such use case: #966 (comment)

thomass-dev · 2024-12-20T09:41:33Z

thomass-dev
Dec 20, 2024
Maintainer

I've even been convinced from the start that "how to display an item" should not be defined programmatically, but in the UI by the user.

For objects whose type can't characterize their nature, such as str or bytes, I imagine something like: the str is sent as-is to the UI without the user having to specify its nature, it is displayed using a default media type (Markdown?), and the user can switch between several media types via a drop-down. For the given key, the UI saves the selected media type.
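
A rough sketch of that behaviour, under stated assumptions: `DEFAULT_MEDIA_TYPES` and `DisplayPreferences` are hypothetical names, the defaults are placeholders, and the "UI" is reduced to a method call that records the user's choice per key.

```python
from typing import Any

# Hypothetical defaults for types that do not determine their own rendering.
DEFAULT_MEDIA_TYPES = {
    str: "text/markdown",
    bytes: "application/octet-stream",
}


class DisplayPreferences:
    """Remembers, per key, the media type the user picked in the UI."""

    def __init__(self) -> None:
        self._selected: dict[str, str] = {}

    def select(self, key: str, media_type: str) -> None:
        # Stands in for the user choosing an entry in the drop-down.
        self._selected[key] = media_type

    def media_type_for(self, key: str, value: Any) -> str:
        # A saved choice wins; otherwise fall back to the type's default.
        if key in self._selected:
            return self._selected[key]
        for python_type, media_type in DEFAULT_MEDIA_TYPES.items():
            if isinstance(value, python_type):
                return media_type
        raise TypeError(f"No default media type for {type(value).__name__}")


prefs = DisplayPreferences()
before = prefs.media_type_for("summary", "<b>hi</b>")
prefs.select("summary", "text/html")
after = prefs.media_type_for("summary", "<b>hi</b>")
```

Here the stored value never changes; only the display preference attached to the key does.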

4 replies

rouk1 Dec 20, 2024
Maintainer

On the frontend side this is hard to achieve. I fear that this will bloat the UI quickly. Skore should at least give hints on the media type for what it knows (models, dataframes, raster images and vectors). Why not for strings that may be markdown/html/whatever? Other types are imo irrelevant to display raw (dict, arrays, ...).

thomass-dev Dec 20, 2024
Maintainer

Sorry if it wasn't clear.

What I want is not to remove the automatic definition of the media type (in the backend) based on the type of the object.
I want to remove the user's ability to define a media type programmatically, in favor of a drop-down in the UI, for cases where the type of the object is not enough to define this media type in the backend.

thomass-dev Dec 20, 2024
Maintainer

This has two advantages:

the user API is fluid (we no longer need put_item),
the user can no longer make the mistake of creating an explicit item with an incompatible object. For instance, trying to create a MediaItem from a dataframe with the media type image/svg+xml.

MarieS-WiMLDS Jan 3, 2025
Maintainer Author

From what I understand, this would be a solution to my need as a user, which is: remove the notion of item from the API, since it doesn't add anything to me. It would help to simplify the use of skore.

MarieS-WiMLDS · 2025-01-09T13:14:59Z

MarieS-WiMLDS
Jan 9, 2025
Maintainer Author

It was decided to make items private in order to have an API easier to understand and to use. Related issue is #1045.

0 replies
