-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: As a user, i want to retrieve a CrossValidationReporter
from a project
#1045
Comments
It's not a bug (as you said), it's a question of design. In the current:
Currently and in state, after you have put your object in a project, you have zero guarantee that you can programmatically retrieve your original data (for example, you can't retrieve your I know it's an opened and actual question, we should think about it. |
CrossValidationReporter
from a project
I think it is related with #949 With my user perspective, I would expect to write something like: # %%
import skore
my_project = skore.create("my_project", working_dir="/tmp")
# %%
from sklearn import datasets, linear_model
X, y = datasets.load_diabetes(return_X_y=True)
lasso = linear_model.Lasso()
# %%
reporter = my_project.CrossValidationReporter(lasso, X, y, cv=3) What the reasoning behind this code as a user:
|
@glemaitre if you expect side-effects when you call |
For me, I'm comfortable having the symmetry: skore.CrossValidationReporter(...)
my_project.CrossValidationReporter(...) with the exact same API. I can easily understand that when using the project, everything should get in there magically. |
RationaleWe should, as much as possible, have a symmetry between put and get. In particular for the objects we create ourselves. key takeaways
ImplementationSupported types by skore
TODO
|
This is for several reasons: - it is not as explicit as several simple `put` calls: the mechanic is ambiguous with regards to atomicity (if a key-value pair is invalid, does it make the whole operation fail?) - the mechanic makes it complicated to add options to `put`, e.g. the `note` option proposed in #1041 or the `display_as` option proposed in #1045 (comment).
This is for several reasons: - it is not as explicit as several simple `put` calls: the mechanic is ambiguous with regards to atomicity (if a key-value pair is invalid, does it make the whole operation fail?) - the mechanic makes it complicated to add options to `put`, e.g. the `note` option proposed in #1041 or the `display_as` option proposed in #1045 (comment).
This is for several reasons: - it is not as explicit as several simple `put` calls: the mechanic is ambiguous with regards to atomicity (if a key-value pair is invalid, does it make the whole operation fail?) - the mechanic makes it complicated to add options to `put`, e.g. the `note` option proposed in #1041 or the `display_as` option proposed in #1045 (comment).
This is for several reasons: - it is not as explicit as several simple `put` calls: the mechanic is ambiguous with regards to atomicity (if a key-value pair is invalid, does it make the whole operation fail?) - the mechanic makes it complicated to add options to `put`, e.g. the `note` option proposed in #1041 or the `display_as` option proposed in #1045 (comment).
To track the evolution of an object programmatically, do you need a list with all the versions of an item and their metadata (date), or a sorted list of values by date is sufficient? In other words: # A
versions = ['a', 'b', 'c', 'd']
# B
versions = [('a', '2020-01-01'), ('b', '2020-01-02'), ('c', '2020-01-03'), ('d', '2020-01-04')] (i'm modifying the |
Could we structure this a bit with a dictionary or an object so it is future-proof? There is more metadata one will want to access for a particular item version. |
Agree with @tuscland, we might want to add things such as how and by whom the object was created at some point for team cooperation. |
@sylvaincom the backend part is already merged. So you want a dict with
|
Maybe also add the version number? So that it's more easily accessible In the future, as @MarieS-WiMLDS pointed out, we will also add the user name |
You talk about collaboration here, we are far from it ^^. |
I propose a new API: def get(self, key: str, *, latest=True, metadata=False): ...
What do you think? It is okay or you want to keep |
I'm not comfortable with the option |
Getting a specific version is only |
I would rather a |
Regarding versioning, I'll take an experience that I have with So I think that having a utility as proposed by @augustebaum that can allow me quickly to know what are the metadata (supposing that the metadata is the entry point allowing me to filter what I need) and the corresponding version to know what I need to pick would be super useful, without the need to get the item themself. I could even imagine some filtering feature allowing to get a subset of the versions. Also it means that it simplifies the "job" what get is supposed to do: return an item. Potentially, we can imagine to always return a list (with a single object) and allow for stuff like # items is always a sequence (or maybe a dict to be able to have a richer indexing)
items = get("key", version="latest")
items = get(version=get_versions("key", filter="5 <= version <= 10")
items = get(version=get_versions("key", filter"version > 10 and date < @target_date") The filter syntax with this text would be the one use by |
Isn't it weird to query for metadata using an API called What about:
|
I think that I would be happier to assess only the info without to load potentially the items. I take the parallel of the conda/pip way to request metadata:
|
Is your feature request related to a problem? Please describe.
(I know it's not really a bug, but I wanted to highlight how weird it is for the user).
Screen 1:
Screen 2:
Screen 3:
This is highly inconsistant, and weird to the user. How do I know how to manipulate each of these?
Describe the solution you'd like
The solution to me is two folds:
get
andget_item
should render a CrossValidationReporter. If I close my notebook, I want to be able to get my cv_reporter, and still use the interesting features such ascv_rep.plots.scores
.skore.prettify(reporter)
Describe alternatives you've considered, if relevant
Remove completely the get_item function. It's in this discussion: #990. Actually, I still think we should do it :).
Additional context
The solution proposed would also have the advantage to not loose information, while today put/get does.
The text was updated successfully, but these errors were encountered: