Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to insert several items at once #173

Closed
augustebaum opened this issue Aug 13, 2024 · 12 comments · Fixed by #363
Closed

Make it possible to insert several items at once #173

augustebaum opened this issue Aug 13, 2024 · 12 comments · Fixed by #363
Assignees
Labels
enhancement New feature or request

Comments

@augustebaum
Copy link
Contributor

Something like

Store.insert({"a": 3, "b": "hi"})

which would have the same effect as

Store.insert("a", 3)
Store.insert("b", "hi")

This is because it is not uncommon in DS tasks to save several metrics at once, and to keep them all in a big dictionary. This dict syntax would make it frictionless to start using mandr without riddling one's code with Store.insert statements.

However, I wonder what should happen when inserting a nested dict, e.g.

Store.insert({"a": {"b": "hi"}})
@augustebaum augustebaum added the question Further information is requested label Aug 13, 2024
@augustebaum augustebaum removed the question Further information is requested label Aug 28, 2024
@tuscland
Copy link
Member

@MarieS-WiMLDS can you tell us if we want to do this? (using updated project-wise syntax.)

@MarieS-WiMLDS
Copy link
Contributor

I wrote this a couple of minutes ago, so I think yes, we want to be able to have a dict instead of copy/pasting all these lines.
Capture d’écran du 2024-09-16 18-05-55

With Store.insert({"a": {"b": "hi"}}), I expect the value of the key a to be a dict, nothing terribly fancy I guess?

@tuscland
Copy link
Member

Sounds good, and in line with initial expectations. Just note that since the API has changed yesterday, we could write something like:

project.put({"a": {"b": "hi"}})

... to iteratively call put on each key.

@tuscland tuscland added the enhancement New feature or request label Sep 17, 2024
@augustebaum augustebaum changed the title It could be nice to insert several items at once Make it possible to insert several items at once Sep 18, 2024
@thomass-dev
Copy link
Collaborator

thomass-dev commented Sep 18, 2024

@tuscland i suggest using *args instead of dict :

class Project:
    def put(self, *key_value_pair: tuple[str, objet]): ...

# [...]
project.put(("key", "value"))
project.put(("key, "value"), ("key", "value"), ("key", "value"))

(untested snippet)

@tuscland
Copy link
Member

Users want to insert a dict because they build it progressively.

@thomass-dev
Copy link
Collaborator

thomass-dev commented Sep 19, 2024

You mean, user will build a huge dict and put this dict in skore at the end of its script ?
Something like that :

myproject = {
    "key1": "value1",
    "key2": "value2",
    "key3": "value3",
}
[...]
myproject["key4"] = "value4"
[...]
myproject["key5"] = "value5"

skore_project.put(myproject)

If so, its a bit weird in my opinion to not use directly skore.project.put progressively:

skore_project.put(("key1", "value1"), ("key2",  "value2"), ("key3", "value3"))
[...]
skore_project.put(("key4", "value4"))
[...]
skore_project.put(("key5", "value5"))

@tuscland
Copy link
Member

It makes sense because then you have only one call to skore, and some people prefer that. The syntax based on tuples is not user friendly.

Also, I am not a fan of the *args based approach because it may prevent us from properly binding keyword arguments, if we would like to expand the capabilities of put.

@augustebaum augustebaum linked a pull request Sep 19, 2024 that will close this issue
@augustebaum
Copy link
Contributor Author

Okay, after a first iteration (#363) we found that the expected behaviour is unclear in case of errors: if one of the values cannot be saved to the database, what should we do?

A more concrete example:

project.put({
  "df_train": df_train,
  "df_test": (lambda: "unsupported value"),
  "y_train": (lambda: "unsupported value"),
  "y_test": y_test,
})

Here the value of y_test is a function, which we shouldn't be expected to properly save to the database.
In the current single-item syntax,

project.put("y_test", (lambda: "unsupported value"))

raises a NotImplementedError.

So, in the dict example, should we:

  1. Prevent any saving from happening, i.e. none of the keys are actually saved, and we raise an error;
  2. Save everything until the error, i.e. "df_train" is saved, but after that an error is raised, and "y_test" is not saved;
  3. Save everything that doesn't trigger an error, i.e. "df_train" and "y_test" are saved, and after that an error is raised that contains a list of all the errors that occured;
  4. Save everything that doesn't trigger an error, i.e. "df_train" and "y_test" are saved, and after that a warning is printed that contains a list of all the errors that occured; unlike raising an error, this doesn't stop execution of the program.

@augustebaum augustebaum self-assigned this Sep 20, 2024
@MarieS-WiMLDS
Copy link
Contributor

options 2 and 3 are absolute no go for me, because they may stop a training or calibration that takes time.
I prefer 4 over 1 because at least we save what we can.

@thomass-dev
Copy link
Collaborator

thomass-dev commented Sep 23, 2024

3 and 4 are the same workflow with a different output: warning versus exception.
We should add a parameter to control this, such as "on_error=<warning|raise>".

@tuscland
Copy link
Member

In general, it is important to give users the confidence that using skore will not break their workflow, i.e. we follow Postel's law: be liberal in what you accept, and conservative in what you provide.

Should we create an issue to verify this aspect is well implemented overall? Also, it should be documented.

@augustebaum
Copy link
Contributor Author

Should we create an issue to verify this aspect is well implemented overall? Also, it should be documented.

I'm not sure how one would check this, but feel free

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants