-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Processes: pre-defined + user-defined + sharing: namespacing/prefixing? #305
Comments
I think that was discussed mainly in calls. Result was that some back-ends want prefixes for UDP and some not. Thus, there's now this compromise in the docs:
... in its scope (pre-defined/user-defined) and from the user-perspective at the time of submission.
The issue is that if you don't prefix all, it can be problematic to add new processes as back-end as all good names might be taken by users. Thus to not break existing process graphs, user processes are preferred over back-end processes if this case happens.
The API suggests
Indeed, at some point, it might be interesting to use URLs as process_id in process graphs. But there's no sharing yet, so the API is consistent. We can reconsider sharing and URLs as process IDs in a future version for sure. |
Indeed, but that breaks reproducibility across backends. Unless there would be a mechanism that the user, when storing a UDP, provides a suggested process_id and the backend returns an updated "collision-safe" process_id.
I think this also breaks reproducibility in a way. First, I assume that it is ok to update a conflicting UDP as long as it was created before the PDP (the API is not explicit about this I think). This feels a bit weird to me: updating is fine, but delete+create is not.
That's indeed the problem with using prefixes in the process_id naming, because it's a "commitment for life". It could be more flexible with something like optional namespaces: for example, as long as there is no conflict you can use a normal process_id without a namespace, e.g.
As noted above: it's just a suggestion and backends are still free to follow/enforce that, which breaks reproducability across backends. The optional namespacing I suggested above is a bit more flexible, but it does not fit the current Background story: we are starting with a use case where we already would like to do some kind of public UDP sharing (also see #85 (comment) ) and we'd like to have a solution soon that is as close as possible to some kind consensus that would end up in a later openEO API version. |
There's two issues here: (1)the potential conflict of names and (2) how we reference shared processes. For (2) I'd suggest to simply use URL's instead of process ids in the future. That means we still use \w+ for new processes stored at a back-end, but in a process graph you also allow URIs in For (1) I really hesitate to define a fixed prefix, especially as we can't really move away from \w+. What I could imagine is that clients add a
When suggesting
That doesn't really solve the problem. What's happening with old process graphs? |
I like this idea. Instead of boolean "user" field, you could also call it for example "namespace" (or "library", "package", "source", "origin", ...) and handle sharing (or even packaging) as well. For example (probably going over the top here):
There are of course a lot of details to get right regarding sharing (security and privacy-wise), which is out of scope here |
Should we bring in the distinction between pre-defined (default) and user-defined in for 1.0? In that case we need a PR and approvals ASAP (this week). Otherwise the next possibility for adding it would be openEO API 2.0. |
I think it would make the API cleaner, especially regarding the somewhat adhocy
and it easily allows backends to experiment with sharing approaches for usecases that might already need it. |
I'll issue a PR and then we need to find consensus. |
Indeed, some of these issues should preferably be solved in 1.0. We're probably one of the first backends trying this out anyway? |
Yes, I was working on this with GEE also in 0.4, but dropped it in 1.0. I think the namespaces are easy to add to at least distinguish user and pre-defined processes. Should also be relatively easy for clients to set correctly. On the other hand, I'm concerned that the URI change could be to much for 1.0. For URIs you'd actually not even need a namespace. You could simply use the URI as process_id, but then all back-ends would need to implement it otherwise it would not be clear whether a back-end supports it or not. If you put a URL in the namespace, then the process_id is pretty much redundant. For now, I'd suggest to just use the two namespaces and leave out URIs. What clients could do is to allow passing URIs, then they download it automatically and store it at /process_graphs and put the corresponding id and namespace in the process graph. If we's have chosen the URI approach in process graphs directly, the back-ends should store URI-based processes anyway during submission as just loading them during execution sounds a bit less reliable and prone to attacks when changed later on (see npm issues with deleted packages, for example). |
Please review #309 |
Not necessarily: the namespace URL could be a "folder" and process_id still a traditional |
Yes, would work, but I'd like to see client support/helpers on this. Because splitting URIs is nothing a typical user wants to do. Also, it assumes things are called [...].json, which may not always be the case. So definitely something to think about more in the future. |
possible next step (but not necessarily to include for 1.0 final):
|
That's definitely something that would only be available in openEO API 2.0 and should live in it's own issue as this is going to be closed for 1.0.0. |
This has been merged. @soxofaan For the other ideas open a new issue, please. |
I just realized again why we haven't had namespaces ( #309 ) in the API: User could simply replace missing pre-defined processes (e.g. normalized_difference) with our "alternative" processing instructions in openeo-processes. Just store it as user-defined process and it still runs. That's now broken with namespaces. Thus, I changed the default behavior again in #313. The namespaces for explicit choosing stay as is. |
…pace Add "namespace" placeholder to ProcessUnsupported error message (#305)
(I remember discussions about this but couldn't find in-depth github ticket about this, feel free to close as duplicate)
At VITO, we're working on support for user-defined processes and basic sharing, which raises some concerns about naming collisions, not only between pre-defined and user-defined processes, but also across users when some kind of sharing mechanism is in place.
Some data points:
From User-defined processes #256:
The description of the
process_id
component:openeo-api/openapi.yaml
Lines 4661 to 4663 in 1247172
I think there is a contradiction/inconsistency here:
process_id
MUST be unique .... but if backend adds same name, it's ok to be not unique anymore (can user still update their UDP afterwards?). I think this is an indication that it "needs more work".Process ids currently must follow this regex:
openeo-api/openapi.yaml
Line 4667 in 1247172
meaning Latin characters, digits and underscore. Which means there is very little wiggle room to implement some kind of robust namespace/prefix scheme. Also somewhat relevant here, especially when sharing across users comes in the picture: in context of OpenID Connect, the user id returned by the OIDC provider might not fit
\w+
(dashes, dots, ...)I don't have a concrete suggestions at the moment, but just wanted to start off this discussion
The text was updated successfully, but these errors were encountered: