Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type definitions for process arguments #47

Closed
GreatEmerald opened this issue Feb 23, 2018 · 12 comments
Closed

Type definitions for process arguments #47

GreatEmerald opened this issue Feb 23, 2018 · 12 comments
Labels
process discovery and profile discovery
Milestone

Comments

@GreatEmerald
Copy link
Member

At the moment the process definition is typeless, arguments only have a human-readable description and no type definition. This is a problem for usability, since in different languages there are native equivalents for spatial types; users are very likely to already have their data in the native type and would want the client to handle the native type, rather than having to decompose the native type into what the backend expects manually. As an example, in R, this is what users need to do to pass an extent to the bbox_filter process at the moment:

MyExtent = extent(MyAOI) # MyAOI is a raster or polygon that covers the area of interest
CropS2 = collection("Sentinel-2") %>% process("bbox_filter", left=MyExtent@xmin, right=MyExtent@xmax, top=MyExtent@ymax, bottom=MyExtent@ymin, srs=MyExtent@crs)
CropL8 = collection("Landsat8") %>% process("bbox_filter", left=MyExtent@xmin, right=MyExtent@xmax, top=MyExtent@ymax, bottom=MyExtent@ymin, srs=MyExtent@crs)

Whereas ideally this would just be:

CropS2 = collection("Sentinel-2") %>% process("bbox_filter", extent=MyAOI)
CropL8 = collection("Landsat8") %>% process("bbox_filter", extent=MyAOI)

But there is no way for the client to know how to map the Extent to the arguments, because they're loosely defined. We cannot just make an assumption that every process with an argument left expects the ymin coordinate of the object extent.

This is not just a problem with extents; same goes for other spatial types, such as defining points. The users have points in a SpatialPoints object and would expect to be able to pass it, rather than having to manually extract information about every single point in that object. The problem gets even bigger with polygons, that don't even have a set number of vertices.

@edzer
Copy link
Member

edzer commented Feb 23, 2018

Good point. I think in the end you'd want to have this:

CropS2 <- filter(collection = "Sentinel-2") %>% filter(bbox = extent(MyAOI))

in principle, R can take care of all the dirty work for this; I'm not entirely sure how the typing in the API would look like, but if it can be done it sounds like an idea worth exploring.

@GreatEmerald
Copy link
Member Author

Well, that would work as well, but in the R client collection() has special meaning as the start of the process chain, and process() is a generic function so that all the possible names for processes would not need to be implemented into the R client. Having a separate filter() function would even be confusing for users, since they get the list of supported processes from the backend and then choose which one to use from there (so using filter() to call a process called bbox_filter is less intuitive than using process("bbox_filter")). But that's the R client semantics, anyway.

Having types in the core API would also be useful for users listing the arguments; for instance, at the moment it's unclear what nir in the ndvi process refers to: is it a collection with a single band, is it a band number (integer), or is it a band name (string)? Or could it be either?

@edzer
Copy link
Member

edzer commented Feb 23, 2018

I see processes in the API as functions in R. Following your argumentation (and given that function is a reserved word) you would use

my_sum = Function("sum", a, b, c)

rather than

my_sum = sum(a, b, c)

That may reflect how the API works, but not what an R user would expect.

@GreatEmerald
Copy link
Member Author

Indeed, I agree. But it is not feasible to implement every single possible backend process as a function in the R client. Unless there's a way to make a function generator that works upon the list of processes in the backend and makes a function for each, but that it itself would be rather unusual (and cause issues when trying to connect to more than one backend within the same R session, not to mention possible name collisions with built-in functions).

@edzer
Copy link
Member

edzer commented Feb 23, 2018

I think that in the end the set of compulsory openEO processes may not be that large. Let's start simple, but keep close to what R users are used to.

@flahn
Copy link
Member

flahn commented Feb 24, 2018

Maybe I try to sketch the problem we are facing now and point out how the argument type might be an asset to solve this issue.

If we want to use R objects as values for process arguments, then I need to know from a technical perspective how to relate the fields of the R object to the correct corresponding process arguments or how to serialize the R object into JSON in order to provide the backend with an understandable value. As for now we don't consider this case since argument values are either numeric, character strings or one of the predefined objects (collection, process or udf). The parameter assignment in this egard is done by name matching: R function parameter "xyz" corresponds to process parameter "xyz". No type checking can be done at this point. We leave it completely to the user to enter the correct values for the arguments.

Now, the idea was to use argument type definitions together with a set of common object serializations for JSON like for an extent (or bounding box) to 1) be able to precheck process graphs on the client side, 2) be able to map common R objects against API defined counterparts and use them on the client.

@m-mohr
Copy link
Member

m-mohr commented Feb 26, 2018

So, basically there are three things we discuss here:

  1. Type definitions for the processes in the API
  2. Client function parameters and their translation into the API process parameters
  3. Naming of function names (in R)

(1) is a API related issue. I am not sure yet how we could define that properly. Any suggestion?

In my opinion, I would let (2) be handled by the clients. They should do the translation, but it's true that they need some more information (see [1]) for it.

(3) is mostly a R related issue and should be discussed in the perspective repo?! In PHP magic methods would be the solution to that. In R you probably need the process("xyz", args = ...) anyway as a fallback for non-core processes or name collisions. Of course it might be handy to have helper funtions that wrap core processes, e.g. sum(...) to process("sum", ...). Not sure whether you might want to prefix them? In other languages I'd use openeo.sum() or so...

@nuest
Copy link

nuest commented Feb 26, 2018

Re (3): Have you considered generating the functions in the R package from the API spec? See https://github.com/richfitz/stevedore/blob/master/R/swagger_args.R

@GreatEmerald
Copy link
Member Author

I agree. @flahn Can you make a new issue for (3) in the R client repository?

Back to (1): it makes me wonder whether types are necessary for every argument, or just for the special spatial types (extents, points, polygons, etc.). The latter would limit the scope, but the former would be useful by itself (to make sure users know that nir expects a number etc.).

For a proper definition, the backends could send an extra field for each process argument defining the name of the type (quite like there is a description field now). Then it would be a matter of documenting what each of the types mean (which is something that will be needed for #46 too). A composite type such as Extent could be defined as a set (struct) of left, right, top, bottom and srs, and then there would only be one argument of type Extent for e.g. the bbox_filter function.

That said, there's a question of which types are useful, since each client is going to have somewhat different native types.

@m-mohr m-mohr added processes Process definitions and descriptions process discovery and profile discovery and removed processes Process definitions and descriptions labels Mar 11, 2018
@m-mohr m-mohr added this to the v0.4 milestone Mar 26, 2018
@m-mohr
Copy link
Member

m-mohr commented Apr 27, 2018

It seems there is no language-independent function definition language: https://stackoverflow.com/questions/50059706/function-definiton-language-like-openapi

Therefore I just started to use the idea behind OpenAPI and experiment with my own function definition language: https://github.com/m-mohr/functio-spec
Let's see how that works out. ;)

@m-mohr
Copy link
Member

m-mohr commented Apr 30, 2018

An initial draft of Functio is available, which allows us to formally specify our processes.

The processes currently defined in the API have been converted to a Functio example file.

@m-mohr
Copy link
Member

m-mohr commented Jul 5, 2018

A subset of Functio has been ported to the API spec with a proper OpenAPI schema. Issue should be solved.

m-mohr added a commit that referenced this issue Jul 5, 2018
…s been ported to the API spec with a proper OpenAPI schema. This should solve issues #47, #54, ...
@m-mohr m-mohr closed this as completed Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
process discovery and profile discovery
Projects
None yet
Development

No branches or pull requests

5 participants