Draft for compound extension types and variations
jvanstraten committed May 12, 2022
1 parent 720dbea commit 9f23575
Showing 4 changed files with 185 additions and 9 deletions.
25 changes: 25 additions & 0 deletions proto/substrait/extensions/extensions.proto
@@ -4,6 +4,8 @@ syntax = "proto3";
package substrait.extensions;

import "google/protobuf/any.proto";
import "google/protobuf/empty.proto";
import "substrait/type.proto";

option csharp_namespace = "Substrait.Protobuf";
option go_package = "github.com/substrait-io/substrait-go/proto/extensions";
@@ -40,6 +42,9 @@ message SimpleExtensionDeclaration {

    // the name of the type in the defined extension YAML.
    string name = 3;

    // Parameterization, if this is a compound extension type.
    repeated TypeParameter parameters = 4;
  }

message ExtensionTypeVariation {
@@ -52,6 +57,9 @@ message SimpleExtensionDeclaration {

    // the name of the type in the defined extension YAML.
    string name = 3;

    // Parameterization, if this is a compound type variation.
    repeated TypeParameter parameters = 4;
  }

message ExtensionFunction {
@@ -67,6 +75,23 @@ message SimpleExtensionDeclaration {
    // more than one impl per name in the YAML.
    string name = 3;
  }

  message TypeParameter {
    oneof parameter {
      // Explicitly null/unspecified parameter, to select the default value (if
      // any).
      google.protobuf.Empty null = 1;

      // Data type parameters, like the i32 in LIST<i32>.
      Type data_type = 2;

      // Value parameters, like the 10 in VARCHAR<10>.
      bool boolean = 3;
      int64 integer = 4;
      string enum = 5;
      string string = 6;
    }
  }
}

// A generic object that can be used to embed additional extension information
1 change: 1 addition & 0 deletions site/docs/types/type_variations.md
@@ -8,3 +8,4 @@ Since Substrait is designed to work in both logical and physical contexts, there
| Name | The name used to reference this type. Should be unique within type variations for this parent type within a simple extension. |
| Description | A human description of the purpose of this type variation. |
| Function Behavior | **INHERITS** or **SEPARATE**: Whether this variation supports functions using the canonical variation or whether functions should be resolved independently. For example if one has the function `add(i8,i8)` defined and then defines an `i8` variation, can the `i8` variation field be bound to the base `add` operation (inherits) or does a specialized version of `add` need to be defined specifically for this type variation (separate). Defaults to inherits. |
| Parameterization | Type variations can be parameterized. For example, an implementation may support storing `timestamp_tz` using any timezone, in which case it might not be convenient to create a variation for every possible timezone. Parameterizations for type variations work the same as parameterizations for [compound user-defined types](user_defined_types.md#parameterization). |
95 changes: 88 additions & 7 deletions site/docs/types/user_defined_types.md
@@ -1,14 +1,95 @@
# User-Defined Types

User Defined Types can be created using a combination of pre-defined simple and compound types. User-defined types are defined as part of [simple extensions](../extensions/index.md#simple-extensions). An extension can declare an arbitrary number of user defined extension types. Initially, user defined types must be simple types (although they can be constructed of a number of inner compound and simple types).
User-defined types can be created using a combination of predefined simple and compound types. They are defined as part of [simple extensions](../extensions/index.md#simple-extensions). An extension can declare an arbitrary number of user-defined extension types. Once a type has been declared, it can be used in function declarations.

A YAML example of an extension type is below:
## Structure

User-defined types may be opaque or may be specified as a structure of preexisting types. A simple example of the latter is:

```yaml
name: point
structure:
  longitude: i32
  latitude: i32
```
This declares a new type (namespaced to the associated YAML file) called "point". This type is composed of two `i32` values named longitude and latitude.

[TBD: should field references be allowed to dereference the components of a user defined type?]

## Parameterization

User-defined types may be parameterized in the same way as the built-in compound types. The supported "meta-types" for parameters are data types, booleans, integers, enumerations, and strings. Using parameters, we could redefine "point" with different types of coordinates. For example:

```yaml
name: point
parameters:
  - name: T
    description: |
      The type used for the longitude and latitude
      components of the point.
    type: type
```

or:

```yaml
name: point
parameters:
  - name: coordinate_type
    type: enumeration
    options:
      - integer
      - double
```

or:

```yaml
name: point
parameters:
  - name: LONG
    type: type
  - name: LAT
    type: type
```

We can't specify the internal structure in this case, because there is currently no support for derived types in the structure.

The allowed range can be limited for integer parameters. For example:

```yaml
name: vector
parameters:
  - name: T
    type: type
  - name: dimensions
    type: integer
    min: 2
    max: 3
```

This specifies a vector that can be either 2- or 3-dimensional.
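
As a rough illustration of the range rule above, the following Python sketch checks an integer parameter value against inclusive `min`/`max` bounds. The `IntegerParamDef` class and the `check_integer_param` function are hypothetical names invented for this example; they are not part of Substrait or any of its libraries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntegerParamDef:
    """Hypothetical stand-in for an integer parameter definition."""
    name: str
    min: Optional[int] = None  # inclusive lower bound, if any
    max: Optional[int] = None  # inclusive upper bound, if any


def check_integer_param(definition: IntegerParamDef, value: int) -> bool:
    """Return True if value lies within the inclusive [min, max] range."""
    if definition.min is not None and value < definition.min:
        return False
    if definition.max is not None and value > definition.max:
        return False
    return True


# Mirrors the "dimensions" parameter of the vector example.
dimensions = IntegerParamDef(name="dimensions", min=2, max=3)
print(check_integer_param(dimensions, 2))  # True
print(check_integer_param(dimensions, 4))  # False
```

A missing bound is treated here as "unbounded on that side", which is an assumption of this sketch rather than something the text states.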

As with function arguments, the last parameter may be declared variadic, allowing it to be specified one or more times instead of only once. For example:

```yaml
name: union
parameters:
  - name: T
    type: type
variadic: true
```

This defines a type that can be parameterized with one or more other data types, for example `union<i32, i64>` but also `union<bool>`. Zero or more is also possible by making the last parameter optional:

```yaml
name: point
structure:
  longitude: i32
  latitude: i32
name: tuple
parameters:
  - name: T
    type: type
    optional: true
variadic: true
```

This declares a new type (namespaced to the associated YAML file) called "point". This type is composed of two `i32` values named longitude and latitude. Once a type has been declared, it can be used in function declarations. [TBD: should field references be allowed to dereference the components of a user defined type?]
This would also allow for `tuple<>`, to define a zero-tuple.
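
The interaction between `optional` and `variadic` can be sketched as a parameter-count check. The Python below is only an illustration under stated assumptions: `arity_ok` is a hypothetical helper, a `null` placeholder is counted as a supplied value, and only a trailing run of optional parameters is treated as omittable.

```python
from typing import List


def arity_ok(optional_flags: List[bool], variadic: bool, n_values: int) -> bool:
    """Check whether n_values is an acceptable number of parameter values.

    optional_flags[i] is True when the i-th declared parameter is optional.
    variadic means the last declared parameter may be repeated.
    """
    # Minimum count: drop the trailing run of optional parameters.
    minimum = len(optional_flags)
    while minimum > 0 and optional_flags[minimum - 1]:
        minimum -= 1
    if n_values < minimum:
        return False
    # Repeating the last parameter only makes sense if one is declared.
    unbounded = variadic and len(optional_flags) > 0
    if not unbounded and n_values > len(optional_flags):
        return False
    return True


print(arity_ok([False], True, 2))          # union<i32, i64>: True
print(arity_ok([False], True, 0))          # union<>: False
print(arity_ok([True], True, 0))           # tuple<>: True
print(arity_ok([False, False], False, 3))  # vector with 3 values: False
```

This covers counts only; checking each value against its declared metatype would be a separate step.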
73 changes: 71 additions & 2 deletions text/simple_extensions_schema.yaml
@@ -16,6 +16,10 @@ properties:
          type: object
          additionalProperties:
            $ref: "#/$defs/type"
        parameters: # parameter list for compound types
          $ref: "#/$defs/type_param_defs"
        variadic: # when set, last parameter may be specified one or more times
          type: boolean
  type_variations:
    type: array
    minItems: 1
@@ -25,14 +29,18 @@ properties:
      required: [parent, name]
      properties:
        parent:
          type: string
          $ref: "#/$defs/type"
        name:
          type: string
        description:
          type: string
        functions:
          type: string
          enum: [INHERITS, SEPARATE]
        parameters: # parameter list for compound type variations
          $ref: "#/$defs/type_param_defs"
        variadic: # when set, last parameter may be specified one or more times
          type: boolean
  scalar_functions:
    type: array
    items:
@@ -45,8 +53,69 @@ properties:
$defs:
  type:
    oneOf:
      - type: string
      - type: string # shorthand form for when only name is needed
      - type: object
        properties:
          name: # name: a Substrait type name, or name of a type previously defined in this extension
            type: string
          nullable: # set to true to make the type nullable
            type: boolean
          variation: # type variation, if any
            $ref: "#/$defs/variation"
          parameters: # parameters for compound types
            $ref: "#/$defs/type_param_values"
  variation:
    oneOf:
      - type: string # shorthand form for when only name is needed
      - type: object
        properties:
          name: # name of a type variation previously defined in this extension
            type: string
          parameters: # parameters for compound type variations
            $ref: "#/$defs/type_param_values"
  type_param_defs: # an array of compound type (variation) parameter definitions
    type: array
    items:
      type: object
      required: [type]
      properties:
        name: # name of the parameter (for documentation only)
          type: string
        description: # description (for documentation only)
          type: string
        type: # expected metatype for the parameter
          type: string
          enum:
            - type
            - boolean
            - integer
            - enumeration
            - string
        min: # for integers, the minimum supported value (inclusive)
          type: number
        max: # for integers, the maximum supported value (inclusive)
          type: number
        options: # for enums, the list of supported values
          type: array
          minItems: 1
          uniqueItems: true
          items:
            type: string
        optional: # when set to true, the parameter may be omitted at the end or skipped using null
          type: boolean
  type_param_values: # an array of compound type (variation) parameter values
    type: array
    items:
      oneOf:
        - type: "null" # use to skip optional parameters
        - type: boolean # for boolean parameters
        - type: number # for integer parameters
        - type: string # for string and enum parameters
        - type: object # for data type parameters
          required: [ type ]
          properties:
            type:
              $ref: "#/$defs/type"
  arguments: # an array of arguments
    type: array
    items:
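
To make the `type_param_values` branches concrete, here is a small Python sketch that maps an already-parsed YAML value to the schema branch it would match. `classify_param_value` is a hypothetical helper written for this note, not an existing Substrait API, and it assumes the YAML has been loaded into plain Python objects (`None`, `bool`, `int`/`float`, `str`, `dict`).

```python
from typing import Any


def classify_param_value(value: Any) -> str:
    """Return the JSON Schema type name of the branch that value matches."""
    if value is None:
        return "null"     # skips an optional parameter
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"  # boolean parameters
    if isinstance(value, (int, float)):
        return "number"   # integer parameters
    if isinstance(value, str):
        return "string"   # string and enum parameters
    if isinstance(value, dict) and "type" in value:
        return "object"   # data type parameters, which require a "type" key
    raise ValueError(f"unsupported parameter value: {value!r}")


values = [None, True, 10, "UTC", {"type": "i32"}]
print([classify_param_value(v) for v in values])
# ['null', 'boolean', 'number', 'string', 'object']
```

Because strings and enum options share one branch, telling them apart needs the matching parameter definition; this sketch does not attempt that.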
