-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: introduce compound (parameterizable) extension types and variations #196
feat: introduce compound (parameterizable) extension types and variations #196
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks superb.
uniqueItems: true | ||
items: | ||
type: string | ||
optional: # when set to true, the parameter may be omitted at the end or skipped using null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do min
, max
, options
and optional
need to exist in the proto implementation as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean the files from #193, yeah, probably, I didn't touch those (yet, pending the outcome of that issue, and I've also been working on more accurately specifying type expressions in the background, for which they'd probably change as well). Other than that, the proto messages in substrait.Plan only bind values to the parameters, so, I don't think so, no.
On a practical note, you'd have to combine the recursively-referring/referred-to files into a single file to get this to work. We did this for Yes, it's really annoying that proto doesn't support this. |
I don't think these additions would create a new cycle between files, thankfully; it'd be plan -> extensions -> type and plan -> algebra -> type. |
@@ -67,6 +76,23 @@ message SimpleExtensionDeclaration { | |||
// more than one impl per name in the YAML. | |||
string name = 3; | |||
} | |||
|
|||
message TypeParameter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be part of the type variable as opposed to being normalized in the extensiontypevariation.
It means we should probably deprecate this line:
https://github.com/substrait-io/substrait/blob/main/proto/substrait/type.proto#L39
uint32 user_defined_type_reference = 31;
And instead introduce something more like
UserDefinedType user_defined = 32;
message UserDefinedType {
uint32 user_defined_type_reference = 1;
// Parameterization, if this is a compound type variation.
repeated TypeParameter parameters = 2;
message TypeParameter {
oneof parameter {
// Explicitly null/unspecified parameter, to select the default value (if
// any).
google.protobuf.Empty null = 1;
// Data type parameters, like the i32 in LIST<i32>.
Type data_type = 2;
// Value parameters, like the 10 in VARCHAR<10>.
bool boolean = 3;
int64 integer = 4;
string enum = 5;
string string = 6;
}
}
}
This would make user defined definitions consistent with all other compound types, defining them within the operations rather than having to constantly normalize them. E.g. creating list<i8>
would be basically the same as creating mylist<i8>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I did it this way for consistency with function overloads (i.e. you'd need separate anchors for each overload), to make it a bit easier to compare types, and to save some repetition in the plan, but yours is probably the stronger argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any status update on the comments here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just spitballing with this and haven't looked at this for a while to be honest, but I can pick it back up next week.
2f85dd1
to
9f23575
Compare
b7a4f52
to
a284d34
Compare
a284d34
to
df1df3c
Compare
Okay, rebased, updated according to @jacques-n's comment, and got fixed things up to get CI to pass. I tried to update |
When I first reviewed this I didn't realize that you were also proposing compound type variations. That feels like an entirely different discussion that I can't say I see a really pressing need for. Let's definitely remove it from the original intention here of supporting compound extension types. As a note, (per my comments elsewhere), I specifically avoided using "parameterized" to reference compound types since I reserved that use for a different use case. Let's hold off on starting to use that language here until we have a good dictionary elsewhere. |
Uh... what use case would that be if not to specify "something with parameters"? I'm confused. I'm also not sure how to rewrite the docs if that's a somehow forbidden word.
I figured as much, since it didn't get any comments; again, this whole thing was just supposed to be me writing down whatever came to mind. That said, I do have a use case in the back of my mind for them, namely to represent timezone-aware timestamps stored with a particular timezone offset that isn't necessarily UTC, which Arrow uses IIRC. It also just seemed like low-hanging fruit to support both, since the logic for compound type classes can be copied one-to-one to variations, though in general I've found that I interpreted variations to be much more general of a concept than they were apparently intended to be. Anyway, I don't feel that strongly about this so I'll remove it once I figure out how to reformulate the docs... |
Removed/reverted: - parameterizable type variations - changes to how types are referred to in the YAML - the word "parameterizable" in this context Changed: - rename "type" metatype to "dataType" to disambiguate a bit better Fixed: - doc examples using "enum" instead of "enumeration"
27e18f7
to
a8ec3bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jvanstraten . Looks good.
…ions (substrait-io#196) * feat: introduce compound (parameterizable) extension types and variations
#190 prompted me to start thinking about what would be needed to support parameterizable user-defined types. Since the use case in #190 kind of suits a parameterizable type variation of
timezone_tz
more so than a new type, I figured I'd apply the same logic to both extension types and type variations at the same time.This commit is really just me spitballing ideas and is intended as a request for comments of sorts rather than a PR; I figured it'd probably be more succinct to do this via code than a long-winded unstructured issue. This certainly couldn't be merged as such even if we'd wanted to because
substrait.extensions
now depends onsubstrait.Type
, which protobuf doesn't support (sigh), and because I'm sure the commit message is not compliant.Anyway, please shoot holes!