-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom operator support #11137
Custom operator support #11137
Conversation
82f8961
to
c09e299
Compare
7878cd0
to
0d66cc5
Compare
This seems to be working well with |
0d66cc5
to
71b7956
Compare
I plan to review this tomorrow morning |
I've found an issue whilst working on datafusion-contrib/datafusion-functions-json#22 Lexical Precedence doesn't work correctlySee datafusion-contrib/datafusion-functions-json@f573a39 The following works as expected in postgres: select '{"a": "b"}'::json->>'a'='b'
-- True But with datafusion select '{"a": "b"}'->>'a'='b'
-- Error during planning: Unexpected argument type to 'LongArrow' at position 2, expected string or int, got Boolean. Same happens for other common code like Happens for both I tried setting Same issue applies to the existing at arrows: > select [1,2,3]@>[1] is null;
-- Error during planning: Cannot infer common array type for arrow operation List(Field { name: "item", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) @> Boolean |
Precedence issue looks like it's an issue in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @samuelcolvin
I think this PR looks really quite good to be honest. It is well commented / documented / structured ❤️
However, to respond to your high level question
Is all the effort here worth it, should we just add the 20 or so extra operators from the sql library to Operator?
I personally think this is a very reasonable alternative and given what I am seeing in this PR from what it would required to support custom operators, I would tend to agree that simply adding new enum variants to Operators
might be the best way to go
I think some PRs from @jayzhan211 recently may be related (e.g. #11155) where the planner is translating operators into function calls for array operators. Maybe we could extend that idea so instead of hard coding the function names in the planner (which isn't ideal from my perspective) the translation of "operator --> function" comes from a table or trait or something 🤔
What do you think @jayzhan211 ?
cc @phillipleblanc as I think you may be interested in this too |
An also related conversation maybe: #10534 |
@alamb sorry I wasn't clear, having implemented this, I'm in favour of sticking to this API, it'll make customising how operators behave much easier than hard coding them all within datafusion. It'll also be much easier to avoid any breaking changes for those who don't want to support more operators. I'll let you make the final decision. |
I am working on an alternate proposal too -- stay tuned |
Here is an alternate idea #11168 |
@jayzhan211's alternate PR here #11180 is looking good in my opinion |
closing this, I'll follow up on #11180. |
I think this is ready to review.
WIP to decide if we progress cc @alamb.This is based on #11132 since I needOperator
to be non-copyable. If we decide to progress I can rebase this, or cherry pick and create a new branch.You can see this without the churn of #11132 at samuelcolvin#1.high level question
Is all the effort here worth it, should we just add the 20 or so extra operators from the sql library to
Operator
?Advantages of this route:
Disadvantages:
a lot of faff hereit's not that badTo to
ParseCustomOperator
be passed into the SQL parser, it definitely shouldn't be as it is now, perhaps we should have an equivalent ofregister_function_rewrite
like, e.g.register_custom_operators
? DONEis the hack withI think what we have is goodWrapCustomOperator
necessary and acceptable, maybe someone else's Rust foo could avoid this?shouldI don't think so, the current write logic is more powerfulCustomOperator
provide aget_udf
method, then we rewrite to that function automatically, rather than relying onregister_function_rewrite
?from_proto_binary_op
, we can't keep the same signature and support custom operators, this might be easy to fix - done by adding the register methods toFunctionRegistry
is it okay to rely onI think it's goodname()
of the operator to implement equality, ordering and hashing?Example Usage:
Here's a trivial example of usage that just replaces the "->" operator with string concat: