Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add main mathematical functions #3909

Merged
merged 17 commits into from
Dec 10, 2023
Merged

feat: add main mathematical functions #3909

merged 17 commits into from
Dec 10, 2023

Conversation

PrettyWood
Copy link
Collaborator

@PrettyWood PrettyWood commented Dec 9, 2023

With all the analytics DB, working on math and dates operations is key.
I will try to work on date / time / durations... types but still need some time for onboarding.
In the meantime if you're ok with this, I would like to first add math functions that are available for almost every dialects. In another PR I'll have a deeper look at string functions and I reckon it should cover pretty much #217 was opened for.
I'll work in a next PR on adding documentation for #1761 (similar to #1775)

@PrettyWood PrettyWood changed the title feat: add main mathemtical functions feat: add main mathematical functions Dec 9, 2023
let asin = column -> s"ASIN({column:0})"
let tan = column -> s"TAN({column:0})"
let atan = column -> s"ATAN({column:0})"
let pow = column exponent -> s"POW({column:0}, {exponent:0})" # TODO: create an operator for that (** or ^ ?)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're ok with this PR, I reckon we should have some kind of operator for pow
Which one would you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes agree. No particularly strong view on ^ vs **, WDYT? (our parser can cope with the initial ambiguity of ** fine)

Copy link
Collaborator Author

@PrettyWood PrettyWood Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would go with ** for 3 reasons

  • it's in python which is super used in data world (also in JS)
  • ^ makes me think of bitwise XOR in Rust and Python
  • It feels more comfortable to type on a QWERTY ^^ I prefer to type twice a key than use shift nvm I'm stupid

Copy link
Member

@max-sixty max-sixty Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe ** uses up fewer unique characters, in case we want to use ^ for something else)

Edit: I just saw your reply, OK — let's go with **...

Copy link
Member

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super — this is v helpful, we should have done this a while ago.

They all look fine to me. I guess folks aren't using all of these (atan!) that much, but no harm in having them...

@max-sixty
Copy link
Member

In another PR, we could add some of these to our integration tests. I think they're simple enough that I'm fairly confident they work, but nice to have coverage of them across dialects. A couple of times we've found inconsistencies between dialects this way.

If you also want to add them to the unit tests and/or docs, that would be good. Again, am quite confident they work, so not a top priority.

@PrettyWood PrettyWood marked this pull request as ready for review December 9, 2023 23:22
@PrettyWood
Copy link
Collaborator Author

@max-sixty I should be good to review :)

@max-sixty
Copy link
Member

Great re adding ** here!

We need to set the associativity here:

fn associativity(&self) -> Associativity {
use BinaryOperator::*;
match self {
Minus | Divide => Associativity::Left,
_ => Associativity::Both,
}
}

It should be right associative (the only one). And we should add a test that the precedence works; 5 * 4 ** 3 ** 2 should be equivalent to 5 * (4 ** 9). (it's OK if it passes it through to SQL; it's important it doesn't insert parentheses that shouldn't be there)

If you prefer to push that to another PR, that's totally fine; we should just comment it out from the parser so it doesn't work yet.

CHANGELOG.md Outdated Show resolved Hide resolved
@PrettyWood
Copy link
Collaborator Author

It should be right associative

You're completely right. I'll work on it tonight though I'll have to do some extra work because it's not a BinaryOperator (not available in SQL parser)

@PrettyWood
Copy link
Collaborator Author

@max-sixty So after some dive in I'm a bit perplex on how I should make the right associativity work.
What we want is for

c ** a ** b

to be translated into

POW(c, POW(a, b))

instead of

POW(POW(c, a), b)

Since there is no "power" operator in Postgres, we can't delegate that to the DB and there is nothing on sqlparser that handles that directly. I tried to use BinaryOperator::Custom(String::from("**")) for example and set right associativity on it, it doesn't work.
IMO what we want is for c ** a ** b to be translated into (c ** a) ** b before that but I would need some guidance 😬

@max-sixty
Copy link
Member

OK, good point... this is actually a bit harder than I thought — it's not only adjusting the SQL writer, it's also adjusting the parser here:

.foldl(|left, (op, right)| {
, I think.

(Previously everything had been left-associative, so all the work I'd done here had been on the SQL writer)

I can come back to this and look more. In the meantime, can we comment out the change in the parser so ** doesn't work yet? We can leave the rest of the code there so it'll flow through when we do fix it.

@@ -165,7 +165,7 @@ LambdaParam { identPart TypeDefinition? (":" expression)? }
Comment { "#" ![\n]* }
@precedence { Docblock, Comment }

//op_bin_only { "*" | "/" | "//" | "%" | "!=" | ">=" | "<=" | "~=" | ">" | "<" | "??" | "&&" | "||" }
//op_bin_only { "*" | "/" | "//" | "%" | "**" | "!=" | ">=" | "<=" | "~=" | ">" | "<" | "??" | "&&" | "||" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanillajonathan should we remove this line?

PrettyWood and others added 3 commits December 10, 2023 20:04
@PrettyWood
Copy link
Collaborator Author

PrettyWood commented Dec 10, 2023

Ok I removed the ** operator I think we're good!
One last remark: should I write pow as pow = exponent column instead of pow = column exponent ?
I'm a bit torn between:

  • SQL that uses POW(col, 2)
  • PRQL where I would prefer to write derive { pow_col = col | pow 2 }

@max-sixty
Copy link
Member

PRQL where I would prefer to write derive { pow_col = col | pow 2 }

I think this is correct — the col generally goes last, because piping puts the argument last, and is the thing that's normally piped.

We do something similar for round etc...

@PrettyWood PrettyWood merged commit 41dbf82 into main Dec 10, 2023
34 checks passed
@PrettyWood PrettyWood deleted the math-functions branch December 10, 2023 19:51
@PrettyWood PrettyWood mentioned this pull request Dec 10, 2023
@vanillajonathan
Copy link
Collaborator

It would be pretty convenient if all the math functions could be grouped together under a math namespace.

Like Math in JavaScript.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math

@PrettyWood
Copy link
Collaborator Author

Yup @vanillajonathan I thought about it and agree with you I'd like that. We could have the same for string utils.
This would be a bit more explicit and avoid name collisions

@max-sixty
Copy link
Member

Yes, I agree...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants