Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow join side to take an ident that resolves to a literal #4499

Merged
merged 7 commits into from
May 23, 2024

Conversation

kgutwin
Copy link
Collaborator

@kgutwin kgutwin commented May 22, 2024

This PR makes a small addition to the join transform, allowing the side: parameter to take an Ident that resolves immediately to a string literal.

Example:

let my_side = "right"

from x
join y (==id) side:my_side

compiles as you would expect to

SELECT 
  x.*, 
  y.* 
FROM 
  x 
  RIGHT JOIN y ON x.id = y.id

Why this change? I am working on a set of "library" functions for my project, and one of them is a simple "map" transform that takes in a relation as a lookup table and outputs a new column. The basic code for this function is:

let map_into = func
  map_tbl <relation>
  column <scalar>
  map_tpl <tuple>
  tbl <relation>
  -> (
    join side:left map=map_tbl (column == that.key) tbl
    derive map_tpl
    select !{ map.key, map.value }
)

# usage of the above
let region_lookup = from_text """
key,value
1,northeast
2,southeast
3,midwest
4,northwest
5,southwest
"""

from employees
map_into region_lookup this.region { region_name = that.value }

Which works great -- but it isn't configurable as to the join type, which means users of this library function only get one hard-coded option. What I want is to expose the join side as a function parameter... with this PR, it now looks like this:

let map_into = func
  map_tbl <relation>
  column <scalar>
  map_tpl <tuple>
  side <text>:"left"
  tbl <relation>
  -> (
    join side:_param.side map=map_tbl (column == that.key) tbl
    derive map_tpl
    select !{ map.key, map.value }
)

from employees
map_into region_lookup this.region { region_name = that.value } side:"inner"
# now the resulting SQL uses an inner join rather than a left join

Note the use of join side:_param.side to pass the side value in via parameter.

Feedback is welcome - thanks!

@max-sixty
Copy link
Member

Thanks!

Added one quick question, fine to merge now though.

A code comment saying what's the intent of the new code + a changelog is great but not strictly necessary.

@vanillajonathan
Copy link
Collaborator

🤔
This can get confusing:

let left = "right"

from x
join y (==id) side:left

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 22, 2024

Ah yeah, that's like the Python trick of defining True = False which used to work back in the Python 2 days... I suppose it's possible to get yourself turned around but I'm hopeful that it's rare in practice.

FWIW, this PR would interpret that code snippet as a left join rather than a right join. It first tries to use the side Ident as a bare string ("left", "right", etc.) and if that doesn't match, it tries to resolve the ident to a literal. I suppose we could flip the logic around if that seems more sensible.

I'll push a changelog entry and some code comments soon.

@eitsupi
Copy link
Member

eitsupi commented May 22, 2024

I too think this is odd.

Wouldn't we have to make a breaking change to accept text as an argument?

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 22, 2024

The original behavior (treating the provided ident as a bare string) would still be the default, so I don't think this would be a breaking change... this only adds functionality in the fallback case where the ident provided to side isn't already one of the four recognized strings and when it can be resolved to a string literal.

There are probably a few other options to consider in order to make the join side parametrizable, but I went this route because the implementation was the simplest and had the least overall impact.

@eitsupi
Copy link
Member

eitsupi commented May 22, 2024

Yes, I understand your intention (Great work!).

However, given that we are currently rethinking our type system (includes table and column name resolution), I am wondering if this change makes sense.

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 22, 2024

Thanks -- yeah, I really appreciate the work around the type system, I've already found it useful for certain things and am looking forward to future improvements. I'm hoping this change is mostly outside those bounds, though, because it's really just about finding a way to pass a parameter to the join operation. Have there been any thoughts around how the type system changes might impact function "option" parameters?

Since idents can reference just about anything in the PRQL context, I tried to double-check that you wouldn't get strange behavior if a user passes an ident that resolves to something odd. For example, if you try to pass a table column name to the side parameter (which is nonsensical) it returns a sensible error. Similarly, if you pass it a reference to a function or an array, you get a reasonable error and not a compiler crash.

@max-sixty
Copy link
Member

max-sixty commented May 22, 2024

Agree with @vanillajonathan & @eitsupi on reflection; it is less simple to allow both strings and bare words here.

Should we switch to a string param for normal uses too; so it's join side:"right"?

That seems a bit less aesthetic. But it's unclear what type it is now. Possibly it's enum type, like join_type.right, which is in the main namespace by default?

If we go with the enum type, we could allow let foo = join_type.right or let foo = right — if we don't go for strings, we should still allow this is to be a value...

(If getting there is really hard, I would be +0.1 on merging this with an experimental label to unblock @kgutwin and then resolving later)

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 22, 2024

I agree that moving towards an enum strategy makes a lot of sense. That said, I really like the ergonomics of join side:right and am hesitant to make it more complicated like join side:"right" or join side:join_type.right just for the sake of this one request. I reviewed the standard library and the only other function that uses this mechanism at the moment is from_text format:csv.

Are you thinking we should use something like the named unions that are defined in #4038? That would be a good starting point, but it seems like there's a lot of implementation left to do, for example to handle dereferencing individual elements of the union. I can't tell yet whether getting that done would be "really hard" but I suspect it'll be outside my scope in the near term...

If we can agree on what the UX and structure for this feature ought to be long-term, could we consider merging this PR for the time being, so that I can close out the feature request on my project? I'm happy to help out with the proper implementation once it's clear.

@max-sixty
Copy link
Member

max-sixty commented May 22, 2024

I agree that moving towards an enum strategy makes a lot of sense. That said, I really like the ergonomics of join side:right and am hesitant to make it more complicated like join side:"right" or join side:join_type.right just for the sake of this one request. I reviewed the standard library and the only other function that uses this mechanism at the moment is from_text format:csv.

Yes it would still be written as side:right; but that would be shortened syntax for side:join_type.right, similar to how we can import enum variants in Rust.

I think that's fine for merging this version (unless others really disagree?). Could we add a short note to the code referencing this discussion and saying it's experimental / subject to change?

@kgutwin
Copy link
Collaborator Author

kgutwin commented May 23, 2024

That sounds good to me, I'll put in a quick comment shortly. I'll also open an issue to continue the discussion on how function could take (and validate) named unions as parameters.

As an aside, for transparency, I found a workaround in my project that means this feature is no longer necessary for me in the short term. We have some other tasks to do later that would be helped by this, so a merge would still be good, but I understand if it's best to set this aside for a more complete implementation.

CHANGELOG.md Outdated Show resolved Hide resolved
@max-sixty
Copy link
Member

Great — I can't seem to edit the PR (I think this might be a setting on your end?), but if we can accept that change and run pre-commit run -a for the formatting, I think it makes sense to merge. Thanks!

kgutwin and others added 2 commits May 23, 2024 17:36
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
@max-sixty max-sixty merged commit 0158d3f into PRQL:main May 23, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants