Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support named joins #2389

Closed
5 tasks done
josevalim opened this issue Jan 22, 2018 · 58 comments
Closed
5 tasks done

Support named joins #2389

josevalim opened this issue Jan 22, 2018 · 58 comments

Comments

@josevalim
Copy link
Member

josevalim commented Jan 22, 2018

Tasks

  • Allow multiple options to be given to join/5
  • Support the :as construct in queries
  • Include :as on inspected queries
  • Update the documentation
  • Add support for :as in from too

The problem

A commonly asked feature about Ecto is to support named joins. The idea is that, if you need to compose a query by parts, you may want to refer to previous joins. The only way to do so in Ecto today is via positional bindings:

  query = Post

  # Filter by the join
  query = from p in query,
            join: c in Comment, where: c.post_id == p.id

  # Extend the query
  query = from [p, c] in query,
            select: {p.title, c.body}

There is also the ... identifier that helps select the last bindings:

  # Extend the query
  query = from [p, ..., c] in query,
            select: {p.title, c.body}

However, those approaches are still restrictive, as they require you to track when a given join was added.

For those reasons, having a mechanism that refers to a previously established join would be very useful.

Solution

There were previous attempts to address this issue but all of them attempted to conflate the positional binding with the named join. This approach can be very error prone because in Ecto the name of the binding has no effect on the generated query. So if bindings suddenly had to be unique, many queries would fail.

Another approach would be to identify the joins by their schema but that is also limited as you would be unable to join the same schema/table more than once, which may be desired from time to time.

So it is clear that the only approach to solve this is by introducing an explicit naming mechanism. We propose to do so via the :as option in queries:

  query = Post

  # Filter by the join
  query = from p in query,
            join: c in Comment, as: :comments, where: c.post_id == p.id

  # Extend the query
  query = from [p, comments: c] in query,
            select: {p.title, c.body}

If the :as key is present, Ecto will do the following:

  1. Has another join be specified with the :as key? If so, we raise
  2. Otherwise we store it for further lookups

We could even leverage this solution to include the :as names in the resulting SQL query, as it may be handy in certain situations.

Feedback

Your turn. :)

@OvermindDL1
Copy link
Contributor

OvermindDL1 commented Jan 22, 2018

First some glee. Oh My Word This Would Simplify SOOO Much Of My Code And Shorten My Factorial-Growing Compile Times Whoooo!!!

Now that that is out of the way. ^.^

I'm not sure why the :as needs to be in the join query, instead of this:

query = from p in query,
            join: c in Comment, as: :comments, where: c.post_id == p.id

Why not do this:

query = from p in query,
            join: [comments: c] in Comment, where: c.post_id == p.id

This way the syntax more matches the use query and it seems more obvious as well since the 'name' of the thing can be inferred as it's unique 'key'. :-)

@josevalim
Copy link
Member Author

@OvermindDL1 I thought about it but my reservation is that join expects a single binding and not a list of bindings. With your proposal we would mix the consumption of bindings with their definition.

@OvermindDL1
Copy link
Contributor

@OvermindDL1 I thought about it but my reservation is that join expects a single binding and not a list of bindings. With your proposal we would mix the consumption of bindings with their definition.

True a list implies more, but it matches the later query. What about instead a 2-tuple?

query = from p in query,
            join: {:comments, c} in Comment, where: c.post_id == p.id

That way it still matches the key/binding pair later.

@josevalim
Copy link
Member Author

josevalim commented Jan 22, 2018 via email

@OvermindDL1
Copy link
Contributor

Another reason to use :as though is that we can use it in the generated SQL
too. It may be handy in some situations.

You can with the tuple as well. I know as is more SQL'y, but it just doesn't seem to fit and makes the query syntax seem a lot more noisy to me, and less elixir'y/erlang'y. The 'key' in it seems related though to me?

@john-griffin
Copy link

This feature would be incredible. We hit the limits of ... when trying to generate queries with nested joins and this would solve it.

@mbuhot
Copy link
Contributor

mbuhot commented Feb 15, 2018

Can the name be dynamic / interpolated?

query = 
  from [p, ^som_dyamic_name: c] in query,
  select: {p.title, c.body}

@szlend
Copy link
Contributor

szlend commented Feb 15, 2018

This is a great step towards making join queries more composable, and fixes one of my biggest gripes with Ecto!

However, the proposed solution still doesn't fix the issue completely. I still can't write a single function that can accept a joined or non-joined query. Also, it still makes dealing with multiple joins of the same table difficult because the alias has to be hardcoded in the function (unless it can be interpolated like @mbuhot suggested).

For example, this function is not reusable:

def active(query) do
  from [accounts: a] in query,
    where: a.active_until > datetime_add(fragment("NOW()"), -5, "day")
end

# this would work
query = from t in Thing, join: a in assoc(t, :account), as: :accounts
Account.active(query)

# but this (as far as I understand) would not
query = from a in Account
Account.active(query)

# neither would this
query =
  from t in Thing,
    join: a in assoc(t, :account), as: :accounts,
    join: a2 in assoc(t, :other_account), as: :accounts2
Account.active(query) # i want to filter on :accounts2

@josevalim
Copy link
Member Author

josevalim commented Feb 15, 2018 via email

@OvermindDL1
Copy link
Contributor

You can declare the same join twice as long as it has the same name and it
will be added only once, this allows operations to be idempotent.

Hmm, how will that handle conflicting on: ... options? 'and' them together or first-only or last-only or so?

I'm not big on dynamic aliases personally, though having a function to detect what names already exist on the query so we could branch would be very nice. :-)

@michalmuskala
Copy link
Member

I think the current plan is to raise when the on clauses are conflicting.

@hauleth
Copy link
Contributor

hauleth commented Feb 20, 2018

It would be handy to have also additional join-like function that would work similarly to select_merge, it would check if given named join is present and adds one only if it isn't already present, otherwise it will add it. This would help with composing queries.

@josevalim
Copy link
Member Author

@hauleth that's exactly what we are discussing above. :) If the as option is given more than once, it is idempotent as long as you have the same :on.

@hauleth
Copy link
Contributor

hauleth commented Feb 20, 2018

Oh, yeah, I was mislead by @michalmuskala comment, sorry for the fuss.

@OvermindDL1
Copy link
Contributor

Will there be some official way to introspect what join 'names' have already been added to the query, or will we need to manually introspect the internals of the query to figure it out? Basically a function like [:some_name, :another_name] = get_join_names(some_query) (and maybe also a helper like is_joined?(some_query, :some_name) or so)? Although perhaps returning a set (map with true values, or maybe something useful in the values?) would make matching in a case a lot easier. :-)

@josevalim
Copy link
Member Author

@hauleth no problem.

Let's get the feature in its basic form in, then we can discuss all possible additions and their use cases.

@zoldar
Copy link
Member

zoldar commented Feb 27, 2018

I was experimenting with adding support for idempotent named joins but hit some issues along the way. Most chief among them is how to reliably determine whether two join expressions are equal. The query expressions in on seem to be non-trivial to compare when the expressions inside contain positional references to bindings. Comparison, at least regarding my current knowledge of Ecto internals and Elixir metaprogramming, would either require expanding those expressions or some other way of altering them - which may carry certain computational cost. If I'm missing some well known way of making such comparison, please let me know.

Another problem I've started to consider is whether the behavior (joins being idempotent) doesn't bring an unwanted element of surprise to the user of the library. Perhaps that's not as big of an issue as I imagine it to be, but in situation where somebody adds a (named) join for the second time and, for some reason, tries to refer to it by its absolute position, may be caught by surprise that the binding was not added. Again, whether that's a big issue that join behaves somewhat differently in that particular case, I don't know.

For reference, I have a WIP branch with some groundwork done for comparing joins (although mind you, as it currently stands, it's a non-working, messy draft): zoldar@f532d6e

If nothing changes, I will try to come up with a way to make the comparison at least work in the coming days - reliably, if possible. If anyone has any ideas or opinions on the problems/questions I've raised here, I'd be happy to hear them.

@josevalim
Copy link
Member Author

@zoldar if you have the exact same join, with the exact same on and as, then odds are that you want them to be the same anyway.

Regarding the check, the bindings have to be the same, otherwise they are separate joins, no? I would try implement this by traversing the on_expr (i.e. %JoinExpr{}.on.expr) at compile time and by setting the metadata node in the AST to an empty list, effectively removing contextual or line/file information. You should be able to do this by using Macro.prewalk/2. The key you are going to use to check if the join is the same will be {join_source, :erlang.phash2(prewalked_join_on_expr)}.

Thoughts?

@zoldar
Copy link
Member

zoldar commented Feb 27, 2018

@josevalim

Ok, so, given a following basic example:

"posts"
|> join(:inner, [], b1 in "blogs", on: b1.valid, as: :blogs)
|> join(:inner, [], b2 in "blogs", on: b2.valid, as: :blogs)

I'm getting following on expressions:

{{:., [line: 263], [{:b1, [line: 263], nil}, :valid]}, [line: 263], []}

{{:., [line: 264], [{:b2, [line: 264], nil}, :valid]}, [line: 264], []}

which later become:

{{:., [], [{:&, [], [1]}, :valid]}, [], []}

{{:., [], [{:&, [], [2]}, :valid]}, [], []}

dropping metadata from AST alone seems to not be enough. However, it probably could indeed be possible to traverse the expression and "unify" all occurrences of positional arguments referring to the joined association. I'll look into that in the coming days.

@josevalim
Copy link
Member Author

@zoldar You are right. We could use the escaped on but that complicates the code too. I think it is probably best to give up on implementing this feature implicitly. Maybe we would need to find a way to override those as more explicitly or, as @OvermindDL1 said, find a way to check if an as already exists or not. I will remove this item from the todo list. Just documentation and supporting :as in from left. Thank you for exploring those alternatives!

@OvermindDL1
Copy link
Contributor

OvermindDL1 commented Feb 27, 2018

/me is very of the thought that a name should only exist once, if it is attempted to be used again then it should raise an exception saying as much while showing the existing and the attempted join informations...

(EDIT: Really though, multiple usages of the same name just screams BUG to me)

Honestly all I'd want is just named joins (if the name is attempted to be used again then hard error), and a function or documented field on the query to get the list of used names (to save me from having to carry them around manually, which I would inevitably have to do in every case that I use named queries). :-)

@zoldar
Copy link
Member

zoldar commented Mar 2, 2018

@josevalim Just wanted to make sure, for avoidance of any doubt:

Add support for :as in from too

is that about supporting :as in bindings list, as in:

from [p, comments: c] in posts_with_comments, ...

(which is already supported - Ecto.Query.Builder.From.build/2 runs bindings through Builder.escape_binding/3; it's just a matter of adding tests and documenting that)?

or is it about aliasing the source of the query, like:

from p in Posts, as: :posts, ...

?

The second form would probably be useful in the context of generating the query by adapter and using the provided aliases instead of generated ones.

@josevalim
Copy link
Member Author

josevalim commented Apr 8, 2018 via email

@michalmuskala
Copy link
Member

@zoldar that's exactly what happens when you add the join with same source and on and same alias multiple times - it will be added just once. I'm not sure I see what would be an advantage of checking it manually.

@zoldar
Copy link
Member

zoldar commented Apr 8, 2018

@michalmuskala Hmm, as far as the implementation for named bindings goes, it will explicitly crash, unless I misunderstood?

test "crashes on assigning the same name twice at compile time" do

@zoldar
Copy link
Member

zoldar commented Apr 8, 2018

@zachdaniel Sure, go for it if you like.

but one thing I know I'll need for my API query builder is also "give me an unused alias".

what do you mean by that?

@michalmuskala
Copy link
Member

It will raise if you try to join two different things using the same alias. If you join the same thing twice with the same alias, the join will be deduplicated for you - that's the major feature of named joins.

@josevalim
Copy link
Member Author

josevalim commented Apr 8, 2018 via email

@josevalim
Copy link
Member Author

josevalim commented Apr 8, 2018 via email

@zachdaniel
Copy link
Contributor

@zoldar so in my API, you can provide filters on joined values, and you can also include related entities. The current implementation has to do some very hacky things to accomplish this, so lets not dive into that too much. But what I'd need is something equivalent of "give me an alias that is not in use in this query already", so that I can tie it to a specific reference to an association in our request.

Like I said though, it wouldn't bother me if no one wants this, because its easy enough to generate aliases until I get one that is not in use, but I would rather use the function you talked about to check if an alias is in use. so like

...building query

# encounter a filter/include request on an association
alias = Ecto.Query.new_alias(query)
association = util_to_figure_out_assoc # my own thing
filter = util_to_figure_out_dynamic_filter # my own thing

new_query =
  from row in query,
    join: associated in assoc(row, ^association),
    as: ^alias,
    on: ^filter

or something along those lines.

@zoldar
Copy link
Member

zoldar commented Apr 8, 2018

@josevalim is right - it's best to wait until people start trying that out in practice. I actually had a real life use case on my mind but I'll hold off with proposing anything until/if I actually try to use that feature on an actual living code.

Sorry for stirring the pond in a closed issue on Sunday 🙈

@zachdaniel
Copy link
Contributor

@zoldar as soon as I feel like I have the tools to replace my existing code w/ named bindings (there are a few things I need, and this is one of them) then I will be trying it out in practice.

@zachdaniel
Copy link
Contributor

But we can totally discuss this in a different format later this week :D I'll just see if I can try the refactor that I have in mind w/ ecto master and see what things I run into, and then report back.

@josevalim
Copy link
Member Author

josevalim commented Apr 8, 2018 via email

@zachdaniel
Copy link
Contributor

@josevalim that is very true :D My use case is that I'm accepting a (very graphql-ish) query from user input, something like

%{
  upvotes: %{greater_than: 1},
  any_of: [
    %{comments: %{author_id: 1}},
    %{comments: %{upvotes: %{less_than: 1}}}
  ]
}

We already support a significant set of filters and boolean combinators like that, but its done w/ quite a bit of metaprogramming that I'd like to make unnecessary. I am already tying specific association joins that I require to binding indices externally to the query, and then building query expressions manually. I was hoping to replace that with named bindings, but that would require me to be able to specify a binding at runtime while constructing a query.

@josevalim
Copy link
Member Author

josevalim commented Apr 8, 2018

@zachdaniel in this case I would try to pre-process the data and move the any_of inside each association.

The other idea is to have something like has_named_binding? that @zoldar mentioned and make sure the bindings is introduced at some point. So you would have functions like:

def ensure_comments(query) do
  if has_named_binding?(query, :comments) do
    query
  else
    join(query, [p], c in assoc(p, :comments), as: :comments)
  end
end

@zachdaniel
Copy link
Contributor

zachdaniel commented Apr 9, 2018

We have something like 60 resources most of which have upwards of 10 relationships (and some many more) so having to have a handwritten (or even compile time generated) list of functions that can join + name that association isn't really feasible. I'll write something up that explains our use case more in depth later, since we're looking to open source our framework in the next 6 months to a year as time allows to clean it up and extract the parts that are specific to us.

@OvermindDL1
Copy link
Contributor

This would be helpful, for instance, in cases when a query is built from filters which can be applied multiple times and only the first call of each should add relevant joins.

WDYT? \cc @OvermindDL1

Please!

that's exactly what happens when you add the join with same source and on and same alias multiple times - it will be added just once. I'm not sure I see what would be an advantage of checking it manually.

I'll ever only add it once, and honestly if it was added multiple times with the same name (even if an identical query) I really say it should just error out as that really is an odd logic issue there.

However, checking if the name exists is highly valuable as an earlier part could have joined it with a name with a special unique set of dynamic bindings, and later something uses that join (perhaps further joins).

It will raise if you try to join two different things using the same alias. If you join the same thing twice with the same alias, the join will be deduplicated for you - that's the major feature of named joins.

Uh, this sounds like a major logic bug just waiting to happen...

@josevalim is right - it's best to wait until people start trying that out in practice. I actually had a real life use case on my mind but I'll hold off with proposing anything until/if I actually try to use that feature on an actual living code.

I have lots of real-life use-cases that I've spoken of on the forums and so forth, I can detail some of them here if curious? My use-cases don't need a test for if a query has a binding contained within it, I could carry that information in a tuple or map or so, but that is painful to be carrying another thing along with the query as then I'd have to thread that through everywhere, have a final thing that pulls it out, etc... etc... It is just such a major code simplification to just let me query the names already in it, especially as I don't have to worry about it getting out of sync or anything of the sort.

My use case is that I'm accepting a (very graphql-ish) query from user input, something like

Ah, that's a good use-case, my uses are very similar on the front-side as well, however I have other reasons needed for named joins and that is because I have to heavily optimize a very very slow oracle server link and need to minimize the number of queries (which means utterly ginormous queries) while optimizing them as much as possible. Joining anywhere from 8 to 22 tables is entirely common.

@josevalim
Copy link
Member Author

Folks, this issue is closed and the basic feature is in. If you feel you need more functionality, please provide a complete proposal. You can use this topic as an example. Please provide a snippet with how the code looks today, the problems the old code has, and then describe the solution, with advantages, disadvantages and the updated code samples. Thank you.

@hauleth
Copy link
Contributor

hauleth commented Apr 9, 2018

@josevalim if I understand @OvermindDL1 correctly (and if so, I completely agree) is that we want to reduce functionality proposed there, not add to it. The part we are afraid of is:

If you join the same thing twice with the same alias, the join will be deduplicated for you

So if I understand correctly if I now do this:

query1 = from foo in Foo, join: bar in Bar, on: bar.foo_id == foo.id
query2 = from foo in query1, join: bar in Bar, on: bar.foo_id == foo.id

it will silently pass. However I (and I think @OvermindDL1 also) think that this is dangerous behaviour and it should fail the same as when used like:

query1 = from foo in Foo, join: bar in Bar, on: bar.foo_id == foo.id
query2 = from foo in query1, join: bar in Bar, on: bar.foo_id2 == foo.id

It should provide much less confusion IMHO and also allow us to add such behaviour later if we find it usable. Other way around it would be infeasible to do so (I mean to remove such feature).

EDIT: To join and merge I would propose join_merge macro that would work like currently proposed doubled named join. I think it would be less confusing to the people.

@josevalim
Copy link
Member Author

josevalim commented Apr 9, 2018 via email

@josevalim
Copy link
Member Author

josevalim commented Apr 9, 2018 via email

@tanweerdev
Copy link

tanweerdev commented Sep 12, 2018

Is this feature available now in latest release? (coz I dont see in docs of version 2.2.10 which was released on Apr 8)

@josevalim
Copy link
Member Author

josevalim commented Sep 12, 2018 via email

@tanweerdev
Copy link

Sure. Thanks for the quick response.

@AndrewDryga
Copy link
Contributor

AndrewDryga commented Sep 18, 2018

After we added named joins there is no way to ensure named join exists. Here is an example where it's useful:

  def with_preloaded_account(queryable) do
    if alias_defined?(queryable, :account) do
      queryable
    else
      queryable
      |> join(:left, [assignment], account in assoc(assignment, :account), as: :account)
      |> preload([assignment, account: account], account: account)
    end
  end

  def alias_defined?(queryable, alias_name) when is_atom(queryable) do
    queryable
    |> Ecto.Queryable.to_query()
    |> alias_defined?(alias_name)
  end

  def alias_defined?(queryable, alias_name) do
    Map.has_key?(queryable.aliases, alias_name)
  end

This way you have more flexibility and can do cool stuff like this which is used in domain context:

  def with_preloads(queryable, preload: []) do
    queryable
  end

  def with_preloads(queryable, preload: preload) when is_atom(preload) do
    with_preloads(queryable, preload: List.wrap(preload))
  end

  def with_preloads(queryable, preload: [:user | rest_preloads]) do
    queryable
    |> with_preloaded_user()
    |> with_preloads(preload: rest_preloads)
  end

  def with_preloads(queryable, preload: [:account | rest_preloads]) do
    queryable
    |> with_preloaded_account()
    |> with_preloads(preload: rest_preloads)
  end

  def with_preloads(queryable, []) do
    queryable
  end

I saw in multiple projects that by having preloads option in domain context you can save a lot of duplicated code and do not execute separate queries.

So should we add this to Ecto.Query?

@hauleth
Copy link
Contributor

hauleth commented Sep 18, 2018

@AndrewDryga if I read your request correctly it is the same as there
#2389 (comment)

@AndrewDryga
Copy link
Contributor

@hauleth yeah, did not saw that one. This is exactly what we ended up doing.

@josevalim
Copy link
Member Author

@AndrewDryga we would appreciate a PR that adds has_named_binding?(query, key). :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests