Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overlapping implementations for marker traits #1268

Closed
wants to merge 2 commits into from

Conversation

sgrif
Copy link
Contributor

@sgrif sgrif commented Sep 2, 2015

rendered

(Despite showing as closed, this RFC was accepted and merged.)

@arielb1
Copy link
Contributor

arielb1 commented Sep 3, 2015

The currently presented version of specialization won't allow this. Your proposal is basically lattice impls for marker traits.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 3, 2015
@aturon aturon self-assigned this Sep 3, 2015
sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 11, 2015
The integration tests file and the primitives.rs file were getting
ridiculously huge. This breaks them both up into more logical
components. This will make it easier to unit test the various types.
While we can't move all the integration tests into the `tests` folder
until either specialization, or rust-lang/rfcs#1268 lands, due to how
we're having to implement `SelectableColumn` for `JoinSource`, this
doesn't affet the integration tests for the type casting behavior, and
we can safely move those over.
@aturon
Copy link
Member

aturon commented Sep 14, 2015

@sgrif First, sorry for the very overdue response; I'm behind on my RFC shepherding.

Second: some improvements in this area would be most welcome. For example, the design I recently proposed for a trait here won't work without some adjustment to coherence for marker traits.

Simply allowing overlap for marker traits (as proposed here) seems like a pure win. But I wonder if you've thought much about negative impls and potential overlap there? For example, in the comment I linked above, I propose something like:

use std::cell::UnsafeCell;

trait ExnSafe {}
impl ExnSafe for .. {}

trait RefExnSafe {}
impl RefExnSafe for .. {}

impl<'a, T: RefExnSafe> ExnSafe for &'a T {}

impl<'a, T> !ExnSafe for &'a mut T {}
impl<T> !ExnSafe for UnsafeCell<T> {}

impl<T: 'static + Send> ExnSafe for T {}

// Imagine a variant of RefCell that did poisoning:
impl<T> ExnSafe for PoisoningRefCell<T> {}

struct AssertExnSafe<T>(T);
impl<T> ExnSafe for AssertExnSafe<T> {}
impl<T> Deref for AssertExnSafe<T> { ... }
impl<T> DerefMut for AssertExnSafe<T> { ... }

fn recover<F: FnOnce() -> R + ExnSafe, R>(f: F) -> thread::Result<R>

which has potential for overlap between positive and negative cases. How should that work out?

I've been thinking recently about a more flexible form of specialization, which @arielb1 calls "lattice impls". Whether we go that way or not, I think that we want to provide some way of resolving such ambiguities with specialization, and maybe that's the answer for these conflicts.

Regardless, I don't see a lot of downside to moving forward with the simple case of "positive impl" overlap proposed here in the meantime!

sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 15, 2015
Due to limitations of the language, I still cannot have the tests live
in another crate (unfortunate, as it means this is actually impossible
to use in any context...)

However, we can force a lot of the code changes (mostly namespacing in
`macros`) by moving the tests out of `lib.rs`.

The two blockers for usage in another crate are the impls for
`SelectableColumn` created by the `joinable_inner!` macro (this
constitutes a blanket impl for a non-crate-local type). I can define a
generic implementation in the library once either
rust-lang/rfcs#1268, or specialization lands.

The second blocker is the impl for `Queriable` that is generated by
`belongs_to!`, as `(A, B)` is not considered crate-local, even if both
`A` and `B` are crate local. I can define a generic impl for this one,
but only once specialization lands. (I'm not sure that I want to in this
case, but it looks like I might not have a choice?)

In fact, in the second one, I'm almost certain that I'd rather not have
to define a generic type, as I will probably want to define a separate
implementation depending on whether the relationship is one-to-many,
one-to-one, and optional one-to-one.

It also kinda sucks that these restrictions exist at all when the
definition comes from a macro in the library's crate. However I
understand that for coherence reasons, where the macro is defined makes
absolutely no difference.
@gereeter
Copy link

I'm not certain of what exact restrictions should be in place, but I'm strongly in favor of something like this as it would allow #91 to be implemented as a library.

sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 21, 2015
The integration tests file and the primitives.rs file were getting
ridiculously huge. This breaks them both up into more logical
components. This will make it easier to unit test the various types.
While we can't move all the integration tests into the `tests` folder
until either specialization, or rust-lang/rfcs#1268 lands, due to how
we're having to implement `SelectableColumn` for `JoinSource`, this
doesn't affet the integration tests for the type casting behavior, and
we can safely move those over.
sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 21, 2015
Due to limitations of the language, I still cannot have the tests live
in another crate (unfortunate, as it means this is actually impossible
to use in any context...)

However, we can force a lot of the code changes (mostly namespacing in
`macros`) by moving the tests out of `lib.rs`.

The two blockers for usage in another crate are the impls for
`SelectableColumn` created by the `joinable_inner!` macro (this
constitutes a blanket impl for a non-crate-local type). I can define a
generic implementation in the library once either
rust-lang/rfcs#1268, or specialization lands.

The second blocker is the impl for `Queriable` that is generated by
`belongs_to!`, as `(A, B)` is not considered crate-local, even if both
`A` and `B` are crate local. I can define a generic impl for this one,
but only once specialization lands. (I'm not sure that I want to in this
case, but it looks like I might not have a choice?)

In fact, in the second one, I'm almost certain that I'd rather not have
to define a generic type, as I will probably want to define a separate
implementation depending on whether the relationship is one-to-many,
one-to-one, and optional one-to-one.

It also kinda sucks that these restrictions exist at all when the
definition comes from a macro in the library's crate. However I
understand that for coherence reasons, where the macro is defined makes
absolutely no difference.
@sgrif
Copy link
Contributor Author

sgrif commented Sep 22, 2015

I hadn't thought too closely about negative impl overlap, but I started working on actually implementing this, and it turns out we do have an explicit test case which got me thinking about it, and illustrates it pretty well.

I think the right answer here is to disallow any overlap between a negative impl and a positive impl (excluding impl for ..). Once specialization lands, those rules can apply in a straightforward and consistent way, where overlap is permitted as long as one impl is strictly more specific than the other. I'll amend the RFC to be clear on this

sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 30, 2015
At this point, we are officially separating the concept of a join from
an association, and I am no longer thinking about associations just yet.
We'll figure out what they look like later, this will sit underneath. We
still need to figure out what selecting from more than two tables looks
like, as well.

Implementing this revealed a fatal flaw in our infrastructure thus far.
The type of a column is not concrete, and changes based on the query. In
particular, a outer join can change the type of a column to be nullable.

As such, we end up with the following strucutre:

- The type of a join is (Left, Nullable<Right>)
- The type of a column from the right side when selected from a join is
  Nullable<Original> (note: We still need to figure out if we can avoid
  having `Option<Option<T>>` here)
- A column of `(Nullable<A>, Nullable<B>)` can be treated as
  `Nullable<(A, B)>` instead

These rules then allow fairly ergonomic working with these columns in
various contstructs. Selecting a single column gives you a nullable
value. Any sort of data structure, including tuples, can be treated as
optional instead of its fields being optional.

For a time during implementation, I ended up having the additional
parameter on `SelectableColumn` be an associated type, which I really
wanted to avoid due to the potential changes in marker trait overlap
coming if rust-lang/rfcs#1268 lands. We managed
to have it be fairly painless to make it an input type, however.

We needed to re-introduce `next_is_null`, to implement the nullable
tuple generic. I opted to have it take the number of columns to check,
as this is the best way I could find to check if all of the values are
null, without adding a method on `FromSql` to ask if it's able to handle
nulls, or anything like that.

Once again, this caused some werid changes in the compiler error
messages. Sometimes we're now getting the same error an extra time, or
one time less, or it's asking for a different trait than it was before.
I need to follow up on this.

As I'm writing this, I'm realizing that I'm probably missing a few
tests, which I need to go backfill now.
sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 30, 2015
All of the queriable macros related to joins have caused issues, since
they are invalid via the orphan rules for coherence that exist today.
While I think we can relax those once specialization lands, or work
around it doing something ugly like this:
d65df55,
neither of those solutions are great right now.

This should remove the last blocker that makes this library impossible
to use in other crates. We still need to work around the trait coherence
rules for the `SelectableColumn` impls in `joinable!`, but that's a much
more livable workaround, as it will go away entirely once either
specialization or rust-lang/rfcs#1268 lands, as
I can replace that with a full blanket impl in the crate itself.

Once concern I do have is that we aren't doing much to enforce the
relationship between models. One thing that was already true, but is
coming to the forefront of my mind is that there's nothing to stop you
from doing `users.inner_join(posts).select((users::id, posts::title))`
and collecting that into `User { id: i32, name: String }`.

Do we even need to care though? Would caring be too tightly conflating
the concept of a record and a row from a specific table? Is this ever a
case that would be easy to accidentally get wrong? I don't so at this
time, you'd have to be pretty specific to get it wrong. I think this is
probably fine.

Once we add the work-around for the `SelectableColumn` stuff, we can
move the rest of the integration tests into the tests directory, and
start looking at how places like crates.io might use this library.
sgrif added a commit to diesel-rs/diesel that referenced this pull request Sep 30, 2015
555993e has more detailed reasoning.
This workaround will eventually just go away as it can be defined
generically once the rules for overlap are loosened either by
specialization (allowing for lattice) or rust-lang/rfcs#1268. In order
to use joins, this macro will need to be called and passed every column
for the table on each side.
sgrif added a commit to diesel-rs/diesel that referenced this pull request Oct 7, 2015
The main thing that stood out to me was that the final test,
`filter_then_select` didn't immedaitely pass once I made it compile. It
was ignoring the where clause of the parent source. This re-affirms to
me that I need to start looking for a more generic way to implement
this, instead of having all of these different concrete types.

In the short term, I've removed the default impl to force myself to
consider the implications of this method on the rest of the types. For
Join sources, I think it still makes sense to always return `None`,
since they only work with tables at the moment.

It's worth noting this now allows chaining `select` and `filter`, where
subsequent calls will override the previous. I think that's fine for
`select`, but `filter` should of course chain with `and`.

This has a lot of the same caveats that we've seen in the past when
moving things off of `Table` and onto `QuerySource`. The tests get a tad
bit verbose here, but I'm not sure if that's indicative of a problem.

Once again I need to do the selectable expression workaround. This can
go away once rust-lang/rfcs#1268 lands. I was
unable to add a blanket impl in these cases, since it overlapped with
the blanket impls defined for types like `expression::Eq`. This was
actually quite surprising to me, since both impls had concrete types
which could be known for a fact not to overlap.
@aturon
Copy link
Member

aturon commented Oct 8, 2015

We talked about this a bit at last week's lang team meeting, and I think everyone is on board, except for the definition of "marker trait".

The consensus was that, for the purposes of this RFC, coherence rules should be relaxed for any trait that does not contain any items, regardless of what it inherits from.

In particular, inheriting from a trait that happens to include items doesn't actually introduce coherence problems around the subtrait implementation.

Can you update the RFC accordingly? Once you've done that, we can move to final comment period and try to land this.

The consensus has been that inheriting from a trait which has associated items does not introduce coherence violations for the subtrait.
@sgrif
Copy link
Contributor Author

sgrif commented Oct 8, 2015

I've updated the RFC to use that definition of marker trait

@aturon aturon added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Oct 9, 2015
@aturon
Copy link
Member

aturon commented Oct 9, 2015

This RFC is entering its Final Comment Period.

@glaebhoerl
Copy link
Contributor

In particular, inheriting from a trait that happens to include items doesn't actually introduce coherence problems around the subtrait implementation.

I'm not sure of the right way to interpret this. As far as I can tell, empty traits with non-empty supertraits couldn't possibly have incoherent impls, because to implement a trait you need to implement its supertraits, and you couldn't impl the non-empty supertrait incoherently. Is this the same thing as what you meant to imply?

@aturon
Copy link
Member

aturon commented Oct 9, 2015

@glaebhoerl Yes; thanks for putting it much more clearly :)

@glaebhoerl
Copy link
Contributor

That said, I think you could actually implement a subtrait incoherently even without doing so for the supertrait (which may also be part of what you meant). Contrived example:

trait A { fn f() -> bool; }

impl<T> A for Option<T> { ... }

trait B: A { }

impl<T: Debug> B for Option<T> { }

impl<T: Display> B for Option<T> { }

But this also seems fine.

@arielb1
Copy link
Contributor

arielb1 commented Oct 12, 2015

Was there an analysis of the interaction with the other parts of the type system, and with the interaction with the current implementation strategy (I think this would basically require the new trait system).

@aturon
Copy link
Member

aturon commented Oct 12, 2015

@glaebhoerl Yes, that's what I'm trying to get at: supertraits are simply irrelevant when it comes to the coherence rules of a subtrait, because the impl blocks are separate; we can govern them by separate coherence rules. In particular, if a subtrait doesn't have any items, than there's nothing to be coherent about, which is of course what this RFC is getting at. Hence it makes sense for relaxed coherence rules to apply.

@arielb1 Could you be slightly more concrete/specific? In terms of semantic interactions, this proposal doesn't seem problematic, given that the coherence property is basically vacuous in these cases.

@arielb1
Copy link
Contributor

arielb1 commented Oct 12, 2015

@aturon

I want to find any surprises before, not after, this is stabilized, especially as I am not sure what is the best way to stabilize this.

@aturon
Copy link
Member

aturon commented Oct 12, 2015

@arielb1 I'm definitely on board with that! But what I mean is, I'm not sure if you have any concrete worries in mind? Certainly both @nikomatsakis and I have given this RFC some thought in terms of semantic interactions and haven't turned up anything problematic, but you're often a little more devious, so maybe you see something fishy?

@arielb1
Copy link
Contributor

arielb1 commented Oct 12, 2015

@aturon

Nothing particularly, except that I don't feel like we understand the current system well enough. On the other hand, this is both not stronger than lattice impls (which we may want), and feels to be semantically equivalent to fixing rust-lang/rust#27544, which we (probably) want to.

@arielb1
Copy link
Contributor

arielb1 commented Oct 12, 2015

For example, there was the OIBIT = negative impls interaction.

@arielb1
Copy link
Contributor

arielb1 commented Oct 12, 2015

One thing that worries me is that we could have non-deterministic effects about which impl gets used when inference, particularly region inference, is involved.

impl<'a, 'b> Marker<'a> for &'b T {}
impl<'a> Marker<'a> for T where T: 'a {}

@nikomatsakis
Copy link
Contributor

Tracking issue is rust-lang/rust#29864

@sgrif
Copy link
Contributor Author

sgrif commented Nov 16, 2015

Thanks! Sorry for the delay on following up

@nikomatsakis
Copy link
Contributor

I accidentally rebased when I landed this, so I am going to close the PR manually.

@sgrif sgrif deleted the sg-overlapping-impls branch November 18, 2015 13:51
sgrif added a commit to sgrif/rfcs that referenced this pull request Apr 30, 2016
This RFC was written at a time where the specialization RFC appeared to include the lattice rule. Since the RFC was accepted without it, this RFC implies that it is superseded by specialization, but it is not.
sgrif added a commit to sgrif/rfcs that referenced this pull request Apr 30, 2016
sgrif added a commit to sgrif/rfcs that referenced this pull request Apr 30, 2016
sgrif added a commit to sgrif/rfcs that referenced this pull request Apr 30, 2016
aturon added a commit that referenced this pull request Jul 26, 2016
Update the alternative section for #1268
bors added a commit to rust-lang/rust that referenced this pull request Sep 25, 2018
Support an explicit annotation for marker traits

From the tracking issue for rust-lang/rfcs#1268:
> It seems obvious that we should make a `#[marker]` annotation. ~ #29864 (comment)

This PR allows you to put `#[marker]` on a trait, at which point:
- [x] The trait must not have any items ~~All of the trait's items must have defaults~~
- [x] Any impl of the trait must be empty (not override any items)
- [x] But impls of the trait are allowed to overlap

r? @nikomatsakis
@richard-uk1
Copy link

Could you update the link at the top now it's been merged 😁

@Centril
Copy link
Contributor

Centril commented Oct 4, 2018

Done.

@richard-uk1
Copy link

Thanks very much :D

sgrif added a commit to diesel-rs/diesel that referenced this pull request Dec 19, 2019
This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96f. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](rust-lang/rfcs#1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.
sgrif added a commit to diesel-rs/diesel that referenced this pull request Dec 19, 2019
This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96f. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](rust-lang/rfcs#1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.
weiznich pushed a commit to diesel-rs/diesel that referenced this pull request Apr 3, 2020
This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96f. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](rust-lang/rfcs#1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.
weiznich pushed a commit to diesel-rs/diesel that referenced this pull request Apr 16, 2020
This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96f. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](rust-lang/rfcs#1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.
@traviscross traviscross added disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Dec 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants