-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow overlapping implementations for marker traits #1268
Conversation
The currently presented version of specialization won't allow this. Your proposal is basically lattice impls for marker traits. |
The integration tests file and the primitives.rs file were getting ridiculously huge. This breaks them both up into more logical components. This will make it easier to unit test the various types. While we can't move all the integration tests into the `tests` folder until either specialization, or rust-lang/rfcs#1268 lands, due to how we're having to implement `SelectableColumn` for `JoinSource`, this doesn't affet the integration tests for the type casting behavior, and we can safely move those over.
@sgrif First, sorry for the very overdue response; I'm behind on my RFC shepherding. Second: some improvements in this area would be most welcome. For example, the design I recently proposed for a trait here won't work without some adjustment to coherence for marker traits. Simply allowing overlap for marker traits (as proposed here) seems like a pure win. But I wonder if you've thought much about negative impls and potential overlap there? For example, in the comment I linked above, I propose something like: use std::cell::UnsafeCell;
trait ExnSafe {}
impl ExnSafe for .. {}
trait RefExnSafe {}
impl RefExnSafe for .. {}
impl<'a, T: RefExnSafe> ExnSafe for &'a T {}
impl<'a, T> !ExnSafe for &'a mut T {}
impl<T> !ExnSafe for UnsafeCell<T> {}
impl<T: 'static + Send> ExnSafe for T {}
// Imagine a variant of RefCell that did poisoning:
impl<T> ExnSafe for PoisoningRefCell<T> {}
struct AssertExnSafe<T>(T);
impl<T> ExnSafe for AssertExnSafe<T> {}
impl<T> Deref for AssertExnSafe<T> { ... }
impl<T> DerefMut for AssertExnSafe<T> { ... }
fn recover<F: FnOnce() -> R + ExnSafe, R>(f: F) -> thread::Result<R> which has potential for overlap between positive and negative cases. How should that work out? I've been thinking recently about a more flexible form of specialization, which @arielb1 calls "lattice impls". Whether we go that way or not, I think that we want to provide some way of resolving such ambiguities with specialization, and maybe that's the answer for these conflicts. Regardless, I don't see a lot of downside to moving forward with the simple case of "positive impl" overlap proposed here in the meantime! |
Due to limitations of the language, I still cannot have the tests live in another crate (unfortunate, as it means this is actually impossible to use in any context...) However, we can force a lot of the code changes (mostly namespacing in `macros`) by moving the tests out of `lib.rs`. The two blockers for usage in another crate are the impls for `SelectableColumn` created by the `joinable_inner!` macro (this constitutes a blanket impl for a non-crate-local type). I can define a generic implementation in the library once either rust-lang/rfcs#1268, or specialization lands. The second blocker is the impl for `Queriable` that is generated by `belongs_to!`, as `(A, B)` is not considered crate-local, even if both `A` and `B` are crate local. I can define a generic impl for this one, but only once specialization lands. (I'm not sure that I want to in this case, but it looks like I might not have a choice?) In fact, in the second one, I'm almost certain that I'd rather not have to define a generic type, as I will probably want to define a separate implementation depending on whether the relationship is one-to-many, one-to-one, and optional one-to-one. It also kinda sucks that these restrictions exist at all when the definition comes from a macro in the library's crate. However I understand that for coherence reasons, where the macro is defined makes absolutely no difference.
I'm not certain of what exact restrictions should be in place, but I'm strongly in favor of something like this as it would allow #91 to be implemented as a library. |
The integration tests file and the primitives.rs file were getting ridiculously huge. This breaks them both up into more logical components. This will make it easier to unit test the various types. While we can't move all the integration tests into the `tests` folder until either specialization, or rust-lang/rfcs#1268 lands, due to how we're having to implement `SelectableColumn` for `JoinSource`, this doesn't affet the integration tests for the type casting behavior, and we can safely move those over.
Due to limitations of the language, I still cannot have the tests live in another crate (unfortunate, as it means this is actually impossible to use in any context...) However, we can force a lot of the code changes (mostly namespacing in `macros`) by moving the tests out of `lib.rs`. The two blockers for usage in another crate are the impls for `SelectableColumn` created by the `joinable_inner!` macro (this constitutes a blanket impl for a non-crate-local type). I can define a generic implementation in the library once either rust-lang/rfcs#1268, or specialization lands. The second blocker is the impl for `Queriable` that is generated by `belongs_to!`, as `(A, B)` is not considered crate-local, even if both `A` and `B` are crate local. I can define a generic impl for this one, but only once specialization lands. (I'm not sure that I want to in this case, but it looks like I might not have a choice?) In fact, in the second one, I'm almost certain that I'd rather not have to define a generic type, as I will probably want to define a separate implementation depending on whether the relationship is one-to-many, one-to-one, and optional one-to-one. It also kinda sucks that these restrictions exist at all when the definition comes from a macro in the library's crate. However I understand that for coherence reasons, where the macro is defined makes absolutely no difference.
I hadn't thought too closely about negative impl overlap, but I started working on actually implementing this, and it turns out we do have an explicit test case which got me thinking about it, and illustrates it pretty well. I think the right answer here is to disallow any overlap between a negative impl and a positive impl (excluding |
At this point, we are officially separating the concept of a join from an association, and I am no longer thinking about associations just yet. We'll figure out what they look like later, this will sit underneath. We still need to figure out what selecting from more than two tables looks like, as well. Implementing this revealed a fatal flaw in our infrastructure thus far. The type of a column is not concrete, and changes based on the query. In particular, a outer join can change the type of a column to be nullable. As such, we end up with the following strucutre: - The type of a join is (Left, Nullable<Right>) - The type of a column from the right side when selected from a join is Nullable<Original> (note: We still need to figure out if we can avoid having `Option<Option<T>>` here) - A column of `(Nullable<A>, Nullable<B>)` can be treated as `Nullable<(A, B)>` instead These rules then allow fairly ergonomic working with these columns in various contstructs. Selecting a single column gives you a nullable value. Any sort of data structure, including tuples, can be treated as optional instead of its fields being optional. For a time during implementation, I ended up having the additional parameter on `SelectableColumn` be an associated type, which I really wanted to avoid due to the potential changes in marker trait overlap coming if rust-lang/rfcs#1268 lands. We managed to have it be fairly painless to make it an input type, however. We needed to re-introduce `next_is_null`, to implement the nullable tuple generic. I opted to have it take the number of columns to check, as this is the best way I could find to check if all of the values are null, without adding a method on `FromSql` to ask if it's able to handle nulls, or anything like that. Once again, this caused some werid changes in the compiler error messages. Sometimes we're now getting the same error an extra time, or one time less, or it's asking for a different trait than it was before. I need to follow up on this. As I'm writing this, I'm realizing that I'm probably missing a few tests, which I need to go backfill now.
All of the queriable macros related to joins have caused issues, since they are invalid via the orphan rules for coherence that exist today. While I think we can relax those once specialization lands, or work around it doing something ugly like this: d65df55, neither of those solutions are great right now. This should remove the last blocker that makes this library impossible to use in other crates. We still need to work around the trait coherence rules for the `SelectableColumn` impls in `joinable!`, but that's a much more livable workaround, as it will go away entirely once either specialization or rust-lang/rfcs#1268 lands, as I can replace that with a full blanket impl in the crate itself. Once concern I do have is that we aren't doing much to enforce the relationship between models. One thing that was already true, but is coming to the forefront of my mind is that there's nothing to stop you from doing `users.inner_join(posts).select((users::id, posts::title))` and collecting that into `User { id: i32, name: String }`. Do we even need to care though? Would caring be too tightly conflating the concept of a record and a row from a specific table? Is this ever a case that would be easy to accidentally get wrong? I don't so at this time, you'd have to be pretty specific to get it wrong. I think this is probably fine. Once we add the work-around for the `SelectableColumn` stuff, we can move the rest of the integration tests into the tests directory, and start looking at how places like crates.io might use this library.
555993e has more detailed reasoning. This workaround will eventually just go away as it can be defined generically once the rules for overlap are loosened either by specialization (allowing for lattice) or rust-lang/rfcs#1268. In order to use joins, this macro will need to be called and passed every column for the table on each side.
The main thing that stood out to me was that the final test, `filter_then_select` didn't immedaitely pass once I made it compile. It was ignoring the where clause of the parent source. This re-affirms to me that I need to start looking for a more generic way to implement this, instead of having all of these different concrete types. In the short term, I've removed the default impl to force myself to consider the implications of this method on the rest of the types. For Join sources, I think it still makes sense to always return `None`, since they only work with tables at the moment. It's worth noting this now allows chaining `select` and `filter`, where subsequent calls will override the previous. I think that's fine for `select`, but `filter` should of course chain with `and`. This has a lot of the same caveats that we've seen in the past when moving things off of `Table` and onto `QuerySource`. The tests get a tad bit verbose here, but I'm not sure if that's indicative of a problem. Once again I need to do the selectable expression workaround. This can go away once rust-lang/rfcs#1268 lands. I was unable to add a blanket impl in these cases, since it overlapped with the blanket impls defined for types like `expression::Eq`. This was actually quite surprising to me, since both impls had concrete types which could be known for a fact not to overlap.
We talked about this a bit at last week's lang team meeting, and I think everyone is on board, except for the definition of "marker trait". The consensus was that, for the purposes of this RFC, coherence rules should be relaxed for any trait that does not contain any items, regardless of what it inherits from. In particular, inheriting from a trait that happens to include items doesn't actually introduce coherence problems around the subtrait implementation. Can you update the RFC accordingly? Once you've done that, we can move to final comment period and try to land this. |
The consensus has been that inheriting from a trait which has associated items does not introduce coherence violations for the subtrait.
I've updated the RFC to use that definition of marker trait |
This RFC is entering its Final Comment Period. |
I'm not sure of the right way to interpret this. As far as I can tell, empty traits with non-empty supertraits couldn't possibly have incoherent |
@glaebhoerl Yes; thanks for putting it much more clearly :) |
That said, I think you could actually implement a subtrait incoherently even without doing so for the supertrait (which may also be part of what you meant). Contrived example:
But this also seems fine. |
Was there an analysis of the interaction with the other parts of the type system, and with the interaction with the current implementation strategy (I think this would basically require the new trait system). |
@glaebhoerl Yes, that's what I'm trying to get at: supertraits are simply irrelevant when it comes to the coherence rules of a subtrait, because the impl blocks are separate; we can govern them by separate coherence rules. In particular, if a subtrait doesn't have any items, than there's nothing to be coherent about, which is of course what this RFC is getting at. Hence it makes sense for relaxed coherence rules to apply. @arielb1 Could you be slightly more concrete/specific? In terms of semantic interactions, this proposal doesn't seem problematic, given that the coherence property is basically vacuous in these cases. |
I want to find any surprises before, not after, this is stabilized, especially as I am not sure what is the best way to stabilize this. |
@arielb1 I'm definitely on board with that! But what I mean is, I'm not sure if you have any concrete worries in mind? Certainly both @nikomatsakis and I have given this RFC some thought in terms of semantic interactions and haven't turned up anything problematic, but you're often a little more devious, so maybe you see something fishy? |
Nothing particularly, except that I don't feel like we understand the current system well enough. On the other hand, this is both not stronger than lattice impls (which we may want), and feels to be semantically equivalent to fixing rust-lang/rust#27544, which we (probably) want to. |
For example, there was the OIBIT = negative impls interaction. |
One thing that worries me is that we could have non-deterministic effects about which impl gets used when inference, particularly region inference, is involved. impl<'a, 'b> Marker<'a> for &'b T {}
impl<'a> Marker<'a> for T where T: 'a {} |
Tracking issue is rust-lang/rust#29864 |
Thanks! Sorry for the delay on following up |
I accidentally rebased when I landed this, so I am going to close the PR manually. |
This RFC was written at a time where the specialization RFC appeared to include the lattice rule. Since the RFC was accepted without it, this RFC implies that it is superseded by specialization, but it is not.
Update the alternative section for #1268
Support an explicit annotation for marker traits From the tracking issue for rust-lang/rfcs#1268: > It seems obvious that we should make a `#[marker]` annotation. ~ #29864 (comment) This PR allows you to put `#[marker]` on a trait, at which point: - [x] The trait must not have any items ~~All of the trait's items must have defaults~~ - [x] Any impl of the trait must be empty (not override any items) - [x] But impls of the trait are allowed to overlap r? @nikomatsakis
Could you update the link at the top now it's been merged 😁 |
Done. |
Thanks very much :D |
This commit implements the proposal laid out at https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18. The changes made in this commit now correctly enforce semantics around the mixing of aggregate/non-aggregate expressions, and lays the foundations required for full `GROUP BY` support in the future. This commit does not implement `GROUP BY` as a feature in public API, though I have added some minimal support to prove the soundness of the change. Since this will likely be the largest breaking change in 2.0, I am going to include as much detail as I can about the problem, the reasoning behind the solution, and the likely impact of the change. Recap of the problem ---- `NonAggregate` was originally introduced in c66d96f. The goal was to prevent the following error at compile time: [local] sean@sean=# select count(*), name from users; ERROR: 42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function I didn't think about this nearly as much as I should have at the time, which resulted in a hilariously broken implementation that has prevented the addition of `group_by` support, and left bugs in the codebase for more than 4 years. At the time I was focused on "mixing aggregate and non-aggregate expressions in a select statement". Since select uses tuples to represent multiple values, I added a trait to represent non-aggregate values, added it as a bound for `impl Expression for AnyTuple`, and called it a day. This had a number of problems. The most obvious was that it prevented valid queries involving multiple aggregate expressions. At the time I figured we'd have a separate trait for aggregate expressions, and then `impl Expression for AnyTuple where AllFields: NonAggregate | AllFields: Aggregate`. This eventually led to [RFC #1268](rust-lang/rfcs#1268), which doesn't seem likely to be stabilized soon, and even if it were we still have the other issues with this solution. The next issue is that you cannot say whether a given expression is aggregate by looking at it on its own. The answer to "Is `users`.`name` aggregate?" depends on whether or not that column appears in the group by clause. So any trait which correctly handles this must be generic over the group by clause, or it cannot be correctly implemented for columns. However, the most egregious issue with that commit is that it does not handle *any* composite expressions besides tuples. Even today, `.select((count_star(), id))` fails, but `.select(count_star() + id)` will compile (and either error at runtime or give non-deterministic results depending on your backend). User Impact ---- This change will unfortunately have more breakage than I had anticipated. That said, the breakage only affects *extremely* generic code, and I do not believe that the majority of our users will notice or care about this change. There are three main ways in which the breakage will affect users: - The additional bound on `SelectStatement<...>: SelectDsl` and `SelectStatement<...>: Query`. - This one broke our test suite in a few places, mainly where we have *really* generic code to say "I can select T.eq(sql_string)". I do not believe this is representative of real code. - I did a GitHub-wide search of all occurances of `SelectableExpression` (which is *not* the bound on the public API, but is the bound on `SelectStatement`'s impl, and would point to broken code). I found one repo which is relying on internals that will break, and a lot of results from Wundergraph. I didnt' look at those. This probably breaks Wundergraph. Sorry, Georg. It should be easy to fix I promise. - `NonAggregate` can no longer be directly implemented - I did a GitHub-wide search for `NonAggregate`. With one exception, every repo implementing this trait directly was on pre-1.0. The only affected repo is manually implementing `Expression` instead of using `#[derive(AsExpression)]`. With that change they will be future-proofed. - `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate` - This broke `infer_schema`, since it was relying on `AssocType: NonAggregate` to know `(known_column, AssocType, known_column): Expression`. Ironically, that's no longer important, it did still break due to the first item on this list. I could not find any code in the wild that fell into this category. Originally I thought that the only code which would be affected by this is code that wrote `impl NonAggregate`, since that code would need to change to `impl ValidGrouping`. However, I missed a few things when I made that assessment. The first is that... Well the feature exists. The whole point of this was to prevent `aggregate + non_aggregate` from compiling when passed to select, which implies a new trait bound *somewhere*. While `SelectStatement` and its impls are private API, it's really easy to couple yourself ot the bounds on those impls. It doesn't help that `rustc` literally recommends that folks do that when errors occur. Any code that is written as `Foo: SelectDsl<Selection>` will be fine, but any code that's gone down the rabbit hole and has copied the bounds from `impl SelectDsl for SelectStatement<...>` will break. I didn't find many cases in the wild, but I do think it's relatively common. The second thing I missed is that "is this aggregate" is not a binary question. Originally I expected we would have two answers to the question, and compound expressions would enforce that their sub-expressions all had the same answer. The issue is that `1` doesn't care. You can have `count(*) + 1`, and you can also have `non_aggregate + 1`. `1` is always valid, regardless of aggregation. So we need three answers. The problem is that this means we can't have `T: NonAggregate` mean everything it used to. On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that `T` can be passed to everything with a `NonAggregate` bound (`filter`, `order`, etc). On nightly, it also implies `T::IsAggregate: MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In English, it implies that this is valid with no group by clause, and its output can appear with an expression which is not aggregate (but might be with a different group by clause). The outcome of this is that `T: NonAggregate` implies `(T, Other): NonAggregate`. However the missing link from both stable and unstable is `is_aggregate::No: MixedAggregates<T::IsAggregate>` being implied, which would mean `(Other, T): NonAggregate` would be implied. Ultimately this is a really long-winded way of saying that `T: NonAggregate` used to imply interactions with other types. That is no longer consistently true. It's unlikely there are many affected users, but any that are affected will need to directly have a `ValidGrouping` bound. Implementation Notes --- Because this broke diesel_infer_schema, I had to do a version bump to get that crate to rely on the released 1.4. There's a note on the unsoundness of the `NonAggregate` impl of `Subselect`. I haven't changed the bounds on that impl, but we almost certainly should address it before 2.0 is released. I opted to validate the new bound in `SelectDsl`, so folks get errors on `.select` instead of `.load`. We can't validate this on calls to both `.select` *and* `.group_by`, since a select statement valid with a group by is often invalid without one, and a group by clause usually makes the default select clause invalid. While this commit does not add proper group by support, I fleshed it out a bit to test the type constraints. Right now a column only considers itself valid if there is no group by clause, or if the group by clause is exactly that column. I had more to say but I went on a call and lost my train of thought. I need to come back and finish writing this later. Notable Limitations --- `SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be represented. Unanswered Questions --- `BoxableExpression` has been limited to only work when there's no group by clause, and only work with types which are `is_aggregate::No`. This is probably not what we want to do, but was the smallest change to start.
This commit implements the proposal laid out at https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18. The changes made in this commit now correctly enforce semantics around the mixing of aggregate/non-aggregate expressions, and lays the foundations required for full `GROUP BY` support in the future. This commit does not implement `GROUP BY` as a feature in public API, though I have added some minimal support to prove the soundness of the change. Since this will likely be the largest breaking change in 2.0, I am going to include as much detail as I can about the problem, the reasoning behind the solution, and the likely impact of the change. Recap of the problem ---- `NonAggregate` was originally introduced in c66d96f. The goal was to prevent the following error at compile time: [local] sean@sean=# select count(*), name from users; ERROR: 42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function I didn't think about this nearly as much as I should have at the time, which resulted in a hilariously broken implementation that has prevented the addition of `group_by` support, and left bugs in the codebase for more than 4 years. At the time I was focused on "mixing aggregate and non-aggregate expressions in a select statement". Since select uses tuples to represent multiple values, I added a trait to represent non-aggregate values, added it as a bound for `impl Expression for AnyTuple`, and called it a day. This had a number of problems. The most obvious was that it prevented valid queries involving multiple aggregate expressions. At the time I figured we'd have a separate trait for aggregate expressions, and then `impl Expression for AnyTuple where AllFields: NonAggregate | AllFields: Aggregate`. This eventually led to [RFC #1268](rust-lang/rfcs#1268), which doesn't seem likely to be stabilized soon, and even if it were we still have the other issues with this solution. The next issue is that you cannot say whether a given expression is aggregate by looking at it on its own. The answer to "Is `users`.`name` aggregate?" depends on whether or not that column appears in the group by clause. So any trait which correctly handles this must be generic over the group by clause, or it cannot be correctly implemented for columns. However, the most egregious issue with that commit is that it does not handle *any* composite expressions besides tuples. Even today, `.select((count_star(), id))` fails, but `.select(count_star() + id)` will compile (and either error at runtime or give non-deterministic results depending on your backend). User Impact ---- This change will unfortunately have more breakage than I had anticipated. That said, the breakage only affects *extremely* generic code, and I do not believe that the majority of our users will notice or care about this change. There are three main ways in which the breakage will affect users: - The additional bound on `SelectStatement<...>: SelectDsl` and `SelectStatement<...>: Query`. - This one broke our test suite in a few places, mainly where we have *really* generic code to say "I can select T.eq(sql_string)". I do not believe this is representative of real code. - I did a GitHub-wide search of all occurances of `SelectableExpression` (which is *not* the bound on the public API, but is the bound on `SelectStatement`'s impl, and would point to broken code). I found one repo which is relying on internals that will break, and a lot of results from Wundergraph. I didnt' look at those. This probably breaks Wundergraph. Sorry, Georg. It should be easy to fix I promise. - `NonAggregate` can no longer be directly implemented - I did a GitHub-wide search for `NonAggregate`. With one exception, every repo implementing this trait directly was on pre-1.0. The only affected repo is manually implementing `Expression` instead of using `#[derive(AsExpression)]`. With that change they will be future-proofed. - `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate` - This broke `infer_schema`, since it was relying on `AssocType: NonAggregate` to know `(known_column, AssocType, known_column): Expression`. Ironically, that's no longer important, it did still break due to the first item on this list. I could not find any code in the wild that fell into this category. Originally I thought that the only code which would be affected by this is code that wrote `impl NonAggregate`, since that code would need to change to `impl ValidGrouping`. However, I missed a few things when I made that assessment. The first is that... Well the feature exists. The whole point of this was to prevent `aggregate + non_aggregate` from compiling when passed to select, which implies a new trait bound *somewhere*. While `SelectStatement` and its impls are private API, it's really easy to couple yourself ot the bounds on those impls. It doesn't help that `rustc` literally recommends that folks do that when errors occur. Any code that is written as `Foo: SelectDsl<Selection>` will be fine, but any code that's gone down the rabbit hole and has copied the bounds from `impl SelectDsl for SelectStatement<...>` will break. I didn't find many cases in the wild, but I do think it's relatively common. The second thing I missed is that "is this aggregate" is not a binary question. Originally I expected we would have two answers to the question, and compound expressions would enforce that their sub-expressions all had the same answer. The issue is that `1` doesn't care. You can have `count(*) + 1`, and you can also have `non_aggregate + 1`. `1` is always valid, regardless of aggregation. So we need three answers. The problem is that this means we can't have `T: NonAggregate` mean everything it used to. On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that `T` can be passed to everything with a `NonAggregate` bound (`filter`, `order`, etc). On nightly, it also implies `T::IsAggregate: MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In English, it implies that this is valid with no group by clause, and its output can appear with an expression which is not aggregate (but might be with a different group by clause). The outcome of this is that `T: NonAggregate` implies `(T, Other): NonAggregate`. However the missing link from both stable and unstable is `is_aggregate::No: MixedAggregates<T::IsAggregate>` being implied, which would mean `(Other, T): NonAggregate` would be implied. Ultimately this is a really long-winded way of saying that `T: NonAggregate` used to imply interactions with other types. That is no longer consistently true. It's unlikely there are many affected users, but any that are affected will need to directly have a `ValidGrouping` bound. Implementation Notes --- Because this broke diesel_infer_schema, I had to do a version bump to get that crate to rely on the released 1.4. There's a note on the unsoundness of the `NonAggregate` impl of `Subselect`. I haven't changed the bounds on that impl, but we almost certainly should address it before 2.0 is released. I opted to validate the new bound in `SelectDsl`, so folks get errors on `.select` instead of `.load`. We can't validate this on calls to both `.select` *and* `.group_by`, since a select statement valid with a group by is often invalid without one, and a group by clause usually makes the default select clause invalid. While this commit does not add proper group by support, I fleshed it out a bit to test the type constraints. Right now a column only considers itself valid if there is no group by clause, or if the group by clause is exactly that column. I had more to say but I went on a call and lost my train of thought. I need to come back and finish writing this later. Notable Limitations --- `SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be represented. Unanswered Questions --- `BoxableExpression` has been limited to only work when there's no group by clause, and only work with types which are `is_aggregate::No`. This is probably not what we want to do, but was the smallest change to start.
This commit implements the proposal laid out at https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18. The changes made in this commit now correctly enforce semantics around the mixing of aggregate/non-aggregate expressions, and lays the foundations required for full `GROUP BY` support in the future. This commit does not implement `GROUP BY` as a feature in public API, though I have added some minimal support to prove the soundness of the change. Since this will likely be the largest breaking change in 2.0, I am going to include as much detail as I can about the problem, the reasoning behind the solution, and the likely impact of the change. Recap of the problem ---- `NonAggregate` was originally introduced in c66d96f. The goal was to prevent the following error at compile time: [local] sean@sean=# select count(*), name from users; ERROR: 42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function I didn't think about this nearly as much as I should have at the time, which resulted in a hilariously broken implementation that has prevented the addition of `group_by` support, and left bugs in the codebase for more than 4 years. At the time I was focused on "mixing aggregate and non-aggregate expressions in a select statement". Since select uses tuples to represent multiple values, I added a trait to represent non-aggregate values, added it as a bound for `impl Expression for AnyTuple`, and called it a day. This had a number of problems. The most obvious was that it prevented valid queries involving multiple aggregate expressions. At the time I figured we'd have a separate trait for aggregate expressions, and then `impl Expression for AnyTuple where AllFields: NonAggregate | AllFields: Aggregate`. This eventually led to [RFC #1268](rust-lang/rfcs#1268), which doesn't seem likely to be stabilized soon, and even if it were we still have the other issues with this solution. The next issue is that you cannot say whether a given expression is aggregate by looking at it on its own. The answer to "Is `users`.`name` aggregate?" depends on whether or not that column appears in the group by clause. So any trait which correctly handles this must be generic over the group by clause, or it cannot be correctly implemented for columns. However, the most egregious issue with that commit is that it does not handle *any* composite expressions besides tuples. Even today, `.select((count_star(), id))` fails, but `.select(count_star() + id)` will compile (and either error at runtime or give non-deterministic results depending on your backend). User Impact ---- This change will unfortunately have more breakage than I had anticipated. That said, the breakage only affects *extremely* generic code, and I do not believe that the majority of our users will notice or care about this change. There are three main ways in which the breakage will affect users: - The additional bound on `SelectStatement<...>: SelectDsl` and `SelectStatement<...>: Query`. - This one broke our test suite in a few places, mainly where we have *really* generic code to say "I can select T.eq(sql_string)". I do not believe this is representative of real code. - I did a GitHub-wide search of all occurances of `SelectableExpression` (which is *not* the bound on the public API, but is the bound on `SelectStatement`'s impl, and would point to broken code). I found one repo which is relying on internals that will break, and a lot of results from Wundergraph. I didnt' look at those. This probably breaks Wundergraph. Sorry, Georg. It should be easy to fix I promise. - `NonAggregate` can no longer be directly implemented - I did a GitHub-wide search for `NonAggregate`. With one exception, every repo implementing this trait directly was on pre-1.0. The only affected repo is manually implementing `Expression` instead of using `#[derive(AsExpression)]`. With that change they will be future-proofed. - `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate` - This broke `infer_schema`, since it was relying on `AssocType: NonAggregate` to know `(known_column, AssocType, known_column): Expression`. Ironically, that's no longer important, it did still break due to the first item on this list. I could not find any code in the wild that fell into this category. Originally I thought that the only code which would be affected by this is code that wrote `impl NonAggregate`, since that code would need to change to `impl ValidGrouping`. However, I missed a few things when I made that assessment. The first is that... Well the feature exists. The whole point of this was to prevent `aggregate + non_aggregate` from compiling when passed to select, which implies a new trait bound *somewhere*. While `SelectStatement` and its impls are private API, it's really easy to couple yourself ot the bounds on those impls. It doesn't help that `rustc` literally recommends that folks do that when errors occur. Any code that is written as `Foo: SelectDsl<Selection>` will be fine, but any code that's gone down the rabbit hole and has copied the bounds from `impl SelectDsl for SelectStatement<...>` will break. I didn't find many cases in the wild, but I do think it's relatively common. The second thing I missed is that "is this aggregate" is not a binary question. Originally I expected we would have two answers to the question, and compound expressions would enforce that their sub-expressions all had the same answer. The issue is that `1` doesn't care. You can have `count(*) + 1`, and you can also have `non_aggregate + 1`. `1` is always valid, regardless of aggregation. So we need three answers. The problem is that this means we can't have `T: NonAggregate` mean everything it used to. On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that `T` can be passed to everything with a `NonAggregate` bound (`filter`, `order`, etc). On nightly, it also implies `T::IsAggregate: MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In English, it implies that this is valid with no group by clause, and its output can appear with an expression which is not aggregate (but might be with a different group by clause). The outcome of this is that `T: NonAggregate` implies `(T, Other): NonAggregate`. However the missing link from both stable and unstable is `is_aggregate::No: MixedAggregates<T::IsAggregate>` being implied, which would mean `(Other, T): NonAggregate` would be implied. Ultimately this is a really long-winded way of saying that `T: NonAggregate` used to imply interactions with other types. That is no longer consistently true. It's unlikely there are many affected users, but any that are affected will need to directly have a `ValidGrouping` bound. Implementation Notes --- Because this broke diesel_infer_schema, I had to do a version bump to get that crate to rely on the released 1.4. There's a note on the unsoundness of the `NonAggregate` impl of `Subselect`. I haven't changed the bounds on that impl, but we almost certainly should address it before 2.0 is released. I opted to validate the new bound in `SelectDsl`, so folks get errors on `.select` instead of `.load`. We can't validate this on calls to both `.select` *and* `.group_by`, since a select statement valid with a group by is often invalid without one, and a group by clause usually makes the default select clause invalid. While this commit does not add proper group by support, I fleshed it out a bit to test the type constraints. Right now a column only considers itself valid if there is no group by clause, or if the group by clause is exactly that column. I had more to say but I went on a call and lost my train of thought. I need to come back and finish writing this later. Notable Limitations --- `SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be represented. Unanswered Questions --- `BoxableExpression` has been limited to only work when there's no group by clause, and only work with types which are `is_aggregate::No`. This is probably not what we want to do, but was the smallest change to start.
This commit implements the proposal laid out at https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18. The changes made in this commit now correctly enforce semantics around the mixing of aggregate/non-aggregate expressions, and lays the foundations required for full `GROUP BY` support in the future. This commit does not implement `GROUP BY` as a feature in public API, though I have added some minimal support to prove the soundness of the change. Since this will likely be the largest breaking change in 2.0, I am going to include as much detail as I can about the problem, the reasoning behind the solution, and the likely impact of the change. Recap of the problem ---- `NonAggregate` was originally introduced in c66d96f. The goal was to prevent the following error at compile time: [local] sean@sean=# select count(*), name from users; ERROR: 42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function I didn't think about this nearly as much as I should have at the time, which resulted in a hilariously broken implementation that has prevented the addition of `group_by` support, and left bugs in the codebase for more than 4 years. At the time I was focused on "mixing aggregate and non-aggregate expressions in a select statement". Since select uses tuples to represent multiple values, I added a trait to represent non-aggregate values, added it as a bound for `impl Expression for AnyTuple`, and called it a day. This had a number of problems. The most obvious was that it prevented valid queries involving multiple aggregate expressions. At the time I figured we'd have a separate trait for aggregate expressions, and then `impl Expression for AnyTuple where AllFields: NonAggregate | AllFields: Aggregate`. This eventually led to [RFC #1268](rust-lang/rfcs#1268), which doesn't seem likely to be stabilized soon, and even if it were we still have the other issues with this solution. The next issue is that you cannot say whether a given expression is aggregate by looking at it on its own. The answer to "Is `users`.`name` aggregate?" depends on whether or not that column appears in the group by clause. So any trait which correctly handles this must be generic over the group by clause, or it cannot be correctly implemented for columns. However, the most egregious issue with that commit is that it does not handle *any* composite expressions besides tuples. Even today, `.select((count_star(), id))` fails, but `.select(count_star() + id)` will compile (and either error at runtime or give non-deterministic results depending on your backend). User Impact ---- This change will unfortunately have more breakage than I had anticipated. That said, the breakage only affects *extremely* generic code, and I do not believe that the majority of our users will notice or care about this change. There are three main ways in which the breakage will affect users: - The additional bound on `SelectStatement<...>: SelectDsl` and `SelectStatement<...>: Query`. - This one broke our test suite in a few places, mainly where we have *really* generic code to say "I can select T.eq(sql_string)". I do not believe this is representative of real code. - I did a GitHub-wide search of all occurances of `SelectableExpression` (which is *not* the bound on the public API, but is the bound on `SelectStatement`'s impl, and would point to broken code). I found one repo which is relying on internals that will break, and a lot of results from Wundergraph. I didnt' look at those. This probably breaks Wundergraph. Sorry, Georg. It should be easy to fix I promise. - `NonAggregate` can no longer be directly implemented - I did a GitHub-wide search for `NonAggregate`. With one exception, every repo implementing this trait directly was on pre-1.0. The only affected repo is manually implementing `Expression` instead of using `#[derive(AsExpression)]`. With that change they will be future-proofed. - `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate` - This broke `infer_schema`, since it was relying on `AssocType: NonAggregate` to know `(known_column, AssocType, known_column): Expression`. Ironically, that's no longer important, it did still break due to the first item on this list. I could not find any code in the wild that fell into this category. Originally I thought that the only code which would be affected by this is code that wrote `impl NonAggregate`, since that code would need to change to `impl ValidGrouping`. However, I missed a few things when I made that assessment. The first is that... Well the feature exists. The whole point of this was to prevent `aggregate + non_aggregate` from compiling when passed to select, which implies a new trait bound *somewhere*. While `SelectStatement` and its impls are private API, it's really easy to couple yourself ot the bounds on those impls. It doesn't help that `rustc` literally recommends that folks do that when errors occur. Any code that is written as `Foo: SelectDsl<Selection>` will be fine, but any code that's gone down the rabbit hole and has copied the bounds from `impl SelectDsl for SelectStatement<...>` will break. I didn't find many cases in the wild, but I do think it's relatively common. The second thing I missed is that "is this aggregate" is not a binary question. Originally I expected we would have two answers to the question, and compound expressions would enforce that their sub-expressions all had the same answer. The issue is that `1` doesn't care. You can have `count(*) + 1`, and you can also have `non_aggregate + 1`. `1` is always valid, regardless of aggregation. So we need three answers. The problem is that this means we can't have `T: NonAggregate` mean everything it used to. On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that `T` can be passed to everything with a `NonAggregate` bound (`filter`, `order`, etc). On nightly, it also implies `T::IsAggregate: MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In English, it implies that this is valid with no group by clause, and its output can appear with an expression which is not aggregate (but might be with a different group by clause). The outcome of this is that `T: NonAggregate` implies `(T, Other): NonAggregate`. However the missing link from both stable and unstable is `is_aggregate::No: MixedAggregates<T::IsAggregate>` being implied, which would mean `(Other, T): NonAggregate` would be implied. Ultimately this is a really long-winded way of saying that `T: NonAggregate` used to imply interactions with other types. That is no longer consistently true. It's unlikely there are many affected users, but any that are affected will need to directly have a `ValidGrouping` bound. Implementation Notes --- Because this broke diesel_infer_schema, I had to do a version bump to get that crate to rely on the released 1.4. There's a note on the unsoundness of the `NonAggregate` impl of `Subselect`. I haven't changed the bounds on that impl, but we almost certainly should address it before 2.0 is released. I opted to validate the new bound in `SelectDsl`, so folks get errors on `.select` instead of `.load`. We can't validate this on calls to both `.select` *and* `.group_by`, since a select statement valid with a group by is often invalid without one, and a group by clause usually makes the default select clause invalid. While this commit does not add proper group by support, I fleshed it out a bit to test the type constraints. Right now a column only considers itself valid if there is no group by clause, or if the group by clause is exactly that column. I had more to say but I went on a call and lost my train of thought. I need to come back and finish writing this later. Notable Limitations --- `SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be represented. Unanswered Questions --- `BoxableExpression` has been limited to only work when there's no group by clause, and only work with types which are `is_aggregate::No`. This is probably not what we want to do, but was the smallest change to start.
rendered
(Despite showing as closed, this RFC was accepted and merged.)