-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concatenation Rollups order maybe none Deterministic - Consider moving order by from LREngine.RollupSummaryField to LREngine.Context #239
Comments
… ordering issue
…lds and move to context Issue SFDO-Community#216 - Add support for multiple order by fields and optional ASC/DESC and NULLS FIRST/LAST Issue SFDO-Community#239 - Move order by clause to LREngine Context
Thanks for the options here John... RE: 1, agree. RE: 2, agree with your edit, lets keep the order by for the lookup RE: 3, i think the existing documentation would still stand given the revised direction of 2 above, yes? |
I've also marked this as a bug and revised the title a little. I think some users may have got this from time to time, but cannot be sure. Anyway its a good spot and lets get it fixed. |
Glad you agree @afawcett, I definitely feel like this approach makes the most sense and like you mentioned, addresses some issues that others might have experienced in the past. Regarding #2, this PR maintains the lookup field in the order by. Regarding #3, the docs state "Defaults to the field given in Field to Aggregate" which, given this PR, would no longer be applicable. When no order by is specified, the only field that would be specified in the SOQL projection would be the RelationshipField (lookupfield on the LRE side). See jondavis9898@574302d#diff-958fe84b38eacf8b58f949ab104a8504R185 for reference. |
RE: #3, ah i see, so the reason to not continue to support defaulting order by to Field to Aggregate would be that it basically stops any kind of context sharing, have i got that correct? |
Exactly. If we default the fieldToAggregate, every rollup that doesn't contain an orderby will end up with a unique context (except for situations where the fieldtoaggregate is the same in which case they would share one). My thinking is that since no orderby was specified, it's correct to assume completely non-deterministic approach and therefore we can share context when all other criteria align (reduces limit consumption and improves perf). Along these lines, it might be a good idea to update the docs "best practices" to suggest specifying orderby when using query based rollups (concat, first, last, etc.). I don't think it would be outrageous to even require an orderby for query based rollups but I could see situations where the user doesn't care what the order is and therefore requiring one might be too restrictive. I think a doc update is likely the right course here. |
Ok i understand, the only issue then is one of backwards compatibility. Since Order By on the rollup def is not a required field. Need to think of a way provide this without breaking backwards compatibility i think. Tricky one, as in some cases its a bug and in other cases expected behaviour. I guess we can detect this at runtime and invoke the old behaviour when its safe to do so? |
Sorry @afawcett, not sure I follow. In the code previous to this PR, ordering was non-deterministic when order by wasn't specified due to the ThenBy approach. It was a controlled orderby but non-deterministic nonetheless. With this PR, it remains non-deterministic. The only situation I can think of where a user would see different results than what they "expect" is when they were taking advantage of the fieldtoaggregate being defaulted in to the orderby AND they had no other rollups on the same parent/child/criteria that did not not have an orderby. If they did, they would have experienced the "ThenBy" projection and they wouldn't have been getting the fieldtoaggregate as the default for anything other than the first rollup field in the context. Even in that case, the order of the rollupfields that were added to the context was also non-deterministic so their results could have varied as they added more rollups. Possibly I'm misunderstanding what you are thinking? |
"The only situation I can think of where a user would see different results than what they "expect" is when they were taking advantage of the fieldtoaggregate being defaulted in to the orderby AND they had no other rollups on the same parent/child/criteria that did not not have an orderby." Yeah thats the one i'm thinking, if they used a concatenate operation the behaviour would change if the code stopped using the field to aggregate by? |
Yeah, the behavior would change in this case for any query based rollup (concat, concat distinct, first, last, etc.). However, if they have two (or more) rollups defined on the same parent/child, each query based and each with no order by, they are currently encountering different results than they would expect when assuming default fieldtoaggregate is added to the orderby. Here's what I'm thinking, let me know which you think is best:
I hate to do anything that risks breaking backward compat and believe firmly that we should try to maintain backwards compat as much as possible whenever possible. That said, in this case, it would be interesting to know the telemitry/analytics around how many users have only a single rollup field on a parent/child and that rollup does not have an order by. Unfortunately, I don't think there is any way to gather those metrics (maybe a new idea for DLRS future enhancement :)). You would definitely have a better idea than I would on how common this scenario is but I'm guessing it's not a large number. In most cases, if a user wanted an order by, I think they are more likely to have input one than rely on the default and even then, it would only work when there is only a single rollup. For this reason, I'm kind of partial to option 1 so we have a clean baseline to move forward with. That said, I'm happy to implement 2 if you'd like and can have it done rather quickly. I'm very hopeful to try to get this PR in to the next release as I'm in need of this feature & fixes in our app. Let me know your thoughts. Thanks! |
Yeah tough one, i'm also very worried about maintaining backwards compat, especially since this is become more mission critical for folks and often consultants upgrading might not seek to always read or have enough knowledge of existing rollups, so i'm super sensitive to this. I could actually get the Salesforce Customer Metrics feature enabled here, but that will just tell me the number of records in the objects, not the contents of course. Lets see how we get on maintaining the backwards compat here, happy to take the 'I told you so' in the future.... ;-) So your option 2 is exactly what i had in mind above when i was thinking about a runtime detection. I'm actually packaging now the currently tagged for next release items. I can hold if you want to get this one into the next release and maybe package it tomorrow. |
Actually i had just done an upload.... I'll release as is, happy to do another tomorrow if you really need it. |
Sounds like a plan on 2, definitely appreciate maintaining status quo as much as possible. I'm working on the change now and should have it uploaded soon. No worries on release package timing and it's not critical that this PR is included tomorrow. Aside from your normal items, I know your likely swamped with DF stuff, but if you have time tomorrow and/or in the near future and could get another new version out with this PR I'd greatly appreciate it. |
Cool. Do you have a Twitter and/or Salefsorce Community user btw? I've credited you just now on Twitter and in the Chatter Group for the tool, but cannot @ you in those communities. |
Thanks for the mention, happy to help improve the tool. I'm @jondavis9898 on twitter. I just updated PR #241 with option 2. Let me know what you think. Thanks! |
One thing I thought of that is a gap we could close is the following scenario: Two rollups defined
In the current code, this will result in 2 contexts being generated, two SOQL queries, etc but they'll each order by Amount__c (due to default fieldtoaggregate). The reason for this is that DLRS contextKey uses orderBy in key and orderBy for 1 will be 'Amount__c' and for 2 will be blank/emptystring. There is an optimization that would be made here to become more intelligent at the DLRS level when identifying which context to use. This comes in to play in a few situations but especially in the scenario where we default in fieldtoaggregate. To handle this we would have to project the order by within DLRS when order by isn't specified instead of within LRE. Doing this would give us an opportunity to optimize some scenarios and share context (still supporting backwards compat) when the current logic would result in separate contexts. However, this could get tricky, need to think through it some more. If you think this is worth exploring, let me know and I'll create an issue to track. |
I thought some more about this and I think putting together a solution that reduces overall limit usage while still providing the functionality desired wouldn't be too difficult. In short, what I'm thinking is that we'd build in a more intelligent "context" mapper. We would do this inside of DLRS and remove the size() == 1 that I added to LREngine. Instead of choosing a context on each iteration through rollup building, we would build all the rollups and then take that list to create/choose the context. This gives us full visibility to all rollups and all potential contexts that are required before we start creating contexts. This would provide the following features/benefits:
To do all this, we would first create contexts for all rollups that have an orderby and then create contexts in a second pass for rollups that do not have an orderby. Seems kind of complex now that I type it all out but I think it would be rather straightforward to implement. The only concern I would have is whether or not this additional logic to "find" a context would outweigh the perf gain and reduced limit consumption that would result from having that logic in place. The only way to know what would be to gather some metrics with perf testing. Look forward to your thoughts. |
I decided to go ahead and implement the approach I described in my last comment so we can consider it along with PR #241. I've actually created two different approaches and describe all 3 below.
I've added tests to all 3 PRs (RollupServiceTest4.cls) to demonstrate the differences:
I'll leave it to you to determine which of the 3 approaches should be chosen. Here's my thoughts:
Note that I did run some "non-scientific" performance analysis on #241, #243 & #244 and the difference is unremarkable on cpu, heap and total time. I was testing with smaller data sets but with larger datasets, I would expect #244 to have the best performance of the three (unconfirmed). In short and for our purposes here, all options can be considered equivalent with regards to these 3 factors. Final thoughts: As mentioned previously, I think we should always strive for backwards compat whenever possible. However, since in some cases the current code was not adhering to documented functionality anyway and since #244 has the important benefit of reducing overall limit usage (which could be substantial on larger resultsets), my recommendation is #244 should be chosen. With that said, if backwards compat is absolutely critical, #243 should be the choice over #241 because of its feature parity with current code plus additional functionality to try to get closer to the previously documented default ordering method. One last note - There is a hybrid between #243 & #244 (see here) where instead of just using the first context we find that satisfies all other criteria, we try to find one (similar to what is done here in #243) that starts with the same orderby (first field same plus ASC NULLS FIRST). If one is not found, we just grab the first one. The downside here is that we incur a little bit of additional processing to "find" the context and we have to continue to maintain functionality moving forward that we wouldn't otherwise need to if we gave up backwards compat. If this is the preferred approach, I'm happy to make the adjustment and update the PR. Apologize for all the long comments and multitude of PRs, just wanted to get everything out there for us to be able to make the best decision on. Look forward to your thoughts. |
@jondavis9898 wow what great work, i really appreciate this, just wanted to say i'm sorry for my stop start appearances here, DF prep causes me to black out at times. I've had a good read through this, but want to read a bit more, rest assured i really value this. Do you personally have any desires to see this in a release package sooner rather than later that i should keep in mind if possible? |
thanks @afawcett, no worries about the stop/start, completely understand especially with DF prep on top of normal juggling of commitments. I do have a need for the multiple order by enhancement. I don't need it "today" but I'm unable to finish one part of our app without it so as soon as possible would be greatly appreciated. This is a pretty critical area of DLRS so wanted to make sure I spent the proper time to evaluate all the options and present the pros/cons. I do think that #244 is the best long term option (due to limiting governor usage) but it does come at a "cost" of impacting backwards compat to some degree (not sure how "exposed" it would really be though since the existing code had bugs in that code path anyway). If you have any questions as you review, please just let me know. Look forward to your thoughts once you've digested. |
Hello @afawcett, hope you had a good DF and things are returning to normal. Just wondering if you've had a chance to review this in more detail...eager to get this incorporated once you determine appropriate direction. Thanks! |
Yep, will look on Sunday, thanks for your patience. |
Just looked through this all again now, going with 244! |
Released! |
Thank you! 👍 |
Currently, order by clauses are handled by RollupSummaryFields in a "Then By" approach. However, within DLRS as of PR #223, there would never be a situation where any two RollupSummaryFields within the same context would have a different order by clause except for when the Lookup Rollup Summary itself does not contain an order by field/clause.
The documentation & help text for Order By field states that when no order by is specified, it will default to the FieldToAggregate. However, there are cases where this behavior does not occur and is a result of the "Then By" approach on a shared context. For example, if two rollups are defined and neither specify an order by, the resulting SOQL will have an order by of "FieldToAggregate1, FieldToAggregate2." This results in the second rollup not being ordered by it's field to aggregate as the documentation states it will be. For an example, see jondavis9898@84caee2 and the comments by the asserts on jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R868, jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R874 and jondavis9898@84caee2#diff-a59e8228c780cd01c3372282a42d4712R880.
In the case where multiple rollups each contain the same specific order by, they will share a context. Currently, within LREngine there is logic that tracks fields included in order by and discards duplicates (https://github.com/afawcett/declarative-lookup-rollup-summaries/blob/master/rolluptool/src/classes/LREngine.cls#L168). Since a context in this situation will never have different order by fields, this processing is unnecessary.
In reviewing the LRE project itself, it appears that the order by enhancements within DLRS have not made it in to that project. Therefore, if we make changes within DLRS, I believe it would only effect DLRS internally and anyone that uses DLRS source instead of DLRS package.
Given the above, I have the following recommendations and would appreciate feedback/thoughts:
Move the "order by" to LREngine.Context and handle in the same manner that relationship criteria (where clause) is handled. Each context has a unique order by and all rollups within the context subscribe to that order by. Making this change would eliminate the extra tracking/processing to detect duplicates, etc. thereby increasing performance of the package.
When no order by is specified, do not generate any order by for the context. This results in a non-deterministic query that is not effected by any rollup fields included in the context. While the current approach is non-deterministic, it leads to undesirable side effects such as described above with "default ordering." This change could also include not including the lookup field in the order by clause (https://github.com/afawcett/declarative-lookup-rollup-summaries/blob/master/rolluptool/src/classes/LREngine.cls#L131) as the only reason this is included currently is to assist in the "merging" of master record id's during the processing of rollup queryResults. ** Edit ** After some additional thinking on this last point, I think it might make sense to leave the lookup field in the order by as it won't effect any ordering outcome and it is likely better performing to order in DB and then loop checking for a different master id (like the code currently does) then continually lookup in the Map on each iteration through queryresults.
Update the documentation to remove the mention of a rollup defaulting to the fieldtoaggregate when no order by is specified.
Appreciate the consideration and any thoughts/feedback.
The text was updated successfully, but these errors were encountered: