Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure stability of clause order for DisjunctionMaxQuery toString #13944

Merged

Conversation

ljak
Copy link
Contributor

@ljak ljak commented Oct 22, 2024

Since #110, the disjuncts elements of DisjunctionMaxQueries don't have an order anymore, which is impacting the toString method. In isolation, that does not matter. But, in Solr, when the debug component is needed for a distributed query, every shard can return a different toString representation of the same query... and the different toString keys of the debug response will have an array value, containing those different representations (instead of having one value for one same representation).

Example with the parsedquery_toString key (of a json response within Solr):
parsedquery_toString":["((docIdentifiers:\"Okarandeep Osingh\" docIdentifiers:Otest) | (docTitle:\"Okarandeep Osingh\" docTitle:Otest) | (docBody:\"Okarandeep Osingh\" docBody:Otest))","((docBody:\"Okarandeep Osingh\" docBody:Otest) | (docTitle:\"Okarandeep Osingh\" docTitle:Otest) | (docIdentifiers:\"Okarandeep Osingh\" docIdentifiers:Otest))"]

When PR110 was merged, Solr adapted its unit tests this way: apache/solr#117 but, later on within Lucene, the toString method of DisjuctionIntervalsSource was adapted in prevision of a potential similar future change: #193.

I adapted the toString method of DisjunctionMaxQueries similarly to this PR.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unhappy about making toString() run in O(n log(n)) with the number of clauses. Could we instead save a list copy of the queries in the constructor and use the list for toString()? BooleanQuery does something similar.

@ljak
Copy link
Contributor Author

ljak commented Oct 23, 2024

Thanks for the feedback. Looking at BooleanQuery, it "only" has one list List<BooleanClause> clauses. So, is the idea to have 2 structures for the DisjunctionMaxQuery, the unordered multiset of queries and a sorted list of queries, where the latter is only used for the toString method ? 

@jpountz
Copy link
Contributor

jpountz commented Oct 23, 2024

See BooleanQuery#clauseSets, which is used for equals()/hashcode() and BooleanQuery#clauses, which is used for toString().

@ljak
Copy link
Contributor Author

ljak commented Oct 23, 2024

I needed to sort the Querys in some ways, so I compare them according to their toString representation:

orderedQueries.sort(Comparator.comparing(Query::toString));

Not sure if it's the right way.

@jpountz
Copy link
Contributor

jpountz commented Oct 24, 2024

I wouldn't sort them, and just rely on the order that the caller supplied?

@ljak
Copy link
Contributor Author

ljak commented Oct 24, 2024

Ha, I see. Could we say that the new List<Query> orderedQueries would have the same behavior that Query[] disjuncts before https://github.com/apache/lucene/pull/110/files ? If yes, I presume it would work.

@jpountz
Copy link
Contributor

jpountz commented Oct 24, 2024

Yes, exactly.

@jpountz
Copy link
Contributor

jpountz commented Oct 25, 2024

Can you add an entry to lucene/CHANGES.txt under version 10.1.0? Then I'll merge.

@ljak
Copy link
Contributor Author

ljak commented Oct 25, 2024

Done. Thanks for reviewing!

@jpountz jpountz merged commit b12ee52 into apache:main Oct 25, 2024
3 checks passed
@jpountz jpountz added this to the 10.1.0 milestone Oct 25, 2024
jpountz pushed a commit that referenced this pull request Oct 25, 2024
@dsmiley
Copy link
Contributor

dsmiley commented Oct 27, 2024

Thanks for this! I had to redo a bunch of tests over this matter at work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants