Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing SortField comparison to use equals instead of reference equality #6901

Merged
merged 1 commit into from
Apr 3, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
"search across indices with mixed long and double numeric types":
Copy link
Collaborator Author

@reta reta Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gashutos so I found out why test is flipping, this is very interesting.

The test only works on single node clusters (or, to generalize it, when search results come from single node without serialization being involved). The reason for that is that SortedNumericSortFields are deserialized by Apache Lucene as SortFields and that is how they come to SearchPhaseController. As you may guessed already, the sort optimization does not trigger anymore because instanceof SortedNumericSortField is not met (those are just SortField instances).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is concerning @reta , The optimization will still get applied but while merging results, if it has different SortFiled.Type (INT & LONG from two different shards), it will fail the request -
Let me see what could be done here....

Copy link
Collaborator Author

@reta reta Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will fail the request, which is essentially the "expected" behavior as per current implementation. So basically without the optimization - the test always fails, with optimization - passed on single node clusters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But current implementation works for Int & Long indices. Since we used to widen sortType to Long for Int field too during shard level sort. That will break with this, I wasnt knowing serilization will trigger issue here hence didnt test on multinode scenario....
tagging @nknize too..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But current implementation works for Int & Long indices.

No, it does not (Lucene is typecasting), here is the prove:

  1> Caused by: java.lang.ClassCastException: class java.lang.Long cannot be cast to class java.lang.Integer (java.lang.Long and java.lang.Integer are in module java.base of loader 'bootstrap')
  1>    at java.base/java.lang.Integer.compareTo(Integer.java:71)                                                                                                                             
  1>    at org.apache.lucene.search.FieldComparator.compareValues(FieldComparator.java:115)   

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gashutos just to may be clarify, this is not the regression caused by optimization changes, it existed before and still exist in Elasticsearch 7.x release line.

Copy link
Contributor

@gashutos gashutos Apr 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta I verified that this is regression. To reproduce this, do below steps.

  1. Ingest http_logs workload from OSB to multi node cluster (I created with 5 nodes)
  2. Create an index with exactly same field mapping except change size field as long from int
  3. Re-index any of the http_logs workload index to newly created index on 2nd step.
  4. Sort by asc/desc order on size field with logs-* as an index.
  5. Check this behaviour without code changes in Enable numeric sort optimization support for all numeric types #6424 (it works fine, while with Enable numeric sort optimization support for all numeric types #6424 it breaks).

I raised a PR to fix this behaviour, #6926 lets get that in before this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 test case have been updated

- skip:
version: " - 2.6.99"
reason: relies on numeric sort optimization that landed in 2.7.0 only

- do:
indices.create:
index: test_1
body:
settings:
number_of_shards: "1"
number_of_replicas: "0"
mappings:
properties:
counter:
type: long

- do:
indices.create:
index: test_2
body:
settings:
number_of_shards: "1"
number_of_replicas: "0"
mappings:
properties:
counter:
type: double
- do:
bulk:
refresh: true
body:
- index:
_index: test_1
- counter: 223372036854775800
- index:
_index: test_2
- counter: 1223372036854775800.23
- index:
_index: test_2
- counter: 184.4

- do:
search:
index: test_*
rest_total_hits_as_int: true
body:
sort: [{ counter: desc }]
- match: { hits.total: 3 }
- length: { hits.hits: 3 }
- match: { hits.hits.0._index: test_2 }
- match: { hits.hits.0._source.counter: 1223372036854775800.23 }
- match: { hits.hits.0.sort: [1223372036854775800.23] }
- match: { hits.hits.1._index: test_1 }
- match: { hits.hits.1._source.counter: 223372036854775800 }
- match: { hits.hits.1.sort: [223372036854775800] }

- do:
search:
index: test_*
rest_total_hits_as_int: true
body:
sort: [{ counter: asc }]
- match: { hits.total: 3 }
- length: { hits.hits: 3 }
- match: { hits.hits.0._index: test_2 }
- match: { hits.hits.0._source.counter: 184.4 }
- match: { hits.hits.0.sort: [184.4] }
- match: { hits.hits.1._index: test_1 }
- match: { hits.hits.1._source.counter: 223372036854775800 }
- match: { hits.hits.1.sort: [223372036854775800] }
Original file line number Diff line number Diff line change
Expand Up @@ -635,7 +635,7 @@ private static Sort createSort(TopFieldDocs[] topFieldDocs) {
*/
private static boolean isSortWideningRequired(TopFieldDocs[] topFieldDocs, int sortFieldindex) {
for (int i = 0; i < topFieldDocs.length - 1; i++) {
if (topFieldDocs[i].fields[sortFieldindex] != topFieldDocs[i + 1].fields[sortFieldindex]) {
if (!topFieldDocs[i].fields[sortFieldindex].equals(topFieldDocs[i + 1].fields[sortFieldindex])) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gashutos mind taking a look? I found this one out while adding tests for multi index search queries

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I remember I did test this with int & long and it worked fine.
The unit tests too are working fine, really is this the issue ?

Copy link
Collaborator Author

@reta reta Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly this one, but assume that index1 and index2 have same counter field with type long - in this case we will do unnecessary unwinding although the sort fields are the same. The test cases I added just to capture the ability to do sort across different types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh got it, so correctness wise it was still giving right results ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no visible effect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, that I got.
but ./gradlew check is having integ test failure for comparator type mismtach. That should work though..

Copy link
Collaborator Author

@reta reta Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh got it, you mean we don't need this test 260_sort_mixed.yml ? I haven't found any rest spec tests for mixed indices search ... I thought it is good to have one

Copy link
Contributor

@gashutos gashutos Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, its good to have. I was just curious why it is failing :)

you mean we don't need this test 260_sort_mixed.yml

No no, we need it, thank you for adding it...

Copy link
Contributor

@gashutos gashutos Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reta I think this is mixed cluster scenario which is failing in ./gradlew check.
Our old versions dont support mixing results between LONG/DOUBLE. We are specifying sortType Long for Long NumericType here, and SortType Double for Double numeric type. So it will return results in Long for those shards of index with Long mapping, while Double for another case.

The new changes we did as part of widening sort type handles it correctly but that will be in only latest version.
May be we can change mapping types from Long to Double to Int to Long or Float to Double to having them work in mixed cluster scenario.

Copy link
Collaborator Author

@reta reta Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes we did as part of widening sort type handles it correctly but that will be in only latest version.

Correct, so I was adding the version check here:

  - skip:
      version: " - 2.6.99"
      reason: relies on numeric sort optimization that landed in 2.7.0 only

So the test should be skipped for anything below 2.7.0 (we backported this change to 2.x so it should work for 3.0.0 and 2.7.0) but for some reasons, the test is run on versions it is not supposed to ... I am curious to find out why ....

Running the test on 2.x branch manually passed for me, looking into it ...

return true;
}
}
Expand Down