
Handle range query edge case #63397

Merged
merged 4 commits into from
Oct 21, 2020

cbuescher
Member

Currently, when searching with an empty string as the lower bound of a range query
on text-based fields, we return all documents when 'gte' is used (including the
lower bound) but no documents when 'gt' is used. This is counterintuitive, since
every non-empty value should be greater than the empty string. This PR fixes this
edge case by implicitly setting the "lower" include flag in this case before
constructing the TermRangeQuery.

Closes #63386
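A minimal sketch of the workaround described above, assuming plain `byte[]` terms in place of Lucene's `BytesRef`. The class and method names (`EmptyLowerBoundWorkaround`, `effectiveIncludeLower`) are hypothetical and this is not the actual `StringFieldType` change; the returned flag is what would then be passed as `includeLower` when constructing the `TermRangeQuery`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the workaround; not the actual StringFieldType code.
class EmptyLowerBoundWorkaround {

    /**
     * Returns the includeLower flag to use for a term range query.
     * An empty (non-null) lower bound is treated as inclusive, so 'gt ""'
     * behaves like 'gte ""' instead of matching nothing.
     */
    static boolean effectiveIncludeLower(byte[] lower, boolean includeLower) {
        if (lower != null && lower.length == 0) {
            return true;
        }
        // Any other bound (including an open, null bound) is left untouched.
        return includeLower;
    }

    public static void main(String[] args) {
        byte[] empty = "".getBytes(StandardCharsets.UTF_8);
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(effectiveIncludeLower(empty, false)); // true: 'gt ""' becomes 'gte ""'
        System.out.println(effectiveIncludeLower(a, false));     // false: non-empty bounds keep their flag
        System.out.println(effectiveIncludeLower(null, false));  // false: an open bound stays as-is
    }
}
```

One side effect of this approach: because the bound becomes inclusive, a document whose value is exactly the empty string also matches 'gt ""'.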

@cbuescher cbuescher added >bug :Search/Search Search-related issues that do not fall into other categories v7.11.0 labels Oct 7, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 7, 2020
@romseygeek
Contributor

I'm confused, why is an empty BytesRef not comparing as strictly less than any other term?
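For reference, Lucene's `BytesRef` orders terms by unsigned byte comparison, and under that order the empty term sorts strictly before every non-empty term. The sketch below uses `java.util.Arrays.compareUnsigned` (Java 9+) as a stand-in for the `BytesRef` comparison; `EmptyTermOrdering` is a hypothetical name used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: Arrays.compareUnsigned stands in for BytesRef's
// unsigned byte-order comparison of UTF-8 encoded terms.
class EmptyTermOrdering {

    /** True if the UTF-8 bytes of term sort strictly after the empty term. */
    static boolean strictlyGreaterThanEmpty(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        // When one array is a proper prefix of the other, compareUnsigned falls
        // back to comparing lengths, so any non-empty array sorts after new byte[0].
        return Arrays.compareUnsigned(bytes, new byte[0]) > 0;
    }

    public static void main(String[] args) {
        for (String term : new String[] {"", "0", "a", "zebra"}) {
            System.out.println("'" + term + "' > '' ? " + strictlyGreaterThanEmpty(term));
        }
    }
}
```

Only the empty term itself prints `false`, which suggests the empty result for 'gt' comes from how the query is built rather than from the term ordering.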

@cbuescher
Member Author

I believe I tracked it down to https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/automaton/Automata.java#L257, where the match-all state (or whatever that automaton builds there) is not generated when minInclusive is false. Currently that happens for an empty string with 'gt'. For 'gte' everything works, since 'minInclusive' is true then.

@cbuescher
Member Author

cbuescher commented Oct 7, 2020

There is even a safeguard at the beginning of that method so this doesn't happen for 'null' values. I decided to put the change into StringFieldType since I thought the current behaviour down in Lucene land might be expected elsewhere.

@romseygeek
Contributor

I think this is a bug at the lucene level - if max is null and min is of length 0, then we should be returning a match-any binary no matter what the 'includeLower' value is, surely?
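A simplified model of the guard suggested above, assuming a direct comparison of `byte[]` terms instead of Lucene's automaton construction. `BinaryIntervalModel` and `interval` are hypothetical names, and this is not the code in `Automata`; `null` for `min` or `max` means the interval is open on that side, as with the bounds of a `TermRangeQuery`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical model of the proposed semantics; not Lucene's Automata code.
class BinaryIntervalModel {

    static Predicate<byte[]> interval(byte[] min, boolean minInclusive, byte[] max, boolean maxInclusive) {
        // Proposed guard: an empty min with no max excludes nothing, so match
        // any term regardless of minInclusive (the empty term included).
        if (max == null && min != null && min.length == 0) {
            return term -> true;
        }
        return term -> {
            if (min != null) {
                int c = Arrays.compareUnsigned(term, min);
                if (c < 0 || (c == 0 && !minInclusive)) {
                    return false; // below the lower bound, or equal to an exclusive one
                }
            }
            if (max != null) {
                int c = Arrays.compareUnsigned(term, max);
                if (c > 0 || (c == 0 && !maxInclusive)) {
                    return false; // above the upper bound, or equal to an exclusive one
                }
            }
            return true;
        };
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Predicate<byte[]> gtEmpty = interval(utf8(""), false, null, false);
        for (String term : new String[] {"", "a", "zebra"}) {
            System.out.println("'" + term + "' matches gt '' ? " + gtEmpty.test(utf8(term)));
        }
    }
}
```

In this direct-comparison model, non-empty terms already satisfy 'gt ""' without the guard; what the guard decides is that the empty term itself also matches, which is what "match-any no matter what includeLower is" implies.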

@cbuescher
Member Author

> I think this is a bug at the lucene level - if max is null and min is of length 0, then we should be returning a match-any binary no matter what the 'includeLower' value is, surely?

That's what I wasn't sure about. I opted to fix it on our side, but if you think this is a Lucene bug I can open a fix there as well.

@romseygeek
Contributor

> but if you think this is a Lucene bug I can open a fix there as well.

+1

@cbuescher
Member Author

@romseygeek I opened apache/lucene-solr#1976 with a fix in the Automata helper class. It would be great if you could take a look; if you think that is the better place for the fix, we can close this PR.

@cbuescher
Member Author

With apache/lucene-solr#1976 merged, I'm keeping this PR open to add tests confirming that ES also picks up the fix. This will happen once we move to a current Lucene 8.7 snapshot or release. I expect SimpleSearchIT to fail until then and to pass once we pull in the fix.

@cbuescher
Member Author

@elasticmachine update branch

@cbuescher
Member Author

@elasticmachine update branch

@cbuescher
Member Author

@elasticmachine run elasticsearch-ci/packaging-sample-windows

@cbuescher
Member Author

@romseygeek this has been fixed in apache/lucene-solr#1976, which we now use on master and 7.x. I'd still like to add this test on our side to verify the fix and to guard this behaviour going forward. Would you mind taking a quick look to see whether you agree with adding it?

Contributor

@romseygeek romseygeek left a comment

LGTM! Thanks for handling this @cbuescher

@cbuescher cbuescher merged commit d11e2d2 into elastic:master Oct 21, 2020
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Oct 21, 2020
Currently, when searching with an empty string as the lower bound of a range query
on text-based fields, we return all documents when 'gte' is used (including the
lower bound) but no documents when 'gt' is used. This is counterintuitive, since
every non-empty value should be greater than the empty string.
The bug has been fixed in Lucene, and this PR adds a test to ensure that searches
observe the fixed behaviour.

Closes elastic#63386
pugnascotia pushed a commit to pugnascotia/elasticsearch that referenced this pull request Oct 21, 2020
cbuescher pushed a commit that referenced this pull request Oct 21, 2020
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.11.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Using empty string in range query 'gt' returns no documents
4 participants