[BUG] Uppercase Regex in String queries search #3578
Comments
@yyyogev could you please share the mapping for
I didn't specify it, so whatever the default is.
@yyyogev OK, in this case the assumption is that you use the standard analyzer [1] for the
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/analysis-standard-analyzer.html
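To make the analyzer point concrete, here is a minimal Python simulation (not OpenSearch code) of why `message:/TLS/` finds nothing on a field indexed with a lowercasing analyzer. It assumes two simplifications: `standard_like_analyze` is a hypothetical stand-in for the standard analyzer that only splits on word characters and lowercases (the real one uses Unicode text segmentation), and Python's `re` syntax stands in for Lucene's regex syntax, which is equivalent for literal patterns like this one.

```python
import re


def standard_like_analyze(text):
    # Hypothetical simplification of the standard analyzer: split into
    # word tokens and lowercase each one, as the lowercase filter does.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def regexp_matches(terms, pattern):
    # Lucene regexp queries must match an entire indexed term and are
    # case-sensitive, so re.fullmatch is the closest analogue here.
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in terms)


terms = standard_like_analyze("this is a TLS handshake")
print(terms)                         # ['this', 'is', 'a', 'tls', 'handshake']
print(regexp_matches(terms, "TLS"))  # False: no indexed term is "TLS"
print(regexp_matches(terms, "tls"))  # True once the pattern is lowercased
```

The index only ever contains `tls`, so the outcome depends entirely on whether something lowercases the regex at query time.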
This behavior was different in ES 6.8. Is there documentation for this change in behavior anywhere?
To be fair, I am not aware of any changes on the search side; perhaps there are differences on the ingestion side (mappings).
It doesn't seem so. This is the mapping for the doc in ES 6.8: and this is in OS 1.2.3:
@yyyogev I was really curious to find out why, and here is the thing: 6.8.x used the default search analyzer (== [1] https://github.com/elastic/elasticsearch/pull/61013/files
And if not, then it doesn't use any analyzer at all?
@reta is there a way to use some analyzer (standard or a custom one) without specifying it in each and every request?
For regex specifically, yes.
You could use the
To be honest, I was not able to find another way just by looking at the code. Could you please try the other options listed here [1]?
@reta
Correct, if using
This creates another problem: if we lowercase the query, it breaks the search on keyword fields, since their values may be stored with uppercase letters.
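The trade-off described here can be sketched with the same kind of toy model. This is a hypothetical illustration, not OpenSearch behavior verbatim: a text field holds lowercased tokens, while a keyword field indexes its whole value unchanged as a single term (the keyword value `TLS-Gateway` and the helper `search_all` are made up). Applying one query-time normalization to every field fixes one field type and breaks the other.

```python
import re


def regexp_matches(terms, pattern):
    # Whole-term, case-sensitive match, like a Lucene regexp query.
    return any(re.fullmatch(pattern, term) for term in terms)


# Text field: tokens already lowercased at index time.
text_terms = ["this", "is", "a", "tls", "handshake"]
# Keyword field: the original value kept as one term (hypothetical value).
keyword_terms = ["TLS-Gateway"]


def search_all(pattern, lowercase_query):
    # Hypothetical helper: one query-time normalization for every field.
    # Note that naive lowercasing also rewrites escapes such as \W into \w,
    # so this is only safe for literal patterns like the one used below.
    if lowercase_query:
        pattern = pattern.lower()
    return {
        "text": regexp_matches(text_terms, pattern),
        "keyword": regexp_matches(keyword_terms, pattern),
    }


print(search_all("TLS.*", lowercase_query=False))  # {'text': False, 'keyword': True}
print(search_all("TLS.*", lowercase_query=True))   # {'text': True, 'keyword': False}
```

Neither setting matches both fields, which is why a per-field (or per-query) choice keeps coming up in this thread.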
@yyyogev I see, then we have a limited number of choices (at this moment): specifying
@reta we already tried this and it doesn't work either. I specified the search analyzer for the whole index with these settings:
@yyyogev I meant the approach you mentioned in the issue,
@reta this solution has the same problem: it would lowercase the query, so searches on keyword fields won't find values with uppercase letters.
@yyyogev sadly I do not have a universal solution for you right now. There is an Apache Lucene issue / pull request to support case-insensitivity for regexes in On the OpenSearch side, what we could do is consult
@nknize what do you think?
On the OpenSearch side we should add In the meantime, I think in this situation you'll want to use the PatternAnalyzer at query time. In the pattern analyzer, the Lowercase TokenFilter is used to automatically lowercase your query string so it matches the default lowercased tokens created at index time by the Standard analyzer. If you want to preserve case for exact matches, you could use an index template (like you did above) to define a custom analyzer to use at index time that removes the LowerCase token filter. Then use PatternAnalyzer at query time with
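The index-time half of that suggestion can be sketched as a request body, written here as a Python dict so it can be checked and serialized. This is an assumption-laden sketch, not the configuration from this thread: the analyzer name `case_preserving` is hypothetical, the body targets index creation rather than an index template (the same `settings`/`mappings` block could be placed in a template), and the query-time part of the suggestion is cut off above, so it is not reproduced.

```python
import json

# Hypothetical index body; "case_preserving" is an illustrative name.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "case_preserving": {
                    # Same tokenizer as the standard analyzer, but no token
                    # filters are listed, so there is no lowercase filter and
                    # "TLS" is indexed as "TLS".
                    "type": "custom",
                    "tokenizer": "standard",
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "message": {"type": "text", "analyzer": "case_preserving"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With this mapping, a case-sensitive regex such as `/TLS/` would see the original casing in the index, at the cost of case-insensitive matching for this field.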
@nknize sorry, just to reiterate one thing, for regex queries (specifically
@reta @nknize we believe that reverting the change (aligning the behavior with the way it worked before the fix) should be the way to go in this case because:
Please let me know what you think about it, and of course we are happy to contribute if that's the way to go here. @AmiStrn FYI
@reta I agree with @alexgnatyuk here. This seems like a really easy fix: revert the code introduced in Elasticsearch 7.9, since it is an X-Pack feature that uses it. The discussion is about the proposed solution: what do you think? Should we make this change as proposed?
I am totally with you @AmiStrn @alexgnatyuk (#3578 (comment)), but we need @nknize to confirm this is the way to go.
@nknize can you please update here? Do we have a green light to start working on these changes?
Regex search on analyzed fields no longer works with capital letters, as if the analyzer stores them lowercased, or the standard analyzer isn't used for some reason.
To Reproduce
Steps to reproduce the behavior:
1. Add a document with the field message: this is a TLS handshake
2. Search with the following query:
{"query": {"bool": {"must": [{"query_string": {"query": "message:/TLS/"}}]}}}
Expected behavior
We should find the doc we added above since it's an exact match. However, we got 0 results.
Plugins
None
Host/Environment (please complete the following information):
I used the OpenSearch container with a single node (version 1.2.3), running on macOS 11.6.4.
Additional context
If we specify the standard analyzer in the search request, we get the expected results:
{"query": {"bool": {"must": [{"query_string": {"query": "message:/TLS/", "analyzer":"standard"}}]}}}