-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only highlight ALL matching fields if query doesn't target specific fields #9091
Conversation
@lukasolson LGTM, seems like a sensible tradeoff. Thanks for picking this up! |
Thanks for the detailed description! It really helped me understand your thought process. Seems like ES barfs on The changes to all query have an interesting impact on this issue. If I understand correctly, it searches all of the fields (as opposed to the note on index names IMO the highlighting with the new all query is exactly what we want. Terms with an explicit field will only highlight that field, while terms without a field specified will highlight any occurrence in any field. That made me re-consider whether we should leverage the flag for forcing the new all query functionality, but that would override any user provided default field in the index settings which I don't think we want to do :/ I tried the same query in master, and the results look more consistent in this one example at least: |
I wonder if we could make the case that this is leaking implementation details of highlighting with the query string query. Ideally highlighting should work the same regardless of whether you're using the new all query or querying What do you think @dakrone? |
@lukasolson I agree this is a step in the right direction. Quick question... how come when I combine predicates, when one of them does not have a field associated with it, it seems to get ignored completely? See last two screenshots, everything else works as I'd expect. |
@tbragin because a If you try indexing your data without the _all field in order to get the new "all query" mode, I think you'll get the highlighting you'd expect since the query is actually being made against all fields, instead of the _all field. That's why I was hoping to get some thoughts from the ES team on this difference. |
Perhaps related to elastic/elasticsearch#21621. |
@Bargs I've tested with From what I can tell, we don't actually keep track of whether or not @dakrone Have you had a chance to look at @Bargs' question above? |
@lukasolson what if we forced the new all query behavior by default by setting the I think that's the only way we can ensure users will have a consistent experience even if they have some indices with _all enabled and some with _all disabled. The only downside is that this would be a breaking change because users who currently set the default field in their index settings would need to update a kibana advanced option to maintain that behavior. |
Hi Everyone, apologies for the delayed response, I've been on vacation.
Unfortunately the detection of the auto-all mode is pretty distant from anything highlighter related, so I'm not sure this would be easy to do. That said, let me ask around with someone more familiar with highlighting and see if it could be possible. |
Thanks @dakrone! I noticed @jimczi is working on something called "unified highlighter", is that work at all relevant here? elastic/elasticsearch#21621 |
That's seems feasible and this would also fix the discrepancy between |
@Bargs @tbragin Seems like this is something that is now on the radar of the Elasticsearch team, and it will certainly be fixed by the issue linked above. In the meantime, is this PR a good short-term solution while the other is being worked on? In my opinion, it improves the current behavior, though it's not the exact solution we want, and we can't be sure how long this new issue will take to fix. |
@Bargs Kibana can use a |
@clintongormley this will lead to inconsistent results if the user has defined something other than I can also imagine scenarios where queries against I think it's really important that the highlighting be rock solid. If it's not, users will get confused and lose faith that the system is working correctly. Also, what kind of impact does |
@lukasolson I've been trying to think through the pros vs cons of the current solution in this PR. Pro: Avoids over-highlighting (highlighting any field with a particular value even if the query specified a field) Am I missing any other pros/cons? Personally, I feel like I'd prefer over-highlighting to under-highlighting. With over-highlighting my eyes can somewhat quickly filter out irrelevant matches, but with under-highlighting I might miss matches completely. |
Agreed - I'd solve this by having an option in Kibana to disable
No, it'll be included in the all_fields query
This will work correctly However, if a user has excluded certain fields from all with
I think we need to take into account the fact that we're on a journey here. So it won't work perfectly for those who set |
Replaced by #9671. |
Fixes #2358.
From the Elasticsearch highlighting docs:
Prior to this PR, we were setting
require_field_match
tofalse
on all requests from discover. As a result, even if you queried specific fields, or added filters that matched specific fields, you would see matches highlighted from fields not specifically queried.This PR changes the behavior such that the
require_field_match
parameter is only set tofalse
if the search query is a simplequery_string
query that doesn't target any fields. As a result, if you query specific fields, then the highlighted matches should only apply to the queried fields.This is, in my opinion, a step in the right direction. A couple of notes about this PR:
The code that detects when the query string applies to a specific field is quite primitive. It could definitely be improved, as it currently supposes that a query such as
":foo"
targets a specific field, even though it doesn't. We could be smarter about this but I think it would require some sort of query parser that could be more complicated than it's worth at this point.There is still a limitation of this implementation: If you have a filter that targets a specific field, then do a search query that doesn't target a specific field, then it's possible that over-highlighting could still occur.
For example, if I have a
geo.dest: "CN"
filter, then search for something likejpg
in the query bar, you could still get highlighting ofCN
under other matching fields, such asgeo.src
. However, matches ofjpg
would still be highlighted, regardless of field.The alternative is to have additional logic that checks if there are any filters that target specific fields, and if there are, then to not set
require_field_match
tofalse
. However, this could have the side effect of under-highlighting; in the same scenario above, this would mean that no matches ofjpg
would be highlighted, since no field was specified. I am personally of the opinion that the former is a better trade-off, but I'm open to discussing this.