Deprecate Function Score Query in favour of Script Score Query #42811
Pinging @elastic/es-search |
We have identified 2
|
For score_mode A proposal is to have a new type of compound query that has an ability to combine scores from different queries: {
"query": {
"compound": {
"queries": [
{
"match": {
"message": "elasticsearch"
}
},
{
"match": {
"author": "shay banon"
}
}
],
"score_mode": "first"
}
}
} score_mode has the same values and definitions as in Function Score Query
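To make the proposed `score_mode` semantics concrete, here is a minimal Python sketch of how the per-query scores might be combined. It is only an illustration: `combine_scores` is a hypothetical helper, not an Elasticsearch API, it assumes the list holds the scores of the queries that matched (in order), and `avg` is simplified to an unweighted mean.

```python
from functools import reduce
from statistics import mean


def combine_scores(scores, score_mode):
    """Combine the scores of matching queries (hypothetical helper).

    `scores` holds one score per matching query, in query order.
    """
    if not scores:
        raise ValueError("at least one score is required")
    combiners = {
        "first": lambda s: s[0],  # score of the first matching query
        "sum": sum,
        "multiply": lambda s: reduce(lambda a, b: a * b, s),
        "avg": mean,  # unweighted simplification
        "max": max,
        "min": min,
    }
    return combiners[score_mode](scores)


if __name__ == "__main__":
    for mode in ("first", "sum", "multiply", "avg", "max", "min"):
        print(mode, combine_scores([2.0, 3.0], mode))
```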
|
Or as an alternative we can have a script compound query where scores can be freely combined using script: {
"query": {
"script_compound": {
"queries": [
{"match": {
"message": "elasticsearch"
}},
{"match": {
"author": "shay banon"
}}
],
"script": {
"source": "_scores[0] + _scores[1]"
}
}
}
} But for the {
"query": {
"in_order": {
"queries": [
{
"match": {
"message": "elasticsearch"
}
},
{
"match": {
"author": "shay banon"
}
}
],
"score_mode" : "first"
}
}
} Not sure what other |
@mayya-sharipova I am not 100% sure whether these changes will break our usage of Elasticsearch, so I've prepared the following comparison. Could you confirm that it is a proper translation of the feature to the new query model, please? Thank you for any tip! Actual request (ES 6.6): {
"explain": true,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"lang": "painless",
"source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
"params": {
"ids": [1, 2]
}
}
},
"weight": 65
},
{
"filter": {
"terms": {
"location.city_id": [
"1"
]
}
},
"weight": 35
}
],
"boost_mode": "replace",
"score_mode": "sum",
"min_score": 0
}
}
} New request (ES 7.x / ES 8.0 compatible) [I'm especially unsure how to express the filter scoring behaviour... 🤔]: {
"explain": true,
"query": {
"script_compound": {
"queries": [
{
"script_score": {
"query": {
"match_all": {}
},
"script": {
"lang": "painless",
"source": "return (doc['ids'].containsAll(params.ids) ? 1 : 0) * params.weight;",
"params": {
"ids": [1, 2],
"weight": 65
}
}
}
},
{
"script_score": {
"query": {
"match": {
"terms": {
"location.city_id": [
"1"
]
}
}
},
"script": {
"lang": "painless",
"source": "return params._score * params.weight;",
"params": {
"weight": 35
}
}
}
}
],
"boost_mode": "replace",
"score_mode": "sum",
"min_score": 0
}
}
} PS: I recommend one small fix to the proposed query (as it stands, it isn't valid JSON).
It should look like this:
{
"query": {
"script_compound": {
"queries": [
{
"match": {
"message": "elasticsearch"
}
},
{
"match": {
"author": "shay banon"
}
}
],
"script": {
"source": "_scores[0] + _scores[1]"
}
}
}
} |
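A quick way to check that a request body is valid JSON before sending it is to run it through a strict parser. The sketch below uses Python's standard `json` module; `is_valid_json` is a hypothetical helper and the two bodies are shortened illustrations. Strict parsers reject, for example, a trailing comma after the last member of an object.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A trailing comma after the last member of an object is not valid JSON.
with_trailing_comma = '{"queries": [{"match": {"message": "elasticsearch"},}]}'
without_trailing_comma = '{"queries": [{"match": {"message": "elasticsearch"}}]}'

if __name__ == "__main__":
    print(is_valid_json(with_trailing_comma))
    print(is_valid_json(without_trailing_comma))
```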
@mayya-sharipova Please give us any tip on the topic, thanks 🙏🙂 |
@lrynek Sorry for a late reply, I've been away. {
"query": {
"compound": {
"queries": [
{
"script_score": {
"script": {
"source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
"params": {
"ids": [1,2]
}
}
}
},
{
"terms": {
"location.city_id": [
"1"
],
"boost": 35
}
}
],
"score_mode": "sum"
}
}
} |
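The following Python sketch checks, on a few made-up documents, that the two formulations above should produce the same score. It is a simulation under stated assumptions, not Elasticsearch code: a filter-only function in `function_score` is assumed to contribute its weight, a matching `terms` clause is assumed to score its `boost`, and both function names are hypothetical.

```python
def function_score_sum(doc_ids, doc_city_ids, wanted_ids, wanted_city_ids):
    """Simulate function_score with score_mode=sum and boost_mode=replace."""
    script_value = 1 if set(wanted_ids).issubset(doc_ids) else 0
    total = script_value * 65  # script_score function, weight 65
    if set(wanted_city_ids) & set(doc_city_ids):
        total += 1 * 35  # filter-only function: value 1, weight 35
    return total


def compound_sum(doc_ids, doc_city_ids, wanted_ids, wanted_city_ids):
    """Simulate the proposed compound query with score_mode=sum."""
    script_clause = 65 if set(wanted_ids).issubset(doc_ids) else 0
    terms_clause = 35 if set(wanted_city_ids) & set(doc_city_ids) else 0
    return script_clause + terms_clause


if __name__ == "__main__":
    # (ids, location.city_id) pairs for a few hypothetical documents.
    docs = [([1, 2, 3], ["1"]), ([1], ["1"]), ([1, 2], ["2"]), ([], [])]
    for ids, cities in docs:
        old = function_score_sum(ids, cities, [1, 2], ["1"])
        new = compound_sum(ids, cities, [1, 2], ["1"])
        print(ids, cities, old, new)
```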
@mayya-sharipova Thank you for the response 👍 It's now clear for me. |
@iRynek we plan to add a new
The case that we want to handle in the new query is the |
@jimczi Thanks for the comment 👍 But we will need to implement precisely such esoteric use cases as multiplying the scores of multiple queries... 😏 So are you going to get rid of those possibilities? 🤔 😨 |
@jimczi @mayya-sharipova A kind reminder about my previous question 🙂 ☝️ |
@mayya-sharipova @jimczi Hello - I'm a happy user of elasticsearch and the current I'm using To help explain this with a concrete use case: my documents are recipes - and I'd like to sort based on factors including Currently I use The script score returns the sum of This works, although in an ideal world I'd prefer to calculate and refer to the two distinct values separately. Combining them and restricting B to One idea I had was to align with the 'sort by multiple fields' approach which works for 'native' document fields (as in this example sorting by If users could define 'named script queries' then those could be referenced in the "script_scores": {
"calculation_a": {
"script": "return ...",
...
},
"calculation_b": {
"script": "return ...",
...
},
"_score": { # the document score -- i.e. same as the current script_score
"script": "return ...",
...
}
},
"sort": [
{"calculation_a": "asc"}, # alternative: "_score.calculation_a" ?
{"calculation_b": "desc"},
...
] As far as I can tell, this isn't currently possible, but I'd be glad to be corrected if there's a way to use multiple script outputs at the moment. Either way I thought I'd share some thoughts from working through this problem. Thanks for your time and work on ES! |
We are not going to get rid of possibilities of |
@jayaddison Thank you for sharing your use-case, it is very interesting. {
"query": {
"compound": {
"queries": [
{
"match": {
"message": "elasticsearch"
}
},
{
"script_score": {
....
}
}
],
"score_mode": "first"
}
},
"sort" : [
{ scores[0] : "asc" },
{ scores[1] : "desc" }
]
} |
@lrynek We had a discussion within the team and found the behaviour where scores from multiple queries need to be multiplied to be really esoteric, and not a behaviour that we would like to encourage among our users. That's why we have decided for now only to implement Nevertheless, we would still like to learn more about use cases where you need to multiply scores from multiple queries, in case we missed something. We would appreciate it if you shared more details about your multiply use case. Thanks a lot. |
@mayya-sharipova Thanks for your answer! 👍 ... and sorry for the delayed response on my side 😏 In the meantime we had decided to use
Example request:
{
"size": 1,
"explain": true,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"functions": [
{
"script_score": {
"script": {
"lang": "painless",
"source": "0.89",
"params": {
"param_1": [],
"param_2": [],
"param_3": []
}
}
},
"weight": 65
},
{
"filter": {
"term": {
"location.city_id": 1
}
},
"weight": 35
},
{
"field_value_factor": {
"field": "factors.some_indexed_factor_value",
"missing": 0
},
"weight": 18
}
],
"boost_mode": "replace",
"score_mode": "sum",
"min_score": 0
}
}
} Response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 53280,
"max_score": 57.85,
"hits": [
{
"_shard": "__SHARD__",
"_node": "cp9Uzt8pQUeI9BGiS1eu2Q",
"_index": "__INDEX__",
"_type": "_doc",
"_id": "__ID__",
"_score": 57.85,
"_source": {},
"_explanation": {
"value": 57.85,
"description": "sum of:",
"details": [
{
"value": 57.85,
"description": "min of:",
"details": [
{
"value": 57.85,
"description": "function score, score mode [sum]",
"details": [
{
"value": 57.85,
"description": "product of:",
"details": [
{
"value": 0.89,
"description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='0.89', options={}, params={param_2=[], param_3=[], param_1=[]}}\" and parameters: \n{param_2=[], param_3=[], param_1=[]}",
"details": [
{
"value": 1.0,
"description": "_score: ",
"details": [
{
"value": 1.0,
"description": "*:*",
"details": []
}
]
}
]
},
{
"value": 65.0,
"description": "weight",
"details": []
}
]
},
{
"value": 0.0,
"description": "product of:",
"details": [
{
"value": 0.0,
"description": "field value function: none(doc['factors.some_indexed_factor_value'].value?:0.0 * factor=1.0)",
"details": []
},
{
"value": 18.0,
"description": "weight",
"details": []
}
]
}
]
},
{
"value": 3.4028235E38,
"description": "maxBoost",
"details": []
}
]
},
{
"value": 0.0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0.0,
"description": "# clause",
"details": []
},
{
"value": 1.0,
"description": "DocValuesFieldExistsQuery [field=_primary_term]",
"details": []
}
]
}
]
}
}
]
}
} Needs: So I mostly would like you to:
Are those above☝️requirements possible to sustain in the brand new approach you propose here? (it's important for us as a company 🙏). If you are able / want to sync via a remote Zoom call, please propose a schedule, so that I can explain everything in detail 🙂 Thanks for any further insights on the topic 🙂 |
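The arithmetic in the explanation above can be re-checked with a few lines of Python. This is only a worked example using the values shown in the response; the city filter function does not appear in the explanation, presumably because the document did not match that filter, so it is assumed to contribute nothing here.

```python
import math

# (function value, weight) pairs taken from the explanation output above.
contributions = [
    (0.89, 65.0),  # script score function
    (0.0, 18.0),   # field value function (value 0.0 in the explanation)
]


def weighted_sum(pairs):
    """score_mode=sum over 'product of: value * weight' entries."""
    return sum(value * weight for value, weight in pairs)


total = weighted_sum(contributions)

if __name__ == "__main__":
    # Compare with a tolerance, since these are floating-point products.
    print(math.isclose(total, 57.85))
```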
@lrynek Thanks for posting your query and explanations. Just wanted to let you know that I have read and am aware of your post. I will get back to you with an answer when I have something concrete; there are still some things we want to discuss within the search team. |
Another use case for function score query. Our application constructs Elasticsearch queries using multiple components. Each component may implement its own scoring method. Multiple scoring functions are defined and combined with function score query. It helps isolate the implementation of each function. With script query all functions have to be implemented in a single script, which doesn't support modularity of the code. |
@yuri-lab Thanks for your comment.
You can define multiple functions in painless script as well, for example: {
"query": {
"script_score" : {
"query" : {
"match_all" : {}
},
"script" : {
"source": """
long myFun1(long x) { return x * 10; }
long myFun2(long x) { return x * 100; }
return myFun1(doc['field1'].value) + myFun2(doc['field2'].value);
"""
}
}
}
} The only limitation of Does this satisfy your requirement? |
@lrynek I have thought about your query, and I can see that we can implement it through a {
"query" : {
"bool" : {
"should" : [
{
"constant_score" : {
"filter" : {
"term" : { "location.city_id": 1}
},
"boost" : 35
}
},
{
"script_score" : {
"query" : {"match_all": {}},
"script": {
"source": """
0.89 * 65 + 18 * doc['factors.some_indexed_factor_value'].value
""",
"params": {
<params>
}
}
}
}
]
}
}
} About your needs:
|
@yuri-lab I couldn't agree more - it's also our use case 👍 @mayya-sharipova I can see the possibility of transferring and reproducing our current query needs into the To give you even more examples from real usage: we do not manipulate Elasticsearch requests directly but via components in another language, each of which adds its own scoring factor (it is very modular, reusable and scalable at the same time // here in PHP each factor generator class is defined as a service that can depend on other services in order to build/generate the final particular PHP Factor builders / generators classes:
namespace SearchBridge\Factor;
use SearchEngine\Domain\ValueObject\Query;
final class FirstFactor implements FactorInterface
{
public function key(): string
{
return 'first_factor';
}
public function definition(Query $query): array
{
return [
'script_score' => [
'script' => [
'lang' => 'painless',
'source' => $this->scriptSource(),
'params' => $this->scriptParams($query),
],
],
'weight' => $this->factorWeightRepository->weightForFactor($this),
];
}
}
final class SecondFactor implements FactorInterface
{
public function key(): string
{
return 'second_factor';
}
public function definition(Query $query): array
{
$cityIds = $this->cityResolver->resolve($query);
if ($cityIds->isEmpty())
{
return [];
}
return [
'filter' => [
'terms' => [
'location.city_id' => $cityIds->all(),
],
],
'weight' => $this->factorWeightRepository->weightForFactor($this),
];
}
}
final class ThirdFactor implements FactorInterface
{
public function key(): string
{
return 'third_factor';
}
public function definition(Query $query): array
{
return [
'field_value_factor' => [
'field' => 'factors.some_indexed_factor_value',
'missing' => 0,
],
'weight' => $this->factorWeightRepository->weightForFactor($this),
];
}
} With the approach you suggest, my team will be forced to get rid of this very good architecture and to parse parts of ElasticSearch query (scripts, etc.) as strings and inject here or there specific logic. For me it is unacceptable // will force me to (sarcasm starts) sort of programmer's seppuku... (sarcasm ended 😉😄 🏯⚔️) |
@mayya-sharipova Conceptually yes, but in practice this approach requires string manipulation to construct source for the script, which is error-prone. Also it goes against best practice recommended by Elasticsearch to keep script source static and change only script parameters. |
We have discussed this issue again within the team, and the conclusion is that we would like ES users to use the bool query instead of the function_score query to combine queries/functions. There have been a lot of optimizations done for bool queries on the Lucene side to make them smarter and more efficient, which the function_score query doesn't have. |
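One way a modular, per-factor design might be kept while combining scores through a bool query is sketched below in Python. Everything here is an assumption for illustration: the helper names (`script_factor`, `city_factor`, `build_query`) are hypothetical, and the script source is kept static with the weight passed as a parameter, so no string manipulation of scripts is needed.

```python
def script_factor(source, params, weight):
    """A factor scored by a static script; the weight travels in params."""
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {"source": source, "params": {**params, "weight": weight}},
        }
    }


def city_factor(city_ids, weight):
    """A filter-only factor: matching documents score `weight`."""
    if not city_ids:
        return None  # the factor opts out, like returning [] in the PHP classes
    return {
        "constant_score": {
            "filter": {"terms": {"location.city_id": city_ids}},
            "boost": weight,
        }
    }


def build_query(factors):
    """Sum the factors by placing them in bool.should, skipping opted-out ones."""
    return {"query": {"bool": {"should": [f for f in factors if f is not None]}}}


if __name__ == "__main__":
    ids_source = "params.weight * (doc['ids'].containsAll(params.ids) ? 1 : 0)"
    body = build_query([
        script_factor(ids_source, {"ids": [1, 2]}, 65),
        city_factor(["1"], 35),
    ])
    print(body)
```

Each factor still owns its own clause and weight, and the bool query sums the scores of the matching `should` clauses.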
@mayya-sharipova Can we please sync via a Zoom call on the topic? (maybe even with some other devs involved on your side) It would be cool to explain everything this way 🙏 |
We have an ecommerce store with a use case of combining the scores of two factors:
Right now they are used in a very clean manner using 'function_score query'. Based on the discussion above it looks like 'script score query' will result in too much complexity. @mayya-sharipova : What are the specific advantages you and your team see in deprecating function score query in favour of script score query? |
We have a solution in place which is very similar to that of AakashMittal. We've got two crucial points:
Our greatest concern is whether this is still possible when migrating to script score query? |
@AakashMittal thanks for providing your use-case. All these factors can be combined using
It is quite a bulky query, and difficult to reason about. It has a number of bugs for edge cases and unintuitive behaviours (how weights get propagated, behaviour in nested context etc). We want to replace it with the simpler script_score and bool queries, which we have put, and are still putting, a lot of work into optimizing. |
@webemat thank you for providing your use-case.
You can do this combination through script_score query.
Depending on what exactly these factors are, some of them can be implemented through a script_score query. In this example a doc's score gets a boost depending on the category it belongs to. {
"query": {
"script_score": {
"query": {
"match": {
"message": "apple"
}
},
"script": {
"source": "_score * params.getOrDefault(doc['category'].value, 1)",
"params": {
"books": 10,
"computers": 100,
"food": 1
}
}
}
}
} We also would like to emphasize that we have not found much evidence from literature where it is useful to multiply scores from two textual queries, that's why we encourage ES users to combine queries by summing scores through bool queries. |
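The effect of the category boost in the script above can be illustrated in Python. `boosted_score` is a hypothetical stand-in for the script; `dict.get` plays the role of `params.getOrDefault`, so categories without an entry keep their original score.

```python
# Same values as the script's params in the example query.
CATEGORY_BOOSTS = {"books": 10, "computers": 100, "food": 1}


def boosted_score(score, category, boosts=CATEGORY_BOOSTS):
    """Multiply the query score by the category's boost (default 1)."""
    return score * boosts.get(category, 1)


if __name__ == "__main__":
    for category in ("books", "computers", "food", "toys"):
        print(category, boosted_score(2.0, category))
```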
@mayya-sharipova can you please refer also to my latest comment? Thank you! 🙂 |
@mayya-sharipova I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score? I see your suggestion about putting the custom script in a bool, but bool sums the results whereas I need them multiplied. |
What does your second, "custom query" involving dot product look like? It uses script_score, doesn't it? Then your script can return |
@telendt Great, thank you, that does indeed work. |
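One way such a multiplication could be expressed is to wrap the standard query in a `script_score` query and multiply `_score` by the vector similarity inside the script. The Python sketch below only builds the request body. The field name `my_vector`, the helper name and the exact `dotProduct` call form are assumptions (the function's signature has differed between 7.x releases), so treat it as a starting point to check against the documentation for your version.

```python
def multiply_by_dot_product(base_query, query_vector):
    """Build a script_score body that multiplies the base query's _score
    by a dot product against a hypothetical dense_vector field."""
    return {
        "query": {
            "script_score": {
                "query": base_query,
                "script": {
                    # 'my_vector' is a placeholder field name (assumption).
                    "source": "_score * dotProduct(params.query_vector, 'my_vector')",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    }


if __name__ == "__main__":
    base = {"bool": {"must": [{"match": {"message": "elasticsearch"}}]}}
    print(multiply_by_dot_product(base, [0.1, 0.2, 0.3]))
```

Since script scores are expected to be non-negative, a dot product that can go below zero would need to be shifted or squashed before multiplying.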
We are looking for a way to combine scores from two sub queries. One subquery will be our standard TFIDF text query and the other will be our KNN query. We are trying to boost our recall when the input text from our customers doesn't exactly match the indexed documents by leveraging word embeddings. We want to multiply the scores of these queries. Is this possible to do with a script_score query? |
@ngadkari3 No, |
@mayya-sharipova has your team decided to proceed with the deprecation, or will it be preserved as a feature? What is the decision? 🤔 🙂 |
@lrynek Sorry, I don't have an update for this. For now, we are keeping |
Update on this issue:
|
We had a plan to deprecate function_score query with script_score query, but ran into a roadblock of missing functionality to combine scores from different functions (particularly "first" script_score). We have several proposals to address this missing functionality: [scripted_boolean](elastic#27588 (comment)) [compound_query](elastic#51967) [first_query](elastic#52482) But for now, we decided not to deprecate function_score query, and hence we need to remove any mention that we are deprecating it. Relates to elastic#42811 Closes elastic#71934
We would like to deprecate Function Score Query in 8.0 (originally 7.x) and completely remove it starting from 9.0 (originally 8.0); the versions were increased as we are already deep into 7.x. The new script score query will replace it.
Here we describe how a function score query can be replaced with a script score query.
We would like to know from the community about cases when something was possible to do with Function Score Query, but the new Script Score Query can't address it.