
Deprecate Function Score Query in favour of Script Score Query #42811

Closed
mayya-sharipova opened this issue Jun 3, 2019 · 37 comments
Labels
:Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@mayya-sharipova
Contributor

mayya-sharipova commented Jun 3, 2019

We would like to deprecate the Function Score Query in 8.0 (instead of 7.x, as first planned) and completely remove it starting from 9.0 (instead of 8.0), pushing the versions back because we are already far into 7.x.

The new script score query will replace it.
Here we describe how a function score query can be replaced with a script score query.

We would like to hear from the community about cases where something was possible with the Function Score Query but the new Script Score Query can't address it.

@mayya-sharipova mayya-sharipova added the :Search Relevance/Ranking Scoring, rescoring, rank evaluation. label Jun 3, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search

@mayya-sharipova
Contributor Author

We have identified 2 function_score query functionalities that are missing in the script_score query:

  1. score_mode first: applying the score from the function with the first matching filter (we wonder how prevalent this use case is).
  2. _explain: the explanation of the score calculation doesn't work in the script_score query.

@mayya-sharipova
Contributor Author

mayya-sharipova commented Aug 26, 2019

For score_mode first, we had a discussion before.

A proposal is to have a new type of compound query that has an ability to combine scores from different queries:

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode": "first"
    }
  }
}

score_mode has the same values and definitions as in Function Score Query

| score_mode | definition |
|---|---|
| sum | scores are summed (default) |
| multiply | scores are multiplied |
| avg | scores are averaged |
| first | the first function that has a matching filter is applied |
| max | maximum score is used |
| min | minimum score is used |
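To make the table concrete, here is a minimal Python model (not Elasticsearch code) of how per-function scores could be combined under each score_mode. It assumes each function is given as (filter_matched, raw_score, weight), that functions whose filter does not match are skipped, and that avg is a weighted average (sum of weighted scores divided by the sum of the weights), as the function_score documentation describes. Returning None when no function matched is an assumption of this sketch only.

```python
from math import prod
from typing import List, Optional, Tuple

# One entry per function: (filter_matched, raw_score, weight).
Function = Tuple[bool, float, float]


def combine(functions: List[Function], score_mode: str) -> Optional[float]:
    """Combine function scores according to score_mode (illustrative model)."""
    # Functions whose filter did not match the document take no part.
    matched = [(score, weight) for ok, score, weight in functions if ok]
    if not matched:
        return None  # assumption: "no function matched" is left undefined here
    weighted = [score * weight for score, weight in matched]
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "multiply":
        return prod(weighted)
    if score_mode == "avg":
        # Weighted average: divide by the sum of weights, not the count.
        return sum(weighted) / sum(weight for _, weight in matched)
    if score_mode == "first":
        return weighted[0]  # first function whose filter matched
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


functions = [(False, 5.0, 1.0), (True, 1.0, 3.0), (True, 2.0, 4.0)]
for mode in ("sum", "multiply", "avg", "first", "max", "min"):
    print(mode, combine(functions, mode))
```

With this sample input the first function's filter does not match, so first picks the second function (1.0 * 3 = 3.0), sum gives 3 + 8 = 11.0, and avg gives 11/7 rather than 11/2.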

@mayya-sharipova
Contributor Author

mayya-sharipova commented Aug 26, 2019

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        {"match": {
          "message": "elasticsearch"
        }},
        {"match": {
          "author": "shay banon"
        }}
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

But for the first mode we would still need to implement a kind of in-order query, as this logic is difficult to implement through a script. A possible API for an in_order query:

{
  "query": {
    "in_order": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "score_mode" : "first"
    }
  }
}

I'm not sure which other score_mode values would be useful here.

@lrynek

lrynek commented Oct 1, 2019

@mayya-sharipova I am not 100% sure these changes won't break our usage of Elasticsearch, so I've prepared the following comparison. Could you confirm it is a proper translation of our features to the new query model, please? Thank you for any tips!

Actual request (ES 6.6)

{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "return doc['ids'].containsAll(params.ids) ? 1 : 0;",
              "params": {
                "ids": [1, 2]
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "terms": {
              "location.city_id": [
                "1"
              ]
            }
          },
          "weight": 35
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

New request (ES 7.x / ES 8.0 compatible)

[especially not sure about how to pass filter scoring behaviour... 🤔]

{
  "explain": true,
  "query": {
    "script_compound": {
      "queries": [
        {
          "script_score": {
            "query": {
              "match_all": {}
            },
            "script": {
              "lang": "painless",
              "source": "return (doc['ids'].containsAll(params.ids) ? 1 : 0) * params.weight;",
              "params": {
                "ids": [1, 2],
                "weight": 65
              }
            }
          }
        },
        {
          "script_score": {
            "query": {
              "match": {
                "terms": {
                  "location.city_id": [
                    "1"
                  ]
                }
              }
            },
            "script": {
              "lang": "painless",
              "source": "return params._score * params.weight;",
              "params": {
                "weight": 35
              }
            }
          }
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

PS

I recommend one small fix to the proposed query (as it stands, it isn't valid JSON).
Instead of this:

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        "match": {
          "message": "elasticsearch"
        },
        "match": {
          "author": "shay banon"
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

it should look like this:

Or as an alternative we can have a script compound query where scores can be freely combined using script:

{
  "query": {
    "script_compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "match": {
            "author": "shay banon"
          }
        }
      ],
      "script": {
        "source": "_scores[0] + _scores[1]"
      }
    }
  }
}

@lrynek

lrynek commented Oct 17, 2019

@mayya-sharipova Please give us any tip on the topic, thanks 🙏🙂

@mayya-sharipova
Contributor Author

@lrynek Sorry for the late reply, I've been away.
Thank you for your tips. Indeed, with a new query type we need to cover all existing functionalities of the function_score query before we can deprecate it.
And you are right: a script doesn't allow filtering functions. For now, the plan is to investigate implementing a compound query without a script, so your query could be translated to something like this:

{
  "query": {
    "compound": {
      "queries": [
        {
          "script_score": {
            "script": {
              "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
              "params": {
                "ids": [1,2]
              }
            }
          }
        },
        {
          "terms": {
            "location.city_id": [
              "1"
            ],
            "boost": 35
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}

@lrynek

lrynek commented Oct 29, 2019

@mayya-sharipova Thank you for the response 👍 It's clear to me now.
Please share any resources regarding this future implementation, as my coworkers and I are very interested in staying updated on the topic. Thank you! 🙂

@jimczi
Contributor

jimczi commented Oct 30, 2019

@iRynek we plan to add a new compound query to handle some of the unique functionalities that the function_score query provides. However, looking at your example, I don't think you need any replacement here. When the scores of sub-queries are summed together, a plain bool query works fine:

{
    "query": {
        "bool": {
            "filter": {
                "match_all": {}
            },
            "should": [
                {
                    "script_score": {
                        "query": {
                            "match_all": {}
                        },
                        "script": {
                            "source": "return doc['ids'].containsAll(params.ids) ? 65 : 0;",
                            "params": {
                                "ids": [
                                    1,
                                    2
                                ]
                            }
                        }
                    }
                },
                {
                    "terms": {
                        "location.city_id": [
                            "1"
                        ],
                        "boost": 35
                    }
                }
            ]
        }
    }
}
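A quick way to check that this bool query scores the same as the original function_score (score_mode sum, boost_mode replace) is to model both in Python. This sketch assumes that the terms clause scores a constant equal to its boost (35), that the match_all filter contributes nothing to the score, and that documents look like {"ids": [...], "city_id": "..."}; those document shapes are hypothetical.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]  # hypothetical shape: {"ids": [...], "city_id": "..."}


def function_score_sum(doc: Doc, ids: List[int], city_ids: List[str]) -> float:
    """function_score with score_mode=sum and boost_mode=replace."""
    # script_score function returns 1 or 0, multiplied by its weight (65)
    script_part = (1.0 if set(ids) <= set(doc["ids"]) else 0.0) * 65
    # filter-only function contributes its weight (35) when the filter matches
    filter_part = 35.0 if doc["city_id"] in city_ids else 0.0
    return script_part + filter_part


def bool_should_sum(doc: Doc, ids: List[int], city_ids: List[str]) -> float:
    """bool query: the filter adds nothing, matching should clauses are summed."""
    # script_score clause over match_all always matches (score 65 or 0)
    clause_scores = [65.0 if set(ids) <= set(doc["ids"]) else 0.0]
    if doc["city_id"] in city_ids:
        clause_scores.append(35.0)  # terms clause, assumed to score its boost
    return sum(clause_scores)


docs = [
    {"ids": [1, 2, 3], "city_id": "1"},
    {"ids": [1], "city_id": "1"},
    {"ids": [2, 1], "city_id": "7"},
    {"ids": [], "city_id": "7"},
]
for d in docs:
    print(function_score_sum(d, [1, 2], ["1"]), bool_should_sum(d, [1, 2], ["1"]))
```

Both columns agree for every document, which is the point of the rewrite: with score_mode sum, each weighted function becomes one should clause.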

The case that we want to handle in the new query is the first mode, which picks the score of the first matching query. This is not possible to achieve with a bool query, so we need a replacement.
There are other cases, but they are esoteric: for instance, multiplying the scores of multiple queries instead of summing them as the bool query does.

@lrynek

lrynek commented Nov 4, 2019

@jimczi Thanks for the comment 👍 But we will need to implement precisely such esoteric use cases as multiplying the scores of multiple queries... 😏 So are you going to get rid of those possibilities? 🤔 😨

@lrynek

lrynek commented Jan 4, 2020

@jimczi @mayya-sharipova Kind reminder about my previous question 🙂 ☝️

@jayaddison

@mayya-sharipova @jimczi Hello - I'm a happy user of elasticsearch and the current script_score functionality. I have a use case related to compound / multiple script scores and thought I'd share it here along with some ideas. I hope this is the correct GitHub issue and is on-topic - let me know if not and I'll be happy to move/hide this comment.

I'm using script_score to handle a situation where multiple sort criteria are involved. For example, I'd like results to be sorted by criterion A, with criterion B as a tie-breaker.

To help explain this with a concrete use case: my documents are recipes - and I'd like to sort based on factors including number of matched ingredients (desc), and then recipe rating (desc).

Currently I use script_score to produce a single output value to achieve this -- it calculates an integer value A (count of ingredients matched) and a decimal value B which is normalized to the range 0...1 (float) based on the recipe rating (code ref).

The script score returns the sum of A+B -- so 'three matched ingredients on a recipe with a rating of 4 stars' becomes 3.4 in the document _score, and will rank above 'three matched ingredients on a recipe of 2 stars', for example, thanks to the sort order.

This works, although in an ideal world I'd prefer to calculate and refer to the two distinct values separately. Combining them and restricting B to 0...1 makes the intent of the calculation and sorting less clear.
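The trick relies on B staying strictly below 1, so that no rating can outweigh one extra matched ingredient. A minimal Python sketch under stated assumptions (ratings are 0 to 5 stars, the recipe data is made up, and scaling by MAX_RATING + 1 is just one way to keep B in [0, 1)) shows that sorting by A + B then matches sorting by (A, B):

```python
from typing import List, Tuple

# Hypothetical recipes: (name, matched ingredient count A, star rating 0..5).
Recipe = Tuple[str, int, float]
MAX_RATING = 5.0  # assumption: ratings range over 0..5 stars


def composite_score(matched: int, rating: float) -> float:
    """A + B, with B = rating scaled into [0, 1)."""
    # Dividing by MAX_RATING + 1 keeps B strictly below 1, so a perfect
    # rating can never catch up with one extra matched ingredient.
    return matched + rating / (MAX_RATING + 1)


def rank_by_composite(recipes: List[Recipe]) -> List[str]:
    ordered = sorted(recipes, key=lambda r: composite_score(r[1], r[2]), reverse=True)
    return [name for name, _, _ in ordered]


def rank_by_tuple(recipes: List[Recipe]) -> List[str]:
    # The "A desc, then B desc" ordering that the composite score emulates.
    ordered = sorted(recipes, key=lambda r: (r[1], r[2]), reverse=True)
    return [name for name, _, _ in ordered]


recipes = [("soup", 3, 4.0), ("stew", 3, 2.0), ("salad", 4, 0.0), ("pie", 2, 5.0)]
print(rank_by_composite(recipes))
print(rank_by_tuple(recipes))
```

If B were allowed to reach exactly 1 (for example rating / MAX_RATING), a 5-star recipe with three matches would tie with an unrated recipe with four matches, which is the kind of coupling that separate, named outputs would avoid.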

One idea I had was to align with the 'sort by multiple fields' approach which works for 'native' document fields (as in this example sorting by post_date, user, ...), by using named outputs (similar to the way that aggregations can be named).

If users could define 'named script queries' then those could be referenced in the sort parameter - and it'd also make it easier to use a mix of sort orders (asc, desc, ...) on the different scripted outputs.

"script_scores": {
  "calculation_a": {
    "script": "return ...",
    ...
  },
  "calculation_b": {
    "script": "return ...",
    ...
  },
  "_score": {  # the document score -- i.e. same as the current script_score
    "script": "return ...",
    ...
  }
},
"sort": [
  {"calculation_a": "asc"},  # alternative: "_score.calculation_a" ?
  {"calculation_b": "desc"},
  ...
]

As far as I can tell, this isn't currently possible, but I'd be glad to be corrected if there's a way to use multiple script outputs at the moment. Either way I thought I'd share some thoughts from working through this problem. Thanks for your time and work on ES!

@mayya-sharipova
Contributor Author

@lrynek

But we precisely will need to implement such esoteric use cases as multiplying score of multiple queries... 😏 So are you going to get rid of those possibilities?

We are not going to get rid of possibilities of function_score query unless we have other alternative ways to implement them.

@mayya-sharipova
Contributor Author

@jayaddison Thank you for sharing your use case, it is very interesting.
I will bring your proposal to my team for discussion.
I can see how this could be used with the compound query we are investigating:

{
  "query": {
    "compound": {
      "queries": [
        {
          "match": {
            "message": "elasticsearch"
          }
        },
        {
          "script_score": {
            ....
          }
        }
      ],
      "score_mode": "first"
    }
  },
  "sort" : [
    { "scores[0]": "asc" },
    { "scores[1]": "desc" }
  ]
}

@mayya-sharipova
Contributor Author

@lrynek We had a discussion within the team and found the behaviour where scores from multiple queries are multiplied to be really esoteric, and not a behaviour we would like to encourage our users to rely on. That's why we have decided, for now, to implement only the first behaviour of the function_score query and drop the other esoteric behaviours (multiply, min, and avg).

Nevertheless, we would still like to learn more about use cases where you need to multiply scores from multiple queries, in case we missed something. We would appreciate it if you could share more details about your multiply use case. Thanks a lot.

@lrynek

lrynek commented Apr 8, 2020

@mayya-sharipova Thanks for your answer! 👍 ... and sorry for the delayed response on my side 😏

In the meantime we decided to use the sum value of score_mode in our function_score queries, so I would like to make sure that future changes to the Elasticsearch API won't affect our current usage 👉 see the example query we run and its explanation from ES:

Example

Request


{
  "size": 1,
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "lang": "painless",
              "source": "0.89",
              "params": {
                "param_1": [],
                "param_2": [],
                "param_3": []
              }
            }
          },
          "weight": 65
        },
        {
          "filter": {
            "term": {
              "location.city_id": 1
            }
          },
          "weight": 35
        },
        {
          "field_value_factor": {
            "field": "factors.some_indexed_factor_value",
            "missing": 0
          },
          "weight": 18
        }
      ],
      "boost_mode": "replace",
      "score_mode": "sum",
      "min_score": 0
    }
  }
}

Response


{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 53280,
    "max_score": 57.85,
    "hits": [
      {
        "_shard": "__SHARD__",
        "_node": "cp9Uzt8pQUeI9BGiS1eu2Q",
        "_index": "__INDEX__",
        "_type": "_doc",
        "_id": "__ID__",
        "_score": 57.85,
        "_source": {},
        "_explanation": {
          "value": 57.85,
          "description": "sum of:",
          "details": [
            {
              "value": 57.85,
              "description": "min of:",
              "details": [
                {
                  "value": 57.85,
                  "description": "function score, score mode [sum]",
                  "details": [
                    {
                      "value": 57.85,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.89,
                          "description": "script score function, computed with script:\"Script{type=inline, lang='painless', idOrCode='0.89', options={}, params={param_2=[], param_3=[], param_1=[]}}\" and parameters: \n{param_2=[], param_3=[], param_1=[]}",
                          "details": [
                            {
                              "value": 1.0,
                              "description": "_score: ",
                              "details": [
                                {
                                  "value": 1.0,
                                  "description": "*:*",
                                  "details": []
                                }
                              ]
                            }
                          ]
                        },
                        {
                          "value": 65.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    },
                    {
                      "value": 0.0,
                      "description": "product of:",
                      "details": [
                        {
                          "value": 0.0,
                          "description": "field value function: none(doc['factors.some_indexed_factor_value'].value?:0.0 * factor=1.0)",
                          "details": []
                        },
                        {
                          "value": 18.0,
                          "description": "weight",
                          "details": []
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 3.4028235E38,
                  "description": "maxBoost",
                  "details": []
                }
              ]
            },
            {
              "value": 0.0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0.0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 1.0,
                  "description": "DocValuesFieldExistsQuery [field=_primary_term]",
                  "details": []
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
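The _explanation tree above can be checked by hand. The sketch below recomputes the score in Python from the numbers shown; it assumes the location.city_id filter function is absent from the explanation because its filter did not match this document, so it contributes nothing. The "match on required clause" branch adds 0.0 (the # clause) and is left out.

```python
import math

# Values read off the _explanation above.
script_value, script_weight = 0.89, 65.0  # script score function, times its weight
field_value, field_weight = 0.0, 18.0     # field_value_factor with "missing": 0
city_matched, city_weight = False, 35.0   # filter function, not listed in the explanation
max_boost = 3.4028235e38                  # the "maxBoost" entry (Java Float.MAX_VALUE)

# score_mode=sum over the functions that apply to this document
function_sum = script_value * script_weight + field_value * field_weight
if city_matched:
    function_sum += city_weight

# "min of:" the summed functions and maxBoost; boost_mode=replace keeps only this
score = min(function_sum, max_boost)
print(score)
```

The result is approximately 57.85, the _score reported for the hit.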

Needs

So mostly I would like you to:

  1. Preserve the ability to sum all the scores involved in a query.
  2. Keep the explicit weight API for each of the scoring functions, if possible (it is more elegant to pass the weight explicitly from the script that generates the Elasticsearch query).
  3. Add the ability to retrieve particular scores from the explanation (without recursively searching through the nested explanation JSON, with all those value, description, details triplets 😉), as stated in my feature request proposal.

Are the requirements above ☝️ possible to sustain in the brand new approach you propose here? (It's important for us as a company 🙏.)

If you are able and willing to sync via a Zoom call, please propose a schedule, so I can explain everything in detail 🙂

Thanks for any further insights on the topic 🙂

@mayya-sharipova
Contributor Author

@lrynek Thanks for posting your query and explanation. Just wanted to let you know that I have read your post. I will get back to you with an answer when I have something concrete; there are still some things we want to discuss within the search team.

@yuri-lab

Another use case for the function score query: our application constructs Elasticsearch queries using multiple components, and each component may implement its own scoring method. Multiple scoring functions are defined and combined with the function score query, which helps isolate the implementation of each function. With a script query, all functions have to be implemented in a single script, which doesn't support code modularity.

@mayya-sharipova
Contributor Author

@yuri-lab Thanks for your comment.

Multiple scoring functions are defined and combined with function score query.

You can define multiple functions in painless script as well, for example:

{
  "query": {
    "script_score" : {
      "query" : {
        "match_all" : {}
      },
      "script" : {
        "source": """
          long myFun1(long x) { x * 10 }
          long myFun2(long x) { x * 100 }
          return myFun1(doc['field1'].value) + myFun2(doc['field2'].value)
        """
      }
    }
  }
}

The only limitation of the script_score query compared with the function_score query is that you can't apply separate filters to the functions in a script_score query. For this, you would need to write a more complex bool query.

Does this satisfy your requirement?

@mayya-sharipova
Contributor Author

@lrynek I have thought about your query, and I can see that we can implement it through a bool and script_score queries, something like this:

{
  "query" : {
    "bool" : {
      "should" : [
        {
          "constant_score" : {
            "filter" : {
              "term" : { "location.city_id": 1}
            },
            "boost" : 35
          }
        },
        {
          "script_score" : {
            "query" : {"match_all": {}},
            "script": {
              "source": """
                0.89 * 65 + 18 * doc['factors.some_indexed_factor_value'].value
              """,
              "params": {
                <params>
              }
            }
          }
        }
      ]
    }
  }
}

About your needs:

  • sum. The bool query sums the scores of its clauses. In script_score you can write any scoring formula you want.
  • weight. Many ES queries (including script_score and bool) support the boost parameter.
  • explanation. What we have right now is that for each query you can provide a name, and the search response will include, for each hit, the matched_queries it matched on. There is also a feature to provide a custom script explanation. This doesn't completely address your explanation request, but maybe later we can consider including named queries in explanations as well.

@lrynek

lrynek commented Apr 24, 2020

@yuri-lab I couldn't agree more with you; it's also our use case 👍

@mayya-sharipova I can see the possibility of transferring and reproducing our current query needs into the bool and script_score combinations but for me it degrades the experience of the new approach versus the one we are used to... 😞
The explicit and consistent approach in function_score is far better IMHO... I'm not sure what the reasoning behind deprecating it is 🤷‍♂️

To give you even more examples from real usage: we don't manipulate Elasticsearch requests directly but through components written in another language that add their own scoring factors (it is very modular, reusable and scalable at the same time; here, in PHP, each factor generator class is defined as a service that can depend on other services in order to build the particular function_score query function):

PHP Factor builders / generators classes:

namespace SearchBridge\Factor;

use SearchEngine\Domain\ValueObject\Query;

final class FirstFactor implements FactorInterface
{
    public function key(): string
    {
        return 'first_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'script_score' => [
                'script' => [
                    'lang'   => 'painless',
                    'source' => $this->scriptSource(),
                    'params' => $this->scriptParams($query),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class SecondFactor implements FactorInterface
{
    public function key(): string
    {
        return 'second_factor';
    }

    public function definition(Query $query): array
    {
        $cityIds = $this->cityResolver->resolve($query);

        if ($cityIds->isEmpty())
        {
            return [];
        }

        return [
            'filter' => [
                'terms' => [
                    'location.city_id' => $cityIds->all(),
                ],
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

final class ThirdFactor implements FactorInterface
{
    public function key(): string
    {
        return 'third_factor';
    }

    public function definition(Query $query): array
    {
        return [
            'field_value_factor' => [
                'field' => 'factors.some_indexed_factor_value',
                'missing' => 0,
            ],
            'weight' => $this->factorWeightRepository->weightForFactor($this),
        ];
    }
}

With the approach you suggest, my team will be forced to get rid of this very good architecture and to parse parts of ElasticSearch query (scripts, etc.) as strings and inject here or there specific logic. For me it is unacceptable // will force me to (sarcasm starts) sort of programmer's seppuku... (sarcasm ended 😉😄 🏯⚔️)
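One way a builder like the PHP one above might keep its per-factor shape while emitting the bool form suggested in this thread (constant_score for filter-only factors, script_score with boost for scripted ones) is sketched below in Python. The mapping and the names to_should_clause and build_query are hypothetical; field_value_factor is deliberately left unmapped because it would need a script reading the field's doc value, and the equivalence is only meant for score_mode sum with boost_mode replace.

```python
import json
from typing import Any, Dict, List

Function = Dict[str, Any]


def to_should_clause(function: Function) -> Dict[str, Any]:
    """Map one function_score-style function onto a bool `should` clause.

    Hypothetical mapping that only covers two shapes:
      * script_score (+ optional filter) + weight -> script_score query with boost
      * filter + weight                          -> constant_score query with boost
    """
    weight = function.get("weight", 1)
    if "script_score" in function:
        return {
            "script_score": {
                # the function's filter (if any) becomes the inner query
                "query": function.get("filter", {"match_all": {}}),
                "script": function["script_score"]["script"],
                "boost": weight,
            }
        }
    if "filter" in function and set(function) <= {"filter", "weight"}:
        return {"constant_score": {"filter": function["filter"], "boost": weight}}
    raise ValueError(f"no mapping for function keys {sorted(function)}")


def build_query(functions: List[Function]) -> Dict[str, Any]:
    # Skip empty definitions (like a factor returning [] when it has nothing to add).
    clauses = [to_should_clause(f) for f in functions if f]
    # A match_all filter keeps every document, as function_score over match_all does.
    return {"query": {"bool": {"filter": [{"match_all": {}}], "should": clauses}}}


first_factor = {
    "script_score": {"script": {"lang": "painless", "source": "0.89", "params": {}}},
    "weight": 65,
}
second_factor = {"filter": {"terms": {"location.city_id": [1]}}, "weight": 35}
print(json.dumps(build_query([first_factor, {}, second_factor]), indent=2))
```

Each factor still produces its own fragment and keeps its script source static (only params vary), so no string concatenation of script sources is needed for these two shapes.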

@yuri-lab

@mayya-sharipova Conceptually yes, but in practice this approach requires string manipulation to construct source for the script, which is error-prone. Also it goes against best practice recommended by Elasticsearch to keep script source static and change only script parameters.

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@mayya-sharipova
Contributor Author

We have discussed this issue again within the team, and the conclusion is that we would like ES users to use the bool query instead of the function_score query to combine queries/functions. A lot of optimizations have been done for bool queries on the Lucene side to make them smarter and more efficient, and the function_score query doesn't benefit from them.

@lrynek

lrynek commented May 12, 2020

@mayya-sharipova Can we please sync via a Zoom call on this topic (maybe even with some other devs involved on your side)? It would be easier to explain everything this way 🙏
I would like to keep the possibility of declaring functions alongside bool queries, since the Lucene optimizations concern word analysis and text-matching scoring, which I would like to use as just one of the factors. That way I keep full control over the ranking on my side, with many other cofactors apart from the mere text matching / bool query matching.

@AakashMittal

We have an e-commerce store where our use case combines the scores of two factors:

  1. the Lucene score of text matching/bool query matching
  2. other factors like 'newness', numerical factors like the 'number of images' of products, the 'number of words in name/description', and some other numerical factors.

Right now they are used in a very clean manner using 'function_score query'. Based on the discussion above it looks like 'script score query' will result in too much complexity.

@mayya-sharipova : What are the specific advantages  you and your team see in deprecating function score query in favour of script score query?

@webemat

webemat commented May 25, 2020

We have a solution in place which is very similar to that of AakashMittal.

We've got two crucial points:

  1. The score of the document gets multiplied by factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY). The factors themselves get combined by weighted average (scoreMode = AVG).
    At first we had these factors added to the score, but their impact on the final score varied too much across requests (too little on high scores, too much on low scores).

  2. Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.

Our greatest concern is whether this is still possible when migrating to script score query?
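A minimal Python sketch of the combination described in point 1: function values combined by a weighted average (score_mode avg divides by the sum of the weights of the functions that apply) and then multiplied into the query score (boost_mode multiply). The inputs are made up, and returning 1.0 when no function applies is an assumption of this sketch, not documented Elasticsearch behaviour. Because the combined factor is an average, it stays between the smallest and largest factor value, which is why its impact scales with the query score instead of being added on top of it.

```python
from typing import List, Tuple

# One entry per function: (filter_matched, factor_value, weight); values are made up.
Factor = Tuple[bool, float, float]


def weighted_avg(factors: List[Factor]) -> float:
    """score_mode avg: weighted average over the functions whose filter matched."""
    matched = [(value, weight) for ok, value, weight in factors if ok]
    if not matched:
        return 1.0  # assumption: leave the query score untouched
    return sum(v * w for v, w in matched) / sum(w for _, w in matched)


def final_score(query_score: float, factors: List[Factor]) -> float:
    """boost_mode multiply: query score times the combined factor."""
    return query_score * weighted_avg(factors)


factors = [
    (True, 0.5, 2.0),    # e.g. a 'newness' factor
    (True, 1.0, 2.0),    # e.g. an individual boost factor
    (False, 10.0, 1.0),  # category-only factor whose filter did not match
]
print(final_score(2.0, factors))
```

The arithmetic itself could be written into a script_score source as _score * (...); the per-function filters of point 2 are the part a single script cannot express directly.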

@mayya-sharipova
Contributor Author

mayya-sharipova commented Jun 4, 2020

@AakashMittal thanks for providing your use case. All these factors can be combined using script_score, and I don't see much complexity in this apart from the trouble of rewriting queries.

: What are the specific advantages you and your team see in deprecating function score query in favour of script score query?

It is quite a bulky query, and difficult to reason about. It has a number of bugs in edge cases and unintuitive behaviours (how weights get propagated, behaviour in a nested context, etc.). We want to replace it with the simpler script_score and bool queries, which we have put, and keep putting, a lot of work into optimizing.

@mayya-sharipova
Contributor Author

@webemat thank you for providing your use-case.

The Score of the document gets multiplied by the factors like 'newness', individual boost score, etc. (boostMode = MULTIPLY).

You can do this combination through script_score query.

Some factors are only applied, if the documents meet the (sub-)query of the function score factor. E.g. Only give a boost, if document X belongs to category XY.

Depending on what exactly these factors are, some of them can be implemented through a script_score query. In this example, a document's score gets a boost depending on the category it belongs to.

{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "message": "apple"
        }
      },
      "script": {
        "source": "_score * params.getOrDefault(doc['category'].value, 1)",
        "params": {
          "books": 10,
          "computers": 100,
          "food": 1
        }
      }
    }
  }
}
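In the script above, getOrDefault looks up the document's category in params and falls back to 1, so documents in other categories keep their text score unchanged. A Python model of the same expression is below; it assumes category is a keyword field holding a single value per document, since doc['category'].value reads doc values.

```python
from typing import Dict, Optional

# Mirrors the "params" object of the script above.
CATEGORY_BOOSTS: Dict[str, float] = {"books": 10, "computers": 100, "food": 1}


def boosted_score(score: float, category: str, boosts: Optional[Dict[str, float]] = None) -> float:
    """Model of: _score * params.getOrDefault(doc['category'].value, 1)."""
    table = CATEGORY_BOOSTS if boosts is None else boosts
    # Categories without an entry fall back to a factor of 1 (score unchanged).
    return score * table.get(category, 1)


print(boosted_score(1.5, "computers"))
print(boosted_score(1.5, "toys"))
```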

We would also like to emphasize that we have not found much evidence in the literature that multiplying scores from two textual queries is useful; that's why we encourage ES users to combine queries by summing scores through bool queries.

@lrynek

lrynek commented Jun 4, 2020

@mayya-sharipova can you please refer also to my latest comment? Thank you! 🙂

@timforr

timforr commented Jun 23, 2020

@mayya-sharipova I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score? I see your suggestion about putting the custom script in a bool, but bool sums the results whereas I need them multiplied.

@telendt
Contributor

telendt commented Jun 23, 2020

@timforr

I currently use function_score to multiply the values of a standard query (with should, must, boosts, etc) and a custom query (involving a dense_vector dotProduct). How can I replicate that with script_score?

What does your second "custom query" involving the dot product look like? It uses script_score, doesn't it? Then your script can return _score * dotProduct(...); you don't need function score for that.
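Outside Elasticsearch, the scoring suggested here is simply the text score multiplied by a dot product. The Python model below also reflects the documented requirement that script_score scores be non-negative, which matters because an unnormalized dot product can go below zero; clamping at 0 is only one possible choice (shifting is another). The function names are illustrative, not Painless APIs.

```python
from typing import Sequence


def dot_product(query_vector: Sequence[float], doc_vector: Sequence[float]) -> float:
    if len(query_vector) != len(doc_vector):
        raise ValueError("vector dimensions differ")
    return sum(q * d for q, d in zip(query_vector, doc_vector))


def multiplied_score(text_score: float, query_vector: Sequence[float], doc_vector: Sequence[float]) -> float:
    """Model of _score * dotProduct(...), with negative dot products clamped to 0."""
    # script_score results must not be negative, so clamp before multiplying.
    return text_score * max(dot_product(query_vector, doc_vector), 0.0)


print(multiplied_score(2.0, [1.0, 2.0], [3.0, 4.0]))
print(multiplied_score(2.0, [1.0, 0.0], [-1.0, 0.0]))
```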

@timforr

timforr commented Jun 23, 2020

@telendt Great, thank you, that does indeed work.

@ngadkari3

ngadkari3 commented Feb 6, 2021

We are looking for a way to combine scores from two sub-queries. One sub-query will be our standard TF-IDF text query and the other will be our KNN query. By leveraging word embeddings, we are trying to boost recall when the input text from our customers doesn't exactly match the indexed documents. We want to multiply the scores of these queries. Is this possible with a script_score query?

@mayya-sharipova
Contributor Author

@ngadkari3 No, script_score is not suitable for your use case because it will not increase recall: only documents selected by the inner query can be scored (by kNN or any other method).
Currently there is no way in Elasticsearch to multiply the scores of two queries. You can only combine them additively, with a bool query (sum of the matching clauses' scores) or a dis_max query (best clause score plus an optional tie_breaker share of the others).
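The recall difference can be illustrated with a toy Python model. This is not Elasticsearch code. Each query is represented only by the scores of the documents it matches, and the text and vector scores below are made up. bool and dis_max return the union of their clauses' matches, while script_score can only rescore what its inner query already matched. The dis_max formula follows the documented behaviour: the best clause score plus tie_breaker times the remaining matching clause scores.

```python
from typing import Callable, Dict

Scores = Dict[str, float]  # doc id -> score, only for documents the query matches


def bool_should(a: Scores, b: Scores) -> Scores:
    """Union of matches; each document's score is the sum of its matching clause scores."""
    return {doc: a.get(doc, 0.0) + b.get(doc, 0.0) for doc in a.keys() | b.keys()}


def dis_max(a: Scores, b: Scores, tie_breaker: float = 0.0) -> Scores:
    """Union of matches; best clause score plus tie_breaker times the other matching scores."""
    out = {}
    for doc in a.keys() | b.keys():
        matched = [s[doc] for s in (a, b) if doc in s]
        best = max(matched)
        out[doc] = best + tie_breaker * (sum(matched) - best)
    return out


def script_score(inner: Scores, script: Callable[[str, float], float]) -> Scores:
    """Rescore only the documents the inner query matched; nothing new is added."""
    return {doc: script(doc, score) for doc, score in inner.items()}


text = {"d1": 2.0, "d2": 1.0}     # hypothetical text-query matches
vector = {"d2": 0.5, "d3": 0.75}  # hypothetical vector-similarity matches

print(sorted(bool_should(text, vector).items()))
# [('d1', 2.0), ('d2', 1.5), ('d3', 0.75)]
print(sorted(dis_max(text, vector, tie_breaker=0.5).items()))
# [('d1', 2.0), ('d2', 1.25), ('d3', 0.75)]
print(sorted(script_score(text, lambda doc, s: s * vector.get(doc, 0.0)).items()))
# [('d1', 0.0), ('d2', 0.5)]
```

Document d3 is found only by the vector query, so it never appears in the script_score result, whereas both additive combinations return it.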

@lrynek

lrynek commented Feb 8, 2021

@mayya-sharipova has your team decided to proceed with the deprecation, or will it be preserved as a feature? What is the decision? 🤔 🙂

@mayya-sharipova
Contributor Author

has your team decided to proceed with the deprecation, or will it be preserved as a feature? What is the decision?

@lrynek Sorry, I don't have an update on this. For now, we are keeping the function_score query. I hope to get back to this at some point.

@mayya-sharipova
Contributor Author

Update on this issue:

  • Currently the script_score query has all the functionality of the function_score query except the ability to execute several functions on the same document and combine their scores in different modes through the score_mode parameter.
  • We found the min, max, avg and multiply score modes to be very esoteric and rarely used. The sum score mode can currently be imitated easily with a bool query.
  • The only missing functionality that we would like to implement is score_mode first, where the score of the first function whose filter matches is applied. For this we have several proposals: scripted_boolean, compound_query, first_query.
  • For now, we have decided not to deprecate the function_score query until we have a way to implement the missing functionality. I will therefore close this issue, and the missing functionality will be tracked in the corresponding issues.
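To make the score_mode options concrete, here is a toy Python model of how function_score might combine the functions whose filters match a document. It is based on our reading of the documented behaviour: each function's output is multiplied by its weight, avg is a weight-weighted average, first takes the first matching function in list order, and a document that matches no function gets a neutral factor of 1. It ignores boost_mode, max_boost and min_score. `ScoreFunction` and `combine` are hypothetical names, and the filters are plain Python predicates rather than queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, object]


@dataclass
class ScoreFunction:
    matches: Callable[[Doc], bool]  # stands in for the function's "filter" query
    score: Callable[[Doc], float]   # stands in for the function itself
    weight: float = 1.0


def combine(functions: List[ScoreFunction], doc: Doc, score_mode: str = "multiply") -> float:
    """Toy model of function_score's score_mode over the functions whose filter matches."""
    matching = [f for f in functions if f.matches(doc)]
    if not matching:
        return 1.0  # no function applies: neutral factor
    if score_mode == "first":
        return matching[0].score(doc) * matching[0].weight
    weighted = [f.score(doc) * f.weight for f in matching]
    if score_mode == "multiply":
        product = 1.0
        for value in weighted:
            product *= value
        return product
    if score_mode == "sum":
        return sum(weighted)
    if score_mode == "avg":
        return sum(weighted) / sum(f.weight for f in matching)
    if score_mode == "max":
        return max(weighted)
    if score_mode == "min":
        return min(weighted)
    raise ValueError(f"unknown score_mode: {score_mode}")


functions = [
    ScoreFunction(lambda d: d.get("category") == "books", lambda d: 1.0, weight=10.0),
    ScoreFunction(lambda d: "rating" in d, lambda d: float(d["rating"])),
]
doc = {"category": "books", "rating": 4}
for mode in ("multiply", "sum", "avg", "first", "max", "min"):
    print(mode, combine(functions, doc, mode))
```

Only first depends on the order of the list. Moving the rating function to the front changes its result from 10.0 to 4.0, while the other modes stay the same. In a Painless script_score, first could presumably be emulated with if/else over doc values, but a script cannot evaluate arbitrary filter queries, which appears to be the gap the proposals listed above aim to close.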

mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jun 17, 2022
We had a plan to deprecate the function_score query in favour of the
script_score query, but ran into a roadblock: missing
functionality to combine scores from different
functions (particularly the "first" score_mode).
We have several proposals to address this missing
functionality:
 [scripted_boolean](elastic#27588 (comment))
 [compound_query](elastic#51967)
 [first_query](elastic#52482)

But for now, we decided not to deprecate function_score query,
and hence we need to remove any mention that we are deprecating it.

Relates to elastic#42811
Closes elastic#71934
mayya-sharipova added a commit that referenced this issue Jun 17, 2022
We had a plan to deprecate the function_score query in favour of the
script_score query, but ran into a roadblock: missing
functionality to combine scores from different
functions (particularly the "first" score_mode).
We have several proposals to address this missing
functionality:
 [scripted_boolean](#27588 (comment))
 [compound_query](#51967)
 [first_query](#52482)

But for now, we decided not to deprecate function_score query,
and hence we need to remove any mention that we are deprecating it.

Relates to #42811
Closes #71934
@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024