Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script: Fields API for Dense Vector #83550

Merged
merged 19 commits into from
Feb 16, 2022

Conversation

stu-elastic
Copy link
Contributor

@stu-elastic stu-elastic commented Feb 4, 2022

Adds the fields API for dense_vector field mapper.

Adds a DenseVector interface for the value type.

Implemented by:

  • KnnDenseVector which wraps a decoded float array from VectorValues
  • BinaryDenseVector which lazily decodes a BytesRef from BinaryDocValues

The vector operations have moved into those implements from BinaryDenseVectorScriptDocValues.java and KnnDenseVectorScriptDocValues.java, respectively.

The DenseVector API is:

float getMagnitude();
double dotProduct(float[] | List);
double l1Norm(float[] | List);
double l2Norm(float[] | List);
float[] getVector();
int dims();

boolean isEmpty(); // does the value exist
int size();        // 0 if isEmpty(), 1 otherwise
Iterator<Float> iterator()

dotProduct, l1Norm and l2Norm take a float[] or a List via the
a delegating default method on the DenseVector interface.

The DenseVectorDocValuesField abstract class contains two getter APIS.
It is implemented by KnnDenseVectorDocValuesField and
BinaryDenseVectorDocValuesField.

DenseVector get()
DenseVector get(DenseVector defaultValue)

The get() method is included because there isn't a good default dense vector,
so that API returns an empty DenseVector which throws an
IllegalArgumentException for all method calls other than isEmpty(),
size() and iterator().

The empty dense vector will always be DenseVector.EMPTY in case users want
to use equality checks.

Refs: #79105

Adds the fields API for `dense_vector` field mapper.

Adds a `DenseVector` interface for the value type.

Implemented by:
 * `KnnDenseVector` which wraps a decoded float array from `VectorValues`
 * `BinaryDenseVector` which lazily decodes a `BytesRef` from `BinaryDocValues`

The `DenseVector` API is simliar to `BinaryDenseVectorScriptDocValues`.

```
float getMagnitude();
double dotProduct(float[]);
double l1Norm(float[]);
double l2Norm(float[]);
float[] getVector();
int dims();

boolean isEmpty(); // does the value exist
int size();        // 0 if isEmpty(), 1 otherwise
PrimitiveIterator.OfDouble iterator()
```

The `DenseVectorDocValuesField` abstract class contains two getter APIS.
It is implemented by  `KnnDenseVectorDocValuesField` and
`BinaryDenseVectorDocValuesField`.

```
DenseVector get()
DenseVector get(DenseVector)
```

The `get()` method is included because there isn't a good default dense vector,
so that API returns an empty `DenseVector` which throws an
`IllegalArgumentException` for all method calls other than `isEmpty()`,
`size()` and `iterator()`.

The empty dense vector will always be `DenseVector.EMPTY` in case users want
to use equality checks.

Refs: elastic#79105
@stu-elastic stu-elastic added >enhancement :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache v8.2.0 labels Feb 4, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @stu-elastic, I've created a changelog YAML for you.

@stu-elastic stu-elastic marked this pull request as ready for review February 9, 2022 19:22
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Feb 9, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@stu-elastic
Copy link
Contributor Author

@jtibshirani can you particularly take a look to make sure I moved the vector operations dotProduct, l1Norm, etc over correctly? KnnDenseVectorScriptDocValues -> KnnDenseVector, BinaryDenseVectorScriptDocValues -> BinaryDenseVector

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me from the scripting perspective. Left some comments. Please wait for @jtibshirani to review before committing.

import java.util.Iterator;
import java.util.NoSuchElementException;

public interface DenseVector extends Iterable<Float> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cosineSimilarity be part of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ported over the ScriptDocValues API which was also missing this. We'd have to normalize the query vector. @jtibshirani, what are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should include cosineSimilarity, this was available in the old API and is something people will expect.

Some other notes on API:

  • I don't think it's helpful to extend Iterable<Float>, since you can just retrieve the vector through getVector.
  • I liked the old name vectorValue a little more. Currently you'd write DenseVector#getVector which seems kind of weird since the object is already a vector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 17 to 21
double dotProduct(QueryVector queryVector);

double l1Norm(QueryVector queryVector);

double l2Norm(QueryVector queryVector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should any of these take an additional boolean parameter to normalize the QueryVector? Or should we document that these expect the QueryVector to be normalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those don't need to be normalized, is my understanding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, they do not expect the query vector to be normalized. Only cosine similarity needs to normalize the vector (and this should happen "behind the scenes").

public interface DenseVector extends Iterable<Float> {
float getMagnitude();

// The implementations of these vector operations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a helpful comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

/**
* Get the DenseVector for a document if one exists, DenseVector.EMPTY otherwise
*/
public abstract DenseVector get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we should we want to add this now? It may be beneficial to figure out how we plan to implement implicit default values for the scripting fields API in general first since deprecation is a long-term process if we decide to do something different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on discussions with the search team, the only sensible defaults are the empty vector I've added and null. That is what makes this field type an outlier. I think it's worth keeping it.

this.dims = dims;
}

public int dims() {
return dims;
}

private DenseVector getVectorChecked() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename getCheckedVector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

throw new IllegalArgumentException(badElement(element, i));
}

protected static String badElement(Object element, int index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is only used a single time so please just move the message to the get method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

return new ListQueryVector(vector);
}

public static QueryVector fromObject(Object vector) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want a comment that is also for Painless support as we do not have type overloading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -64,30 +58,14 @@ public SortedBinaryDocValues getBytesValues() {
if (indexed) {
VectorValues values = reader.getVectorValues(field);
if (values == null || values == VectorValues.EMPTY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the null check here anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines +61 to +63
// There's no way for KnnDenseVectorDocValuesField to reliably differentiate between VectorValues.EMPTY and
// values that can be iterated through. Since VectorValues.EMPTY throws on docID(), pass a null instead.
values = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this behavior to be a bit strange in that VectorValues.EMPTY will throw an exception, but null does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VectorValues.EMPTY is part of the Lucene API, so it cannot be easily changed as part of this PR

Comment on lines +40 to +47
# implementation of DenseVectorDocValuesField
class org.elasticsearch.xpack.vectors.query.KnnDenseVectorDocValuesField {
}

# implementation of DenseVectorDocValuesField
class org.elasticsearch.xpack.vectors.query.BinaryDenseVectorDocValuesField {
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not necessary as these are top-level field classes, for consistency we have duck-typed all of them so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stu-elastic, this looks good to me overall. I had one high-level question about the QueryVector class -- why do we need this class? Can we just force the user to pass in a float array? Or always convert to an array up-front? In our testing we found that converting to a float array upfront was good for performance, since you avoid the unboxing costs of iterating over a List<Float> .

*/
public abstract DenseVector get();

public abstract DenseVector get(DenseVector defaultValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really doubtful people will use this API because there's no natural value for a "default" high dimensional vector. I think we could drop it (if that fits with how the rest of the API is designed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd like it to be consistent. There'd only be two "defaults" available right now, DenseVector.EMPTY and null, which is unfortunate but the nature of this type.

import java.util.Iterator;
import java.util.NoSuchElementException;

public interface DenseVector extends Iterable<Float> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should include cosineSimilarity, this was available in the old API and is something people will expect.

Some other notes on API:

  • I don't think it's helpful to extend Iterable<Float>, since you can just retrieve the vector through getVector.
  • I liked the old name vectorValue a little more. Currently you'd write DenseVector#getVector which seems kind of weird since the object is already a vector?

Comment on lines 17 to 21
double dotProduct(QueryVector queryVector);

double l1Norm(QueryVector queryVector);

double l2Norm(QueryVector queryVector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, they do not expect the query vector to be normalized. Only cosine similarity needs to normalize the vector (and this should happen "behind the scenes").

public BinaryDenseVector(BytesRef value, int dims, Version indexVersion) {
this.value = value;
this.indexVersion = indexVersion;
this.vector = new float[dims];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid eagerly creating this vector -- it can happen on every doc and the allocation is significant (the vector dims can be as high as 2048). Maybe we can put it off to getVector() and cache the value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@stu-elastic
Copy link
Contributor Author

I don't think it's helpful to extend Iterable, since you can just retrieve the vector through getVector.

The Field interface extends Iterable, because most fields are multi-valued. Since it doesn't make sense in this case, I'll have it throw UnsupportedOperationException and we'll make iterable duck typed later on.

I liked the old name vectorValue a little more. Currently you'd write DenseVector#getVector which seems kind of weird since the object is already a vector?

Users rarely see the class name in painless, check out the rest tests to see how that ends up:
field('dv').get().vector[2], that's why I changed the name to vector. I agree it looks weird in Java.

I can change it to field('dv').get().vectorValue[2] if you'd prefer.

I agree we should include cosineSimilarity, this was available in the old API and is something people will expect.
Only cosine similarity needs to normalize the vector (and this should happen "behind the scenes").

cosine similarity is:
dotProduct(queryVector) / denseVector.getMagnitude()

The query vector will be normalized on every API call in this case. I can do the normalization without allocating a new float[] for the query vector for BinaryDenseVector. However, KnnDenseVector calls into Lucene's VectorUtil.cosine so I'll have to create a new float[] for the normalized query vector.

ScoreScriptUtils.cosineSimilarity() will still work and will avoid this penalty for KnnDenseVector.

@stu-elastic
Copy link
Contributor Author

Re: QueryVector

why do we need this class? Can we just force the user to pass in a float array? Or always convert to an array up-front? In our testing we found that converting to a float array upfront was good for performance, since you avoid the unboxing costs of iterating over a List .

My thinking around this class was if you are using the query vector once, it's the same cost, minus the array allocation, to do the conversion in line.

Users of the class bindings in ScoreScriptUtils only pay this cost one time at the cost of the limitation that using different query vectors will silently fail.

Users of this API have to pay that cost on every execution unless we add new painless features.

I could remove this class and add a toFloatArray() augmentation to List so users will need to call field('dv').get().dotProduct(params.query.toFloatArray()).

If we go this route, I'd add those augmentations in a different pr (we'd want ones for double array, long array and int array as well). What are your thoughts?

@jtibshirani
Copy link
Contributor

Thanks for the explanations! I agree field('dv').get().vector[...] is pretty readable, it works for me. For cosine similarity, I think it is okay to normalize on every call. We've moved towards the messaging of "if you really want to optimize cosine similarity, then normalize all your vectors in advance and use dot product". Cosine similarity is there as a convenience and it's okay if it is not as performant.

Maybe we could catch up offline about QueryVector to discuss the alternatives?

@stu-elastic
Copy link
Contributor Author

Based on offline discussion with @jtibshirani,
A) It's preferable to do extra runtime operations to handle Lists than to allocate large arrays for every call for every document.
B) The existing ScriptScriptUtils class bindings need to stay approximately as performant as before, even if it costs some code reuse. Some per-segment overhead is fine.

A implies

  1. We'll keep QueryVector for the new API.
  2. The new API can avoid calls into VectorUtil to avoid array allocations

@stu-elastic
Copy link
Contributor Author

stu-elastic commented Feb 14, 2022

@jdconrad and @jtibshirani I've updated the PR based on the comments above, can you take another look?

  • added cosineSimilarity, the API can handle pre-normalized query vectors for use from ScoreScriptUtils. The default behavior is to normalize the query vector while doing the dot product computation.
  • there's QueryVector and float[] versions of dotProduct, l1Norm, l2Norm and cosineSimilarity. This avoids forcing users to allocate large float arrays to use those methods as would be necessary if there were only float[] versions
  • QueryVector now is just a thin wrapper around List. The avoids any indirection in the functions above when working with float[] and so avoid any performance change for existing users of ScoreScriptUtils.

@stu-elastic
Copy link
Contributor Author

Do we need getVector() at all? What are people using that for in the old API?

@jtibshirani
Copy link
Contributor

Do we need getVector() at all? What are people using that for in the old API?

Indeed this is useful to have, people were using it as part of custom calculations (#51964).

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some last small comments, apart from these it looks good to me!

return dims;
}

protected static ByteBuffer wrap(BytesRef dv) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be protected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made wrap private.

/**
* dotProduct of doc vector and query vector that normalizes the query vector while performing the calculation.
*/
protected double dotProduct(QueryVector qv, float qvMagnitude) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these special methods that take qvMagnitude, because this calculation is the equivalent to just doing a normal dotProduct, then dividing the result by qvMagnitude.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these methods and updated cosSim implementation.

checkDimensions(getDims(), qv.size());
return dotProduct(qv);

} else if (queryVector instanceof List<?> list) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be instanceof List<Number>? When we wrote the first version of vectors in painless, I think we could assume the param would have this type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't due to type erasure. We can check instanceof List<?> and then do an unchecked cast below, adding @SuppressWarnings("unchecked").

If they send in a bad type, they'll get a different reason message. For the test

  - do:
      catch: bad_request
      search:
        body:
          query:
            script_score:
              query: { "exists": { "field": "bdv" } }
              script:
                source: |
                  (int) field(params.field).get().l2Norm(params.query)
                params:
                  query: [1, 2, "three"]
                  field: bdv

the error will look like

        "failed_shards" : [
          {
            "shard" : 0,
            "index" : "test-index",
            "node" : "ueT4QigUSbK_ngYxjFfLZA",
            "reason" : {
              "type" : "script_exception",
              "reason" : "runtime error",
              "script_stack" : [
                "org.elasticsearch.xpack.vectors.query.BinaryDenseVector.l2Norm(BinaryDenseVector.java:127)",
                "org.elasticsearch.xpack.vectors.query.DenseVector.l2Norm(DenseVector.java:89)",
                "(int) field(params.field).get().l2Norm(params.query)
",
                "                                             ^---- HERE"
              ],
              "script" : "(int) field(params.field).get().l2Norm(params.query) ...",
              "lang" : "painless",
              "position" : {
                "offset" : 45,
                "start" : 0,
                "end" : 53
              },
              "caused_by" : {
                "type" : "class_cast_exception",
                "reason" : "class java.lang.String cannot be cast to class java.lang.Number (java.lang.String and java.lang.Number are in module java.base of loader 'bootstrap')",
                "stack_trace" : "java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Number (java.lang.String and java.lang.Number are in module java.base of loader 'bootstrap')
	at org.elasticsearch.xpack.vectors.query.BinaryDenseVector.l2Norm(BinaryDenseVector.java:127)

currently, the error looks like

        "failed_shards" : [
          {
            "shard" : 0,
            "index" : "test-index",
            "node" : "Fty-bHa-RC6HTekobvpLrw",
            "reason" : {
              "type" : "script_exception",
              "reason" : "runtime error",
              "script_stack" : [
                "org.elasticsearch.xpack.vectors.query.QueryVector.get(QueryVector.java:44)",
                "org.elasticsearch.xpack.vectors.query.BinaryDenseVector.l2Norm(BinaryDenseVector.java:126)",
                "org.elasticsearch.xpack.vectors.query.DenseVector.l2Norm(DenseVector.java:89)",
                "(int) field(params.field).get().l2Norm(params.query)
",
                "                                             ^---- HERE"
              ],
              "script" : "(int) field(params.field).get().l2Norm(params.query) ...",
              "lang" : "painless",
              "position" : {
                "offset" : 45,
                "start" : 0,
                "end" : 53
              },
              "caused_by" : {
                "type" : "illegal_argument_exception",
                "reason" : "Cannot treat [three] at index [2] of type [java.lang.String] as Number",
                "stack_trace" : "java.lang.IllegalArgumentException: Cannot treat [three] at index [2] of type [java.lang.String] as Number
	at org.elasticsearch.xpack.vectors.query.QueryVector.get(QueryVector.java:44)

ie the reason will be

class java.lang.String cannot be cast to class java.lang.Number (java.lang.String and java.lang.Number are in module java.base of loader 'bootstrap')

vs

Cannot treat [three] at index [2] of type [java.lang.String] as Number

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading quickly, I actually find the first one easier to understand, so happy to go with that if it helps simplify the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

* Provide consistent access to query vector elements from a backing list or float array. Used to provide
* a flexible API for DenseVector in painless, which does not have method overloading.
*/
public class QueryVector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there still a benefit to introducing this wrapper class, or could we just use List<Number> directly in the vector functions? (Sorry if I missed this in our conversations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is avoiding an unchecked cast and a nicer error message if the user sends in a mixed type List.

We could avoid the wrapper and use a helper instead method, that would still give a nice error and remove the QueryVector helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed QueryVector.

Copy link
Contributor

@jdconrad jdconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes look good to me! Thanks for the work on this @stu-elastic

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the iterations! We have rally nightly benchmarks for dense_vector performance, I will keep an eye on them to double-check there's no performance regression with the original docvalues syntax.

@@ -99,15 +81,15 @@ public double l2Norm(QueryVector queryVector) {
@Override
public double cosineSimilarity(float[] queryVector, boolean normalizeQueryVector) {
if (normalizeQueryVector) {
return dotProduct(queryVector, DenseVector.getMagnitude(queryVector)) / getMagnitude();
return (dotProduct(queryVector) / DenseVector.getMagnitude(queryVector)) / getMagnitude();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, we usually would do dotProduct(queryVector) / (DenseVector.getMagnitude(queryVector) * getMagnitude()); since it's more readable (similar to how you'd write it mathematically) and potentially a little faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, changed.

@stu-elastic
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@stu-elastic stu-elastic merged commit b44fcfb into elastic:master Feb 16, 2022
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 16, 2022
…ijun/elasticsearch into fix-none-tsdb-index-dimension-tests

* 'fix-none-tsdb-index-dimension-tests' of github.com:weizijun/elasticsearch: (37 commits)
  [docs] Mention JDK 17 in the Contributing docs (elastic#84018)
  Fix GeoIpDownloader startup during rolling upgrade (elastic#84000)
  Script: Fields API for Dense Vector (elastic#83550)
  Move InferenceConfigUpdate under VersionedNamedWriteable (elastic#84022)
  [ML] Fix license feature test cleanup (elastic#84020)
  Replace deprecated api in artifact transforms (elastic#84015)
  QL: Add leniency option to SQL CLI (elastic#83795)
  [Stack Monitoring] add kibana_stats version alias to -mb template (elastic#83930)
  Optimize spliterator for ImmutableOpenMap (elastic#83899)
  Feature usage actions for archive (elastic#83931)
  Use latch to speedup multi feature migration test (elastic#84007)
  Make action names available in NodeClient (elastic#83919)
  [DOCS] Re-add HTTP proxy setings from elastic#82737 (elastic#84001)
  Add CI matrix configuration for snapshot BWC versions (elastic#83990)
  Update YAML Rest tests to check for product header on all responses (elastic#83290)
  TSDB: Add time series aggs cancellation (elastic#83492)
  [DOCS] Fix percolate query headings (elastic#83988)
  [DOCS] Move tip for percolate query example (elastic#83972)
  Simplify LocalExporter cleaner function to fix failing tests (elastic#83812)
  [GCE Discovery] Correcly handle large zones with 500 or more instances (elastic#83785)
  ...
probakowski pushed a commit to probakowski/elasticsearch that referenced this pull request Feb 23, 2022
Adds the fields API for `dense_vector` field mapper.

Adds a `DenseVector` interface for the value type.

Implemented by:
 * `KnnDenseVector` which wraps a decoded float array from `VectorValues`
 * `BinaryDenseVector` which lazily decodes a `BytesRef` from `BinaryDocValues`

The vector operations have moved into those implements from `BinaryDenseVectorScriptDocValues.java` and  `KnnDenseVectorScriptDocValues.java`, respectively.

The `DenseVector` API is:
```
float getMagnitude();
double dotProduct(float[] | List);
double l1Norm(float[] | List);
double l2Norm(float[] | List);
float[] getVector();
int dims();

boolean isEmpty(); // does the value exist
int size();        // 0 if isEmpty(), 1 otherwise
Iterator<Float> iterator()
```

`dotProduct`, `l1Norm` and `l2Norm` take a `float[]` or a `List` via the
a delegating `default` method on the `DenseVector` interface.

The `DenseVectorDocValuesField` abstract class contains two getter APIS.
It is implemented by  `KnnDenseVectorDocValuesField` and
`BinaryDenseVectorDocValuesField`.

```
DenseVector get()
DenseVector get(DenseVector defaultValue)
```

The `get()` method is included because there isn't a good default dense vector,
so that API returns an empty `DenseVector` which throws an
`IllegalArgumentException` for all method calls other than `isEmpty()`,
`size()` and `iterator()`.

The empty dense vector will always be `DenseVector.EMPTY` in case users want
to use equality checks.

Refs: elastic#79105
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Core/Infra Meta label for core/infra team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants