Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix few bugs on binary index with Faiss HNSW #1850

Merged
merged 1 commit into from
Jul 18, 2024

Conversation

heemin32
Copy link
Collaborator

@heemin32 heemin32 commented Jul 18, 2024

Description

Did a manual test on following items. Will raise a new PR implementing this as integ test.

Test case

  1. Can create binary index using Faiss HNSW
  2. Can create binary index with knn index as false
  3. Cannot create binary index using Lucene
  4. Cannot create binary index using Nmslib
  5. Cannot run training API on binary index
  6. Cannot create binary index using encoding(sq)
  7. Cannot run range query on binary index
  8. Can run knn query on binary index
  9. Can run knn query with nested field on binary index
  10. Can run knn query with filtering on binary index
  11. Can run knn query with filtering on nested field with binary index
  12. Can run script scoring with hamming on binary index
  13. Cannot run script scoring with hamming on non binary index
  14. Cannot run script scoring with non hamming on binary index
  15. Can run painless script with hamming on binary index
  16. Cannot run painless script with hamming on non binary index
  17. Cannot run painless script with non hamming on binary index
  18. Can run script scoring with hamming on binary index with knn set as false
  19. Can run painless script with hamming on binary index with knn set as false

Bug1 (2. Can create binary index with knn index as false)

  • Actual: Following index creation succeeded but got an error while ingesting as default space was set as l2.
  • Expected: It should set hamming as default space type and ingesting should succeed.
{
  "settings": {
    "index": {
      "knn": false
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 8,
        "data_type": "binary"
      }
    }
  }
}

Bug2 (3. Cannot create binary index using Lucene)

  • Actual: I got error \"hnsw\" configuration does not support space type: \"hammingbit\" which is not appropriate without mentioning what engine it uses.
  • Expected: The error message should have engine name as well because hnsw with faiss support hammingbit.
Create index with knn as false
{
  "settings": {
    "index": {
      "knn": false
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 8,
        "data_type": "binary",
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "hammingbit"
        }
      }
    }
  }
}

Bug 3. (7. Cannot run range query on binary index)

  • Actual: Runtime error
  • Expected: It should be Unsupported operation error with a message like "radial search is not supported with binary index"
"root_cause": [
    {
    "type": "runtime_exception",
    "reason": "java.lang.Exception: Query Vector cannot be null"
    }
],

Bug 4 (11. Can run knn query with filtering on nested field with binary index)

  • Actual: Failed to run filtered query with binary Index. Error was happening while validating if exact search is required or not based on calculation cost
  • Expected: Filtered query with binary index should work.
    "root_cause": [
      {
        "type": "null_pointer_exception",
        "reason": "Cannot read the array length because the return value of \"org.opensearch.knn.index.query.KNNQuery.getQueryVector()\" is null"
      }
    ],

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

junqiu-lei
junqiu-lei previously approved these changes Jul 18, 2024
ryanbogan
ryanbogan previously approved these changes Jul 18, 2024
Copy link
Member

@ryanbogan ryanbogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The integration tests for these cases will be in 2.16 right?

@@ -190,6 +190,14 @@ public static ValidationException validateKnnField(
return exception;
}

String vectorDataType = (String) fieldMap.get(VECTOR_DATA_TYPE_FIELD);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can fieldMap.get(VECTOR_DATA_TYPE_FIELD) be null?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be null.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it can be null then the type casting to string will cause a NPE. Please handle this gracefully.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't cause a NPE.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?? casting a null to string cause NPE. I am missing something here.

Copy link
Collaborator Author

@heemin32 heemin32 Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and it didn't cause NPE. String is of Object and String can be null.

@@ -190,6 +190,14 @@ public static ValidationException validateKnnField(
return exception;
}

String vectorDataType = (String) fieldMap.get(VECTOR_DATA_TYPE_FIELD);
if (VectorDataType.BINARY.toString().equalsIgnoreCase(vectorDataType)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should convert the datatype to enum and then use == in if condition

Copy link
Collaborator Author

@heemin32 heemin32 Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because vectorDataType can be null, I choose this approach. Why should we convert it to the data type?
FYI, this line of code will be removed once we support binary index for IVF.

src/main/java/org/opensearch/knn/index/IndexUtil.java Outdated Show resolved Hide resolved
@jmazanec15
Copy link
Member

@heemin32 Can you provide intended behavior for the bugs as well?

@heemin32 heemin32 dismissed stale reviews from ryanbogan and junqiu-lei via 215068f July 18, 2024 17:02
@heemin32
Copy link
Collaborator Author

heemin32 commented Jul 18, 2024

LGTM! The integration tests for these cases will be in 2.16 right?

Will try. As the last resort, will do the manual test after code freeze once more.

@heemin32
Copy link
Collaborator Author

@heemin32 Can you provide intended behavior for the bugs as well?

Added.

Signed-off-by: Heemin Kim <heemin@amazon.com>
Copy link
Contributor

@shatejas shatejas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +486 to +487
? knnQuery.getQueryVector().length
: knnQuery.getByteQueryVector().length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the fact that we have two vectors. This will create a lot of branching in the code. We should have gone with generics or something with a different type of query. Something similar to lucene.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already aware of it and captured it in #1810

@navneet1v
Copy link
Collaborator

@heemin32 lets ensure that GH action are successful before we merge this PR

Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@heemin32 heemin32 merged commit 3a5fe64 into opensearch-project:main Jul 18, 2024
52 checks passed
@heemin32 heemin32 deleted the bugfix branch July 18, 2024 22:37
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 18, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 3a5fe64)
heemin32 added a commit that referenced this pull request Jul 19, 2024
Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 3a5fe64)

Co-authored-by: Heemin Kim <heemin@amazon.com>
naveentatikonda pushed a commit to naveentatikonda/k-NN that referenced this pull request Jul 20, 2024
opensearch-project#1856)

Signed-off-by: Heemin Kim <heemin@amazon.com>
(cherry picked from commit 3a5fe64)

Co-authored-by: Heemin Kim <heemin@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants