Fix semantic_search_usearch() for single query #2572

Merged
merged 1 commit into from
Apr 4, 2024

Contributor

@karmi karmi commented Apr 3, 2024

This patch fixes a bug where the sentence_transformers.quantization.semantic_search_usearch() function would fail with TypeError: 'numpy.float32' object is not iterable when only a single query is used.

Example code:

import numpy as np
import sentence_transformers

query_embeddings = np.array([[1, 2, 3]], dtype=np.int8)
corpus_embeddings = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8)

hits, duration = sentence_transformers.quantization.semantic_search_usearch(
  query_embeddings=query_embeddings,
  corpus_embeddings=corpus_embeddings,
  corpus_precision="binary",
  rescore=False,
  top_k=min(len(corpus_embeddings), 10))

Without this patch, this raises TypeError: 'numpy.float32' object is not iterable, because the code expects both scores and indices to be 2D arrays:

outputs = (
    [
        [
            {"corpus_id": int(neighbor), "score": float(score)}
            for score, neighbor in zip(scores[query_id], indices[query_id])
        ]
        for query_id in range(len(query_embeddings))
    ],
    delta_t,
)

However, the usearch implementation in the Python client returns 1D arrays for a single query, and a 2D array for multiple queries:

def search(self, vectors: np.ndarray, count: int) -> Matches:
    if vectors.ndim == 1 or (vectors.ndim == 2 and vectors.shape[0] == 1):
        return self.search_one(vectors, count)
    else:
        return self.search_many(vectors, count)
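
To make the shape mismatch concrete, here is a minimal NumPy-only sketch. The arrays are made-up stand-ins for what a single-query search might return (they are not produced by usearch), and `format_hits` is a hypothetical helper that mirrors the list comprehension shown earlier.

```python
import numpy as np


def format_hits(scores, indices, num_queries):
    """Hypothetical mirror of the hit-building comprehension; expects 2D inputs."""
    return [
        [
            {"corpus_id": int(neighbor), "score": float(score)}
            for score, neighbor in zip(scores[query_id], indices[query_id])
        ]
        for query_id in range(num_queries)
    ]


# Made-up 1D results, standing in for a single-query search with top_k=2.
scores_1d = np.array([0.0, 3.0], dtype=np.float32)
indices_1d = np.array([0, 1], dtype=np.uint64)

try:
    format_hits(scores_1d, indices_1d, num_queries=1)
    failed = False
except TypeError as exc:
    # scores_1d[0] is a scalar, so zip() cannot iterate over it.
    failed = True
    print(f"TypeError: {exc}")

# The same data with an explicit leading query axis is handled as intended.
hits_2d = format_hits(
    scores_1d[np.newaxis, :], indices_1d[np.newaxis, :], num_queries=1
)
print(hits_2d)
```

With 1D inputs, indexing by `query_id` yields a scalar rather than a row, which is why the comprehension breaks only in the single-query case.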

The problem also manifests in the example code here:

queries = [
    "How do I become a good programmer?",
    "How do I become a good data scientist?",
]

When just a single query is passed, it fails with the error above.
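
One way to guard against this, sketched here with plain NumPy, is to promote 1D results to 2D before building the hits. `ensure_query_axis` is a hypothetical name used only for illustration, and this is not necessarily the exact change made in this patch.

```python
import numpy as np


def ensure_query_axis(array):
    """Return `array` with a leading query axis if it is 1D (hypothetical helper)."""
    # np.atleast_2d turns shape (k,) into (1, k) and leaves 2D input unchanged.
    return np.atleast_2d(array)


# Made-up score arrays: one query with two hits, and two queries with two hits each.
single = np.array([0.5, 0.25], dtype=np.float32)
many = np.array([[0.5, 0.25], [0.75, 0.125]], dtype=np.float32)

single_2d = ensure_query_axis(single)
many_2d = ensure_query_axis(many)
print(single_2d.shape, many_2d.shape)
```

Applying the same normalization to both scores and indices keeps the downstream `scores[query_id]` indexing valid regardless of how many queries were passed.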

I wasn't sure whether I should add tests for this patch; I can definitely do so if needed.

@tomaarsen
Collaborator

Hello!

... or (vectors.ndim == 2 and vectors.shape[0] == 1)

Oh I see! That's unexpected! I appreciate the details to reproduce and regarding the cause of the issue.
No tests will be necessary for this - I've opted to keep the tests minimal for the semantic_search_... quantization helper functions, as they're designed primarily to be quick helpers to get some experimentation going, to later be replaced by a more robust solution such as this (i.e. one that doesn't create an index on the fly 😄)

I've reproduced the issue and can confirm that your solution fixes it. Thanks a bunch! I'll merge this once the tests are green.

  • Tom Aarsen

@tomaarsen tomaarsen merged commit 3f4067f into UKPLab:master Apr 4, 2024
9 checks passed
@karmi
Contributor Author

karmi commented Apr 4, 2024

Thank you!
