Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add_faiss_index raises ValueError: not enough values to unpack (expected 2, got 1) #6791

Closed
NeuralFlux opened this issue Apr 8, 2024 · 3 comments · Fixed by #6803
Closed

Comments

@NeuralFlux
Copy link

NeuralFlux commented Apr 8, 2024

Describe the bug

Calling add_faiss_index on a Dataset with a column argument raises a ValueError. The following is the trace

    214 def replacement_add(self, x):
    215     """Adds vectors to the index.
    216     The index must be trained before vectors can be added to it.
    217     The vectors are implicitly numbered in sequence. When `n` vectors are
   (...)
    224         `dtype` must be float32.
    225     """
--> 227     n, d = x.shape
    228     assert d == self.d
    229     x = np.ascontiguousarray(x, dtype='float32')

ValueError: not enough values to unpack (expected 2, got 1)

Steps to reproduce the bug

  1. Load any dataset like ds = datasets.load_dataset("wikimedia/wikipedia", "20231101.en")["train"]
  2. Add an FAISS index on any column ds.add_faiss_index('title')

Expected behavior

The index should be created

Environment info

  • datasets version: 2.18.0
  • Platform: Linux-6.5.0-26-generic-x86_64-with-glibc2.35
  • Python version: 3.9.19
  • huggingface_hub version: 0.22.2
  • PyArrow version: 15.0.2
  • Pandas version: 2.2.1
  • fsspec version: 2024.2.0
  • faiss-cpu version: 1.8.0
@NeuralFlux
Copy link
Author

I realized I was passing a string column to this instead of float. Is it possible to add a warning or error to prevent users from falsely believing there's a bug?

@Dref360
Copy link
Contributor

Dref360 commented Apr 8, 2024

Hello!

I agree that we could add some safeguards around the type of ds[column]. At least for FAISS, we need the column to be made of embeddings as FAISS doesn't perform the embeddings itself.

I can propose a PR sometime this week.

@NeuralFlux
Copy link
Author

@Dref360 thanks for the initiative!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants