[FEATURE] similarity search through query within the UI #2443

Closed
msminhas93 opened this issue Feb 28, 2023 · 10 comments
Labels
status: stale, type: community request, type: enhancement

Comments

@msminhas93

Is your feature request related to a problem? Please describe.
The lack of a UI option to quickly run an embedding search on a text query typed into the search bar is limiting. This capability would make bulk annotation much more flexible, since you could search for concepts with a custom text query rather than a fixed sample from the dataset.

Describe the solution you'd like
An option in the UI to allow embedding search from the text query. This could be a drop-down with two options:

  1. word search
  2. embedding search


@davidberenstein1957
Member

davidberenstein1957 commented Mar 2, 2023

Hi @msminhas93, I would love to see this feature.

We need to fine-tune what we want to achieve. Users who can compute embeddings themselves can already do so via the Python client, and hence could also use rg.load("dataset", vector=embedding). However, it might be useful to allow deploying an embedding model alongside Argilla for this, like Weaviate does here or Elasticsearch 8.5 does here.

@frascuchon @dvsrepo IMO, this also aligns with #2150.

@msminhas93 what would work best for you?

@msminhas93
Author

Thank you for responding! I think the Python client is awesome, but for a workflow of rapid searches on custom text inputs followed by bulk annotation with a few deselections, a UI that supports embedding search would be extremely powerful. Also, domain experts can be nontechnical, which would limit their ability to run such queries.

I imagine this functionality working similarly to the new "find similar" feature. However, in the backend, instead of only storing the embeddings, we would also store the encoder, possibly as some kind of config. This could be as simple as the encoder name, or an embed_text function or method (subclassing some default base class that handles other housekeeping) that accepts text as input and returns embeddings.
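A minimal sketch of the "store the encoder as config" idea, assuming a hypothetical TextEncoder base class with an embed_text method and a registry keyed by the encoder name saved in the dataset config. None of these names are Argilla APIs; HashingEncoder is a toy stand-in for a real embedding model so the example runs offline.

```python
import math
import zlib
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class TextEncoder(ABC):
    """Hypothetical base class that encoder plugins would subclass."""

    # Name stored in the dataset config to identify the encoder.
    name: str = "base"

    @abstractmethod
    def embed_text(self, text: str) -> List[float]:
        """Return an embedding vector for `text`."""

    @property
    @abstractmethod
    def dimensions(self) -> int:
        """Length of the vectors returned by embed_text."""


class HashingEncoder(TextEncoder):
    """Toy encoder: hashes lowercase tokens into a fixed-size, L2-normalized vector."""

    name = "hashing-64"

    def __init__(self, dims: int = 64) -> None:
        self._dims = dims

    @property
    def dimensions(self) -> int:
        return self._dims

    def embed_text(self, text: str) -> List[float]:
        vec = [0.0] * self._dims
        for token in text.lower().split():
            # crc32 is deterministic across runs, unlike Python's salted hash().
            vec[zlib.crc32(token.encode("utf-8")) % self._dims] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


# Hypothetical registry: encoder name in the dataset config -> encoder class.
ENCODERS: Dict[str, Type[TextEncoder]] = {HashingEncoder.name: HashingEncoder}


def load_encoder(config: Dict[str, str]) -> TextEncoder:
    """Instantiate the encoder referenced by a dataset's (hypothetical) config."""
    return ENCODERS[config["encoder"]]()
```

For example, load_encoder({"encoder": "hashing-64"}).embed_text("refund my card") returns a 64-dimensional unit vector. Keeping only a name in the config means the backend, not the browser, owns the model.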

So when we press Enter with embedding search enabled, the callback would run the same logic as the find similar method, but with the encoded input text as the query vector.
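One way the Enter-key flow could look, as a sketch only: on_search_enter, find_similar, and the toy_embed vocabulary encoder are hypothetical names. find_similar approximates the existing find-similar behavior with a brute-force cosine ranking; a real backend would pass the query vector to its vector index instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Record = Tuple[str, Vector]  # (record text, stored embedding)

# Toy stand-in for a real encoder: counts occurrences of a fixed vocabulary.
VOCAB = ["refund", "card", "lost", "stolen", "transfer", "money"]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_similar(query_vec: Vector, records: List[Record], k: int) -> List[Tuple[str, float]]:
    """Brute-force stand-in for find-similar: top-k records by cosine similarity."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in records]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def on_search_enter(
    query: str,
    embedding_search: bool,
    records: List[Record],
    embed_text: Callable[[str], Vector],
    k: int = 5,
) -> List[str]:
    """Hypothetical search-bar callback fired when the user presses Enter."""
    if embedding_search:
        # Encode the typed query, then reuse the find-similar ranking.
        return [text for text, _ in find_similar(embed_text(query), records, k)]
    # Word search: keep records that contain every query token.
    tokens = query.lower().split()
    return [text for text, _ in records if all(t in text.lower().split() for t in tokens)][:k]


if __name__ == "__main__":
    texts = ["I lost my card", "my card was stolen", "transfer money abroad", "request a refund"]
    records = [(t, toy_embed(t)) for t in texts]
    print(on_search_enter("stolen card", True, records, toy_embed, k=2))
    # ['my card was stolen', 'I lost my card']
```

The only difference between the two modes is where the query vector comes from, which is why the existing find-similar path can be reused.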

An additional slider or UI component to filter the similarity score based on the input threshold would be useful too.
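The threshold slider amounts to a post-filter on the ranked results. A minimal sketch, assuming cosine-similarity scores in [-1, 1]; apply_threshold is a hypothetical helper, not an Argilla function, and the sample scores are made up for illustration.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (record id or text, similarity score)


def apply_threshold(results: List[Result], min_score: float) -> List[Result]:
    """Drop results whose similarity score is below the slider value.

    Assumes cosine similarity, so the threshold must lie in [-1, 1].
    Input order (e.g. best-first ranking) is preserved.
    """
    if not -1.0 <= min_score <= 1.0:
        raise ValueError(f"threshold {min_score} is outside [-1, 1]")
    return [(item, score) for item, score in results if score >= min_score]


if __name__ == "__main__":
    ranked = [("my card was stolen", 0.93), ("I lost my card", 0.61), ("transfer money abroad", 0.12)]
    print(apply_threshold(ranked, 0.5))
    # [('my card was stolen', 0.93), ('I lost my card', 0.61)]
```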

@davidberenstein1957
Member

davidberenstein1957 commented Mar 2, 2023

@msminhas93 Thanks

An additional slider or UI component to filter the similarity score based on the input threshold would be useful too.

Great suggestion! Could you mention it here too?

@davidberenstein1957
Member

@msminhas93 Better still, could you add a UI-specific issue for this and tag @Amelie-V?

@github-actions

github-actions bot commented Jun 1, 2023

This issue is stale because it has been open for 90 days with no activity.

github-actions bot added the status: stale label Jun 1, 2023
@github-actions

github-actions bot commented Jul 2, 2023

This issue was closed because it has been inactive for 30 days since being marked as stale.

github-actions bot closed this as completed Jul 2, 2023
davidberenstein1957 added the type: community request label Jan 10, 2024
@davidberenstein1957
Member

davidberenstein1957 commented Jan 10, 2024

Revisited some old issues as proposed by Damien Tanner.

davidberenstein1957 removed the status: stale label Jan 10, 2024
davidberenstein1957 changed the title from "Embedding search via the UI" to "[FEATURE] Embedding search via the UI" Jan 10, 2024
davidberenstein1957 changed the title from "[FEATURE] Embedding search via the UI" to "[FEATURE] similarity search via the UI" Jan 10, 2024
davidberenstein1957 changed the title from "[FEATURE] similarity search via the UI" to "[FEATURE] similarity search within the UI" Jan 10, 2024
davidberenstein1957 changed the title from "[FEATURE] similarity search within the UI" to "[FEATURE] similarity search through query within the UI" Feb 8, 2024
@davidberenstein1957
Member

Potentially use BM25, as proposed in #2150.
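BM25 would give a lexical, model-free ranking of the typed query against record text. Below is a minimal sketch of Okapi BM25 over whitespace-tokenized records, using the non-negative Lucene-style IDF and the common defaults k1 = 1.2, b = 0.75; the class is illustrative and not taken from #2150 or from Argilla. Elasticsearch already uses BM25 as its default similarity for text fields, so in practice this scoring may be available from the search index rather than reimplemented.

```python
import math
from collections import Counter
from typing import List, Sequence


class BM25:
    """Minimal Okapi BM25 over pre-tokenized documents (Lucene-style IDF)."""

    def __init__(self, corpus: Sequence[Sequence[str]], k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.docs = [Counter(doc) for doc in corpus]  # term frequencies per doc
        self.doc_lens = [len(doc) for doc in corpus]
        self.n_docs = len(corpus)
        self.avgdl = sum(self.doc_lens) / self.n_docs if self.n_docs else 0.0
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for doc in self.docs for term in doc)

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: Sequence[str], index: int) -> float:
        tf, dl = self.docs[index], self.doc_lens[index]
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Saturating term frequency, normalized by document length.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def rank(self, query: Sequence[str]) -> List[int]:
        """Document indices sorted by descending BM25 score (ties keep input order)."""
        scores = [self.score(query, i) for i in range(self.n_docs)]
        return sorted(range(self.n_docs), key=lambda i: scores[i], reverse=True)
```

Unlike embedding search, this needs no encoder deployment, but it only matches shared tokens, so "stolen" will not find "theft".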


github-actions bot commented Aug 4, 2024

This issue is stale because it has been open for 90 days with no activity.

github-actions bot added the status: stale label Aug 4, 2024

This issue was closed because it has been inactive for 30 days since being marked as stale.


4 participants