[FEATURE] enhance model_uploader workflow to support BGE models from huggingface #387

Closed
zhichao-aws opened this issue Apr 26, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@zhichao-aws
Member

zhichao-aws commented Apr 26, 2024

Is your feature request related to a problem?
OpenSearch supports several sentence-transformers models as pretrained models. Registering a pretrained model is much more convenient, because users don't need to change the cluster setting plugins.ml_commons.allow_registering_model_via_url.
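To illustrate the difference in effort, here is a minimal sketch of the request bodies involved. It assumes the ML Commons register endpoint (`POST /_plugins/_ml/models/_register`) and the field names I understand it to accept; the helper function names are hypothetical, and the model name, version, URL and hash in the usage note are placeholders, not values from this issue.

```python
import json

# Cluster setting named above; only URL-based registration needs it enabled.
URL_REGISTRATION_SETTING = "plugins.ml_commons.allow_registering_model_via_url"


def cluster_settings_body(enabled):
    """Body for PUT _cluster/settings that toggles URL-based registration."""
    return {"persistent": {URL_REGISTRATION_SETTING: enabled}}


def pretrained_register_body(name, version, model_format="TORCH_SCRIPT"):
    """Body for registering a pretrained model: three fields, no settings change."""
    return {"name": name, "version": version, "model_format": model_format}


def custom_register_body(name, version, url, content_hash, embedding_dimension,
                         model_format="TORCH_SCRIPT"):
    """Body for registering a self-packaged model from a URL (assumed field names)."""
    return {
        "name": name,
        "version": version,
        "model_format": model_format,
        "model_content_hash_value": content_hash,  # SHA-256 of the archive
        "model_config": {
            "model_type": "bert",
            "embedding_dimension": embedding_dimension,
            "framework_type": "sentence_transformers",
        },
        "url": url,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(pretrained_register_body("huggingface/<org>/<model>", "1.0.0"), indent=2))
    print(json.dumps(cluster_settings_body(True), indent=2))
```

With a pretrained model only the first body is needed; the custom path additionally requires the settings change, a hosted archive, and its hash.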

As IR research and engineering have advanced, the open-source community now offers much stronger text_embedding models. (leaderboard ref) However, users still need to trace these models and generate the tarball manually, which is a heavy workload, especially for those with little machine-learning background.
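The packaging part of that manual work can be sketched roughly as follows. This is only an illustration under stated assumptions: that the model has already been traced to a file, that the archive is a flat zip of the model file and its tokenizer file, and that a SHA-256 hash of the archive is needed at registration time. The function name and the file names in the usage note are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import Iterable


def package_model(files: Iterable[Path], archive_path: Path) -> str:
    """Zip the given files into a flat archive and return its SHA-256 hex digest."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # arcname drops the directory part so files sit at the archive root.
            zf.write(f, arcname=Path(f).name)

    digest = hashlib.sha256()
    with open(archive_path, "rb") as fh:
        # Read in 1 MiB chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `package_model([Path("model.pt"), Path("tokenizer.json")], Path("model.zip"))` would write `model.zip` and return the hash to supply when registering. Tracing the model itself, which is the harder step, is not shown here.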

What solution would you like?
BGE models (https://huggingface.co/BAAI/bge-small-en-v1.5, https://huggingface.co/BAAI/bge-base-en-v1.5, https://huggingface.co/BAAI/bge-large-en-v1.5) produce very strong text_embedding representations compared with other models of the same size, and they can be used in the same way as other sentence-transformers text_embedding models.

Because the models consume resources in a local deployment, we can support bge-small-en-v1.5 and bge-base-en-v1.5 as pretrained models in OpenSearch.

What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context?
opensearch-project/ml-commons#2210

@dblock
Member

dblock commented Jun 24, 2024

Catch All Triage - 1 2 3 4 5 6

@dblock dblock removed the untriaged label Jun 24, 2024
@zhichao-aws
Member Author

We need to deprecate this work item, as the models use Reddit data as training data.
