[FEATURE] enhance model_uploader workflow to support BGE models from huggingface #387

zhichao-aws · 2024-04-26T02:48:01Z

Is your feature request related to a problem?
OpenSearch supports several sentence-transformers models as pretrained models. Registering a pretrained model is much more convenient than registering a custom model, and users don't need to change the cluster setting plugins.ml_commons.allow_registering_model_via_url.
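
To make the difference concrete, here is a minimal sketch of the two request bodies that would be sent to the ML Commons register endpoint (`POST /_plugins/_ml/models/_register`). The pretrained model name and version are only an example of an existing pretrained model, and the version may differ on your cluster. The custom-model URL and hash are hypothetical placeholders, not real artifacts.

```python
import json

# ML Commons endpoint used for both kinds of registration.
REGISTER_ENDPOINT = "/_plugins/_ml/models/_register"


def pretrained_register_body(name, version, model_format="TORCH_SCRIPT"):
    """Body for a pretrained model: name, version and format are enough."""
    return {"name": name, "version": version, "model_format": model_format}


def custom_register_body(name, version, url, content_hash,
                         embedding_dimension, model_type="bert",
                         model_format="TORCH_SCRIPT"):
    """Body for a custom model registered via URL.

    The user must host the model archive, compute its SHA-256 hash and
    describe the model config. The cluster must also allow
    plugins.ml_commons.allow_registering_model_via_url.
    """
    return {
        "name": name,
        "version": version,
        "model_format": model_format,
        "model_content_hash_value": content_hash,
        "model_config": {
            "model_type": model_type,
            "embedding_dimension": embedding_dimension,
            "framework_type": "sentence_transformers",
        },
        "url": url,
    }


# Example of an existing pretrained model (version is an assumption).
pretrained = pretrained_register_body(
    "huggingface/sentence-transformers/all-MiniLM-L12-v2", "1.0.1"
)

# Hypothetical custom registration of a manually traced BGE model.
# The URL and hash below are placeholders for illustration only.
custom = custom_register_body(
    name="bge-small-en-v1.5",
    version="1.0.0",
    url="https://example.com/bge-small-en-v1.5.zip",
    content_hash="<sha256-of-the-zip-file>",
    embedding_dimension=384,
)

print("POST", REGISTER_ENDPOINT)
print(json.dumps(pretrained, indent=2))
print("POST", REGISTER_ENDPOINT)
print(json.dumps(custom, indent=2))
```

The pretrained body has three fields, while the custom body additionally needs a hosted archive, its hash, and a model config, which is the manual work this request aims to remove.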

As research and engineering in the IR domain have advanced, much stronger text_embedding models have become available in the open source community (leaderboard ref). However, users still need to trace these models and generate the tarball manually, which is a heavy workload, especially for those with little machine-learning background.

What solution would you like?
BGE models (https://huggingface.co/BAAI/bge-small-en-v1.5, https://huggingface.co/BAAI/bge-base-en-v1.5, https://huggingface.co/BAAI/bge-large-en-v1.5) produce very strong text_embedding representations compared with other models of the same size, and we can use them in the same way as other sentence-transformers text_embedding models.

Because the models consume resources in a local deployment, we can support bge-small-en-v1.5 and bge-base-en-v1.5 as pretrained models in OpenSearch.

What alternatives have you considered?
A clear and concise description of any alternative solutions or features you've considered.

Do you have any additional context?
opensearch-project/ml-commons#2210

dblock · 2024-06-24T16:27:16Z

Catch All Triage - 1 2 3 4 5 6

zhichao-aws · 2024-06-25T02:30:46Z

We need to deprecate this work item because the models use Reddit data for training.

zhichao-aws added the enhancement and untriaged labels Apr 26, 2024

zhichao-aws mentioned this issue Apr 26, 2024

[FEATURE] enhance model_uploader workflow to support MIT-licensed models from huggingface #388

Closed

5 tasks

dblock removed the untriaged label Jun 24, 2024

zhichao-aws closed this as completed Jun 25, 2024
