support embeddings via ollama #21

Closed
wants to merge 1 commit into from

Conversation

@miku miku commented Apr 12, 2024

Note: This may be made obsolete by #8.

This adds semantic_transforms.OllamaEmbeddings, which computes embeddings locally using ollama (https://ollama.com/), following the API of OpenAIEmbeddings. Currently, ollama does not support batching (but it is on their roadmap, cf. https://ollama.com/blog/embedding-models).

The LocalSemanticIngestionPipeline shows how it can be used.

To test locally, install ollama, pull an embedding model such as https://ollama.com/library/mxbai-embed-large, then run:

from openparse import processing, DocumentParser
semantic_pipeline = processing.LocalSemanticIngestionPipeline(
    url="http://localhost:11434",
    model="mxbai-embed-large",
)
parser = DocumentParser(
    processing_pipeline=semantic_pipeline,
)
parsed = parser.parse("path/to/file.pdf")
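
Since ollama does not batch embedding requests, a client has to send one request per text. The following is a minimal sketch of that loop, not the code from this PR: it assumes ollama's `/api/embeddings` endpoint, which takes a `model` and a single `prompt` and returns an `embedding` list. The names `OllamaEmbeddingsSketch`, `embed_many`, and `post_json` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, List


def post_json(url: str, payload: Dict, timeout: float = 60.0) -> Dict:
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


class OllamaEmbeddingsSketch:
    """Hypothetical client: embeds texts one at a time via a local ollama server."""

    def __init__(
        self,
        url: str = "http://localhost:11434",
        model: str = "mxbai-embed-large",
        post: Callable[[str, Dict], Dict] = post_json,
    ) -> None:
        # Assumed endpoint path; check the ollama API docs for your version.
        self.endpoint = url.rstrip("/") + "/api/embeddings"
        self.model = model
        self._post = post  # injectable so the loop can be tested without a server

    def embed_many(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            # No batching: each text is sent in its own request.
            response = self._post(
                self.endpoint, {"model": self.model, "prompt": text}
            )
            vectors.append(response["embedding"])
        return vectors
```

With a running ollama instance, `OllamaEmbeddingsSketch().embed_many(["first chunk", "second chunk"])` would return one vector per input text; the `post` argument exists only so the transport can be swapped out.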

@Filimoa
Owner

Filimoa commented Apr 12, 2024

Thanks for taking the time to create this!

We'll be integrating embedding modules in the next few days, which should let people use many different embedding providers through a single interface (installing only the ones they choose).

You can track progress in PR #23

@miku
Author

miku commented Apr 19, 2024

Thanks for your work on open-parse - closing this in favor of #23.

@miku miku closed this Apr 19, 2024
@Kydlaw Kydlaw mentioned this pull request Apr 24, 2024
@Bruce337f

Bruce337f commented May 9, 2024

Any updates here please?
