Ollama integration #30

Closed
Kydlaw opened this issue Apr 23, 2024 · 2 comments

@Kydlaw
Contributor

Kydlaw commented Apr 23, 2024

Description

Description of the feature: Provide the ability to use Ollama in SemanticIngestionPipeline (currently it only supports proprietary models).

This would make it possible to use semantic parsing without spending money on a proprietary model.
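
For illustration, here is a minimal sketch of what the Ollama side could look like, assuming a local Ollama server on its default port and an embedding model such as nomic-embed-text; the helper name is hypothetical and not part of the current openparse API:

import requests

def ollama_embeddings(texts, model="nomic-embed-text", host="http://localhost:11434"):
    # Hypothetical helper: fetch one embedding per input text from a local
    # Ollama server via its /api/embeddings endpoint.
    embeddings = []
    for text in texts:
        resp = requests.post(f"{host}/api/embeddings", json={"model": model, "prompt": text})
        resp.raise_for_status()
        embeddings.append(resp.json()["embedding"])
    return embeddings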

Why the feature should be added to openparse (as opposed to another library or just implemented in your code):
The interface for a similar feature already exists in this library, and I'm not aware of a straightforward way to work around the existing code to inject this feature into the current openparse.

I can contribute this feature if this interests you.

@Filimoa
Owner

Filimoa commented Apr 23, 2024

This is a duplicate of #8. You can track progress in #23. The main difficulty is that we currently use a hard-coded similarity cutoff that works well for OpenAI's models, but each embedding model will have its own optimal cutoff. There are a couple of approaches to dealing with this:

1. Start using a percentile cutoff.

This is the approach that LangChain and llama-index use (see the sketch at the end of this comment). In my limited testing, finding the optimal percentile is still not trivial, and I found it to perform worse than a hard-coded cutoff.

We could offload choosing this to the user, but the library aims to have opinionated defaults.

2. Figure out cutoff dynamically

We would generate examples of text that should / shouldn't be combined and use this to figure out a similarity threshold.

similar_pairs = [("very similar text", "continuation"), ...]  # pairs of text that should be combined

similarities = []
for text1, text2 in similar_pairs:
    sim = get_similarity(text1, text2)  # similarity between the two texts' embeddings
    similarities.append(sim)

avg_cutoff = sum(similarities) / len(similarities)  # average similarity of positive pairs, used as the threshold

While this is kind of dirty, this is the approach I'm currently leaning toward.
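
For reference, a minimal sketch of the percentile idea from option 1, assuming the cosine similarities between consecutive nodes have already been computed; the 95th-percentile breakpoint is an arbitrary example, not a recommended default:

import numpy as np

def percentile_split_points(adjacent_similarities, breakpoint_percentile=95):
    # Split wherever the similarity between consecutive nodes falls below the
    # (100 - breakpoint_percentile)-th percentile of the observed similarities,
    # instead of relying on a fixed hard-coded threshold.
    sims = np.asarray(adjacent_similarities, dtype=float)
    threshold = np.percentile(sims, 100 - breakpoint_percentile)
    return [i + 1 for i, sim in enumerate(sims) if sim < threshold]

The downside, as noted above, is that the right percentile still has to be chosen somehow.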

@Kydlaw
Contributor Author

Kydlaw commented Apr 24, 2024

I apologize for the duplicate (I didn't see the links to #21 and #23... in #8).

Ok, I see and understand the problem. It is indeed very hard to provide good defaults on that.
I'll have a look at your progress in #23 and see if I can maybe suggest something.

I'm closing this issue as it doesn't provide anything useful.

Kydlaw closed this as completed Apr 24, 2024