Commit
update ollama performance mode (#2874)
timothycarambat authored Dec 18, 2024
1 parent af70342 commit a51de73
Showing 2 changed files with 19 additions and 8 deletions.
12 changes: 8 additions & 4 deletions frontend/src/components/LLMSelection/OllamaLLMOptions/index.jsx
@@ -169,18 +169,22 @@ export default function OllamaLLMOptions({ settings }) {
              className="tooltip !text-xs max-w-xs"
            >
              <p className="text-red-500">
-               <strong>Note:</strong> Only change this setting if you
-               understand its implications on performance and resource usage.
+               <strong>Note:</strong> Be careful with the Maximum mode. It may
+               increase resource usage significantly.
              </p>
              <br />
              <p>
                <strong>Base:</strong> Ollama automatically limits the context
-               to 2048 tokens, reducing VRAM usage. Suitable for most users.
+               to 2048 tokens, keeping resources usage low while maintaining
+               good performance. Suitable for most users and models.
              </p>
              <br />
              <p>
                <strong>Maximum:</strong> Uses the full context window (up to
-               Max Tokens). May increase VRAM usage significantly.
+               Max Tokens). Will result in increased resource usage but allows
+               for larger context conversations. <br />
+               <br />
+               This is not recommended for most users.
              </p>
            </Tooltip>
          </div>
15 changes: 11 additions & 4 deletions server/utils/AiProviders/ollama/index.js
@@ -29,6 +29,13 @@ class OllamaAILLM {
    this.client = new Ollama({ host: this.basePath });
    this.embedder = embedder ?? new NativeEmbedder();
    this.defaultTemp = 0.7;
+   this.#log(
+     `OllamaAILLM initialized with\nmodel: ${this.model}\nperf: ${this.performanceMode}\nn_ctx: ${this.promptWindowLimit()}`
+   );
  }

+ #log(text, ...args) {
+   console.log(`\x1b[32m[Ollama]\x1b[0m ${text}`, ...args);
+ }
+
  #appendContext(contextTexts = []) {
@@ -131,11 +138,11 @@ class OllamaAILLM {
      keep_alive: this.keepAlive,
      options: {
        temperature,
-       useMLock: true,
+       use_mlock: true,
        // There are currently only two performance settings so if its not "base" - its max context.
        ...(this.performanceMode === "base"
          ? {}
-         : { numCtx: this.promptWindowLimit() }),
+         : { num_ctx: this.promptWindowLimit() }),
      },
    })
    .then((res) => {
@@ -179,11 +186,11 @@ class OllamaAILLM {
      keep_alive: this.keepAlive,
      options: {
        temperature,
-       useMLock: true,
+       use_mlock: false,
        // There are currently only two performance settings so if its not "base" - its max context.
        ...(this.performanceMode === "base"
          ? {}
-         : { numCtx: this.promptWindowLimit() }),
+         : { num_ctx: this.promptWindowLimit() }),
      },
    }),
    messages,
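
For reference, a minimal sketch of how these options reach Ollama through the official ollama JS client (not AnythingLLM's actual wiring; the host, model name, keep_alive value, and num_ctx figure below are placeholders). The point of the rename is that Ollama expects snake_case option keys such as use_mlock and num_ctx, so the earlier camelCase keys never applied the intended settings:

// Minimal sketch, assuming the official `ollama` npm package and a local Ollama server.
const { Ollama } = require("ollama");

const client = new Ollama({ host: "http://127.0.0.1:11434" }); // placeholder host

async function chatWithPerformanceMode(performanceMode, messages) {
  return client.chat({
    model: "llama3.1", // placeholder model name
    messages,
    stream: false,
    keep_alive: "5m", // placeholder keep-alive
    options: {
      temperature: 0.7,
      use_mlock: true,
      // "base" leaves num_ctx unset, so Ollama keeps its 2048-token default;
      // any other mode raises num_ctx toward the model's full window.
      ...(performanceMode === "base" ? {} : { num_ctx: 8192 }), // placeholder window size
    },
  });
}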

2 comments on commit a51de73

@lewismacnow (Contributor)

@timothycarambat just wondering why mlock should be false?

One impact of this: if the same model is used for both Agent and regular chat on Ollama, the model will unload and re-load because this parameter differs between the two calls.

My vote would be for mlock to be consistent, either off everywhere or on everywhere, to prevent unloading.
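
A rough sketch of the reload concern described above (assumed Ollama behavior, not project code; host and model name are placeholders): use_mlock is applied when the model is loaded, so two requests that disagree on it cannot reuse the same loaded instance.

// Sketch of the reload concern (assumed behavior; placeholders throughout).
const { Ollama } = require("ollama");
const client = new Ollama({ host: "http://127.0.0.1:11434" });

async function demo() {
  const messages = [{ role: "user", content: "hello" }];
  // Regular chat loads the model with use_mlock: true...
  await client.chat({ model: "llama3.1", messages, options: { use_mlock: true } });
  // ...so an agent call with use_mlock: false cannot reuse that instance and
  // the model is unloaded and reloaded between the two requests.
  await client.chat({ model: "llama3.1", messages, options: { use_mlock: false } });
}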

@timothycarambat (Member Author) commented on a51de73 Dec 29, 2024

Correct, this was a leftover mistake during testing - will patch now
4b2bb52
