[BUG]: llama3.1 8B Context Size Max Tokens Ignored in Both Performance Modes #2442
How are you running AnythingLLM?

AnythingLLM desktop app

What happened?

When "Performance Mode" is set to "Base", the Max Tokens setting is ignored and Llama 3.1 is invoked with an 8K context size. When "Performance Mode" is set to "Maximum", the Max Tokens setting is again ignored and Llama 3.1 is invoked with a 128K context size. I created a Modelfile to enforce a 32K context size, but the result was 128K. The workspace was set to use the system-defined LLM settings.

Are there known steps to reproduce?

See above.

Comments
Anything LLM v1.6.7
This is normal and expected. See this from Ollama: ollama/ollama#1005 (comment)

Any parameters passed into the API will override whatever is in a Modelfile in Ollama. So here, we would be passing in whatever value you have set for Max Tokens.
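To make the override behavior concrete, here is a minimal sketch (assuming a local Ollama server on the default port 11434 and a pulled `llama3.1:8b` tag): the `num_ctx` sent in the request's `options` takes precedence over any `PARAMETER num_ctx` baked into the Modelfile.

```python
# Minimal sketch: per-request options override Modelfile parameters in Ollama.
# Assumes a local Ollama server on the default port and a pulled llama3.1:8b model.
import json
import urllib.request

payload = {
    "model": "llama3.1:8b",
    "prompt": "Say hello.",
    "stream": False,
    # Whatever num_ctx is sent here wins over any PARAMETER num_ctx in the Modelfile.
    "options": {"num_ctx": 32768},
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```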
Looking at the --ctx-size parameter in the shell, it is always 8K or 128K, never 32K. The 128K setting is too large to execute, and 8K always truncates the data in context, producing incomplete results:

```
814339519 42877 42374 4004 0 31 0 35095332 14060 - S 0 ?? 0:01.61 /Applications/AnythingLLM.app/Contents/Resources/ollama/llm serve
```
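One way to check what a custom model actually carries is Ollama's `/api/show` endpoint. A rough sketch, assuming the default local endpoint; swap in your own 32K model tag:

```python
# Sketch: inspect the parameter lines (including num_ctx, if set) that the
# local model was created with. Assumes the default Ollama endpoint.
import json
import urllib.request

payload = {"name": "llama3.1:8b"}  # or the tag of your custom 32K model
req = urllib.request.Request(
    "http://localhost:11434/api/show",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    info = json.loads(resp.read())

print(info.get("parameters", ""))  # parameter lines from the Modelfile
```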
Any progress or status on this? It is blocking development of the next version of the Anything LLM content pack in our marketplace.
We're having the same issue. We'd also like to run Ollama with a mid-sized context (128k is too much, 8k is too little).
The log you are quoting is the `serve` startup command. Look lower in the logs for the real n_ctx to see if it is applied.
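For anyone triaging this, a rough sketch of scanning a log file for the runner's n_ctx lines follows; the log path below is a hypothetical placeholder, since the desktop app's log location varies by install and platform.

```python
# Sketch: find the effective n_ctx the model runner reports at load time.
# LOG_PATH is a hypothetical example; point it at your actual
# Ollama/AnythingLLM log file.
from pathlib import Path

LOG_PATH = Path.home() / "Library" / "Logs" / "ollama.log"  # hypothetical path

for line in LOG_PATH.read_text(errors="replace").splitlines():
    if "n_ctx" in line:
        # llama.cpp-based runners log lines such as "n_ctx = 8192" at model load
        print(line)
```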