Server: Don't ignore llama.cpp params #8754
Conversation
I don't understand why #4668 is linked. That PR changed the default values used, but it did not change the logic for how and when these values are set, right? The temperature was already being set to a hard-coded value, just a different one. In any case, I think the correct way to fix this would be to still use the same default parameters as OAI but to override them with CLI arguments if the user sets them. But I'm not sure whether that can be easily implemented with the current structure.
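To make that idea concrete, a minimal sketch of the "OAI defaults unless the user passed a CLI flag" approach could look like this (all names are invented for illustration and are not llama.cpp's actual structures):

```cpp
// Hypothetical sketch only; cli_options / sampling_defaults are invented names.
struct cli_options {
    bool  temperature_set = false; // whether --temp was explicitly passed
    float temperature     = 0.0f;
};

struct sampling_defaults {
    float temperature = 1.0f; // start from the OAI default
};

// Keep the OAI defaults, but let an explicitly supplied CLI flag win.
static sampling_defaults resolve_defaults(const cli_options & cli) {
    sampling_defaults d;
    if (cli.temperature_set) {
        d.temperature = cli.temperature;
    }
    return d;
}
```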
Yes, but it added that comment, which probably made those default OAI values stay in place when the way params are handled in the server was changed later.
The default server params are set here, and they are correctly overridden by CLI args: llama.cpp/examples/server/public/index.html Lines 301 to 323 in 439b3fc
On API requests, they are used if no value is provided in the request: llama.cpp/examples/server/server.cpp Lines 901 to 924 in 439b3fc
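For context, the pattern at those lines is roughly the following (paraphrased, not a verbatim excerpt; the exact field names may differ): each sampling parameter is read from the request JSON with json_value, falling back to the CLI-configured default_sparams when the key is absent.

```cpp
// Roughly the per-request fallback in server.cpp (paraphrased, not verbatim):
// read each sampling parameter from the request JSON, falling back to the
// server's CLI-configured defaults (default_sparams) when the key is missing.
slot.sparams.temp  = json_value(data, "temperature", default_sparams.temp);
slot.sparams.top_k = json_value(data, "top_k",       default_sparams.top_k);
slot.sparams.top_p = json_value(data, "top_p",       default_sparams.top_p);
```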
The problem was that the lines I removed were putting default values into the JSON data; I don't understand why.
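To illustrate what that means in practice, the removed lines followed roughly this pattern (a sketch with illustrative keys and default values, not a verbatim diff): the OAI-compat layer wrote a hard-coded default for every sampling key into the translated request, so the per-request fallback above never saw a missing key and the CLI-configured defaults were never reached.

```cpp
// Illustrative sketch of the kind of lines that were removed (not a verbatim diff):
// the OAI-compat translation layer injected hard-coded OAI defaults into the
// request JSON, so the keys were always present downstream and the server's
// CLI-configured defaults were effectively ignored.
llama_params["temperature"] = json_value(body, "temperature", 1.0); // hard-coded OAI default
llama_params["top_p"]       = json_value(body, "top_p",       1.0); // hard-coded OAI default
```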
How about this: create two copies of
Is there any reason to use the OAI default values? Presumably these values are tuned for their models, which have nothing to do with the models llama.cpp can use.
Good point. Also I'm noticing that the current values on master aren't actually the ones described in the linked documentation. The documentation says that the default temperature value is 0, but here it is 1.
And regardless of what the documentation says,
I think that without passing a value for
Multi users OAI completions compatibility test is failing:
It seems to work correctly on my end; n_tokens is respected (on the v1/chat/completions endpoint).
So, before it was using
I added a fallback for max_tokens.
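A minimal sketch of such a fallback, assuming n_predict is the llama.cpp-side name for the token limit and default_sparams.n_predict holds the server default (the PR's actual code may differ):

```cpp
// Sketch of a max_tokens fallback (assumed shape, may differ from the PR's code):
// accept the OAI-style "max_tokens" key when given, otherwise respect an explicit
// "n_predict", and only then fall back to the server's configured default.
if (body.contains("max_tokens")) {
    llama_params["n_predict"] = json_value(body, "max_tokens", default_sparams.n_predict);
} else {
    llama_params["n_predict"] = json_value(body, "n_predict", default_sparams.n_predict);
}
```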
Good point. In fact, I believe that from the very first version of the server, we just took the OAI defaults because we wanted it to be compatible with the OAI specs, and we have kept the same code since then. But it makes sense to get rid of this. IMO I don't expect the temperature to be defaulted to 1.0; it's too random for small models. I guess it makes sense for OAI because their models are large.
I can confirm that the tests now succeed (on my local machine). And I agree that it doesn't make sense to use OAI sampling parameters for completely different models in the first place, so the code should just be removed.
* Don't ignore llama.cpp params
* Add fallback for max_tokens
I was wondering why I had such high variance using the server API after launching llama-server with --temp 0.2. I found out that the llama.cpp server ignores that value, and the /props endpoint is even lying about it (it shows the temp I had set, even though it is not actually used).
I found out that this was caused by PR #4668, which makes the API endpoints use the OAI default values instead of llama.cpp's. I don't understand why anyone would want that; if there is a problem with llama.cpp's default parameters, they should be changed, not ignored.