server : fix templates for llama2, llama3 and zephyr in new UI #8196
This change adjusts some of the pre-defined chat templates in the new server UI, which in my interpretation brings them in line with the recommended versions. I have done this for the following templates:

- llama2
- llama3
- zephyr

as these are the models I have some experience with. There may be similar discrepancies in other templates, but I have not checked those. For the Llama models, I have also removed the start-of-text tokens at the beginning, since they are added automatically by the server and their duplication triggers a warning message.
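For reference, here is a hedged sketch of what the recommended layouts look like for the three formats, based on the published model cards; the helper names are illustrative, not part of this PR:

```python
# Illustrative sketch of the three chat formats touched by this PR.
# Token strings follow the Llama 2, Llama 3 and Zephyr model cards;
# the function names are hypothetical helpers, not server code.

def render_llama2(system: str, user: str) -> str:
    # Llama 2: the system prompt is wrapped in <<SYS>> inside the first
    # [INST] block. The leading <s> is omitted here, since the server
    # adds the BOS token itself (the duplication this PR removes).
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def render_llama3(system: str, user: str) -> str:
    # Llama 3: header tokens delimit each role and <|eot_id|> ends a
    # turn; <|begin_of_text|> is likewise left to the server.
    return (
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def render_zephyr(system: str, user: str) -> str:
    # Zephyr: role tags on their own lines, each turn terminated by </s>.
    return f"<|system|>\n{system}</s>\n<|user|>\n{user}</s>\n<|assistant|>\n"
```

The key point is that each format ends with the assistant's opening tag (or `[/INST]`) so the model continues with its reply, and that BOS-like tokens are left out of the template because the server prepends them.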
It would of course be nicer to connect this to the `llama_chat_apply_template()` implementation, so that there is only one set of templates to maintain and test in the codebase, for example by making the server UI use the chat endpoint rather than the completion one. Is anyone already working on this?