Implement /v1/chat/completions endpoint for CPU mode #1979
Conversation
    # format system message and conversation history correctly
    formatted_messages = ""
    for message in request.messages:
        formatted_messages += f"<|im_start|>{message.role}\n{message.content}<|im_end|>\n"

    # the LLM will complete the response of the assistant
    formatted_messages += "<|im_start|>assistant\n"
    response = model.generate(
        prompt=formatted_messages,
        temp=request.temperature
    )

    # the LLM may continue to hallucinate the conversation, but we want only the first response
    # so, cut off everything after first <|im_end|>
    index = response.find("<|im_end|>")
    response_content = response[:index].strip()
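One edge case worth noting (an observation, not part of the diff): str.find() returns -1 when the marker is absent, in which case the slice above silently drops the last character of the response. A minimal defensive variant:

    # guard against a response that never emits the <|im_end|> marker:
    # str.find() returns -1 in that case, and response[:-1] would drop a character
    index = response.find("<|im_end|>")
    response_content = (response[:index] if index != -1 else response).strip()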
Why is ChatML hard-coded here? Normally GPT4All models have a customizable prompt template that defaults to the value in models2.json.
This is done due to the specifics of OpenAI's API. OpenAI's Chat Completions endpoint receives an array of messages containing the system message, past user questions/statements, and the assistant's replies (see https://platform.openai.com/docs/api-reference/chat/create and https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models):
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Knock knock."},
{"role": "assistant", "content": "Who's there?"},
{"role": "user", "content": "Orange."},
]
}'
The goal here was to parse the past messages and provide the LLM with the complete conversation history. From the above example, I am creating a result like this:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Knock knock.<|im_end|>
<|im_start|>assistant
Who's there?<|im_end|>
<|im_start|>user
Orange.<|im_end|>
<|im_start|>assistant
IMHO we would need a different representation of prompt templates in models2.json to be able to reliably parse and use those for this specific use case... Until then, I decided to hardcode ChatML.
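For illustration, a minimal sketch of why single-turn templates are hard to reuse here; the template string and helper below are my own assumptions, not anything taken from models2.json:

    # hypothetical single-turn template with a "%1" placeholder for one user prompt;
    # it has no notion of system messages or alternating roles, so a full
    # multi-turn chat history cannot be rendered from it reliably
    ASSUMED_TEMPLATE = "### Human:\n%1\n### Assistant:\n"

    def render_turn(user_content: str, assistant_reply: str = "") -> str:
        return ASSUMED_TEMPLATE.replace("%1", user_content) + assistant_reply

    history = render_turn("Knock knock.", "Who's there?\n") + render_turn("Orange.")
    # note: there is no slot for {"role": "system", ...} anywhere in this scheme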
Describe your changes
The /v1/chat/completions endpoint was not implemented in gpt4all-api (it only returned an "Echo" of the original message, as mentioned in #1700). This PR implements the endpoint for CPU mode and adds an appropriate test.

Issue ticket number and link
#1700
Checklist before requesting a review
Demo
Try the openai.ChatCompletion.create() function (as described here: https://cookbook.openai.com/examples/how_to_format_inputs_to_chatgpt_models).
Before the PR, we would just get an "Echo: " of the last message. After the PR, we actually get a result.
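A minimal sketch of such a demo call, assuming the local gpt4all-api server listens on port 4891 and already has a model loaded (the model name below is a placeholder):

    import openai

    # point the client at the local gpt4all-api server instead of api.openai.com
    openai.api_base = "http://localhost:4891/v1"  # assumed local address
    openai.api_key = "not-needed"  # the local server does not validate keys

    response = openai.ChatCompletion.create(
        model="ggml-mpt-7b-chat",  # placeholder; use whichever model the server serves
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Knock knock."},
            {"role": "assistant", "content": "Who's there?"},
            {"role": "user", "content": "Orange."},
        ],
        temperature=0,
    )
    print(response["choices"][0]["message"]["content"])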
Steps to Reproduce
Run test_chat_completion() in gpt4all_api/app/tests/test_endpoints.py.

Notes