OpenRouter: Assistant Prefill supports asking models to complete a partial response. #632

irthomasthomas opened this issue Feb 27, 2024 · 1 comment

Docs | OpenRouter

DESCRIPTION: "Assistant Prefill: OpenRouter supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this features, simply include a message with role: "assistant" at the end of your messages array. Example:

fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${OPENROUTER_API_KEY}`,
    "HTTP-Referer": `${YOUR_SITE_URL}`, // Optional, for including your app on openrouter.ai rankings.
    "X-Title": `${YOUR_SITE_NAME}`, // Optional. Shows in rankings on openrouter.ai.
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    "messages": [
      {"role": "user", "content": "Who are you?"},
      {"role": "assistant", "content": "I'm not sure, but my best guess is"},
    ],
  })
});
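
The same prefill request can also be made through the OpenAI Python SDK pointed at OpenRouter's OpenAI-compatible endpoint. A minimal sketch, assuming the openai v1 SDK is installed and OPENROUTER_API_KEY is set in the environment; the model name is an arbitrary illustrative choice:

```python
import os

from openai import OpenAI

# Sketch: OpenRouter exposes an OpenAI-compatible API, so the stock SDK
# can be reused by overriding base_url (assumption: openai >= 1.0).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

completion = client.chat.completions.create(
    model="mistralai/mixtral-8x7b-instruct",  # hypothetical model choice
    messages=[
        {"role": "user", "content": "Who are you?"},
        # The trailing assistant message is the prefill: the model
        # continues from this partial response instead of starting fresh.
        {"role": "assistant", "content": "I'm not sure, but my best guess is"},
    ],
)
print(completion.choices[0].message.content)
```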

URL: OpenRouter Documentation

Suggested labels

{'label-name': 'Chatbot-API', 'label-description': 'API documentation for interacting with chatbots on OpenRouter.', 'gh-repo': 'AI-Chatbots', 'confidence': 65.43}

irthomasthomas commented:

Related issues

#484: Docs | OpenRouter

### Details (similarity score: 0.89)

- [ ] [Docs | OpenRouter](https://openrouter.ai/docs#models)


The future will bring us hundreds of language models and dozens of providers for each. How will you choose the best?

OpenRouter: Find the Lowest Price Across Dozens of Providers

  • Benefit from the race to the bottom.
  • OpenRouter finds the lowest price for each model across dozens of providers.
  • Users can also pay for their own models via OAuth PKCE.

Standardized API: No Need to Change Your Code

  • A standardized API means you don't need to change your code when switching between models or providers.
  • The best models will be used the most.

Evals are Flawed: Compare Models by Usage and Purpose

  • Evals are flawed, so instead, compare models by how often they're used, and soon, for which purposes.
  • Chat with multiple models at once in the Playground.


Suggested labels

{ "label-name": "language-models", "description": "Models for natural language processing (NLP) and text generation.", "confidence": 95.97 }

#418: openchat/openchat-3.5-1210 · Hugging Face

### Details (similarity score: 0.89)

- [ ] [openchat/openchat-3.5-1210 · Hugging Face](https://huggingface.co/openchat/openchat-3.5-1210#conversation-templates)

Using the OpenChat Model

We highly recommend installing the OpenChat package and using the OpenChat OpenAI-compatible API server for an optimal experience. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24GB RAM.

  • Installation Guide: Follow the installation guide in our repository.

  • Serving: Use the OpenChat OpenAI-compatible API server by running the serving command from the table below. To enable tensor parallelism, append --tensor-parallel-size N to the serving command.

  | Model | Size | Context | Weights | Serving |
  | --- | --- | --- | --- | --- |
  | OpenChat 3.5 1210 | 7B | 8192 | openchat/openchat-3.5-1210 | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-1210 --engine-use-ray --worker-use-ray` |
  • API Usage: Once started, the server listens at localhost:18888 for requests and is compatible with the OpenAI ChatCompletion API specifications. Here's an example request (a Python equivalent via the OpenAI SDK follows this list):

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openchat_3.5",
        "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
      }'
```
  • Web UI: Use the OpenChat Web UI for a user-friendly experience.
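
Since the server follows the OpenAI ChatCompletion specification, it should also be reachable through the OpenAI SDK. A minimal sketch, assuming the openai v1 Python SDK; the api_key value is a placeholder that the server presumably ignores unless --api-keys was configured:

```python
from openai import OpenAI

# Sketch: point the OpenAI client at the local OpenChat server
# (assumption: the server's OpenAI compatibility covers this client).
client = OpenAI(base_url="http://localhost:18888/v1", api_key="sk-placeholder")

response = client.chat.completions.create(
    model="openchat_3.5",
    messages=[{"role": "user", "content": "Write a poem to describe yourself"}],
)
print(response.choices[0].message.content)
```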

Online Deployment

If you want to deploy the server as an online service, use the following options:

  • --api-keys sk-KEY1 sk-KEY2 ... to specify allowed API keys
  • --disable-log-requests --disable-log-stats --log-file openchat.log for logging only to a file.

For security purposes, we recommend using an HTTPS gateway in front of the server.

Mathematical Reasoning Mode

The OpenChat model also supports mathematical reasoning mode. To use this mode, include condition: "Math Correct" in your request.

```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openchat_3.5",
        "condition": "Math Correct",
        "messages": [{"role": "user", "content": "10.3 − 7988.8133 = "}]
      }'
```
Conversation Templates

We provide several pre-built conversation templates to help you get started.

  • Default Mode (GPT4 Correct):

    GPT4 Correct User: Hello<|end_of_turn|>
    GPT4 Correct Assistant: Hi<|end_of_turn|>
    GPT4 Correct User: How are you today?<|end_of_turn|>
    GPT4 Correct Assistant:
  • Mathematical Reasoning Mode:

    Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>
    Math Correct Assistant:

    NOTE: Remember to set <|end_of_turn|> as the end-of-generation token.

  • Integrated Tokenizer: The default (GPT4 Correct) template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template.
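
A minimal sketch of rendering the GPT4 Correct template via the integrated tokenizer, assuming a transformers version recent enough to ship apply_chat_template (>= 4.34):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat-3.5-1210")

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]

# tokenize=False returns the rendered prompt string; add_generation_prompt
# appends the trailing "GPT4 Correct Assistant:" turn for the model to complete.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```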

Suggested labels

{ "label": "chat-templates", "description": "Pre-defined conversation structures for specific modes of interaction." }

#630: Docs | OpenRouter

### Details (similarity score: 0.88)

- [ ] [Docs | OpenRouter](https://openrouter.ai/docs#transforms)


Description:
Prompt Transforms

OpenRouter has a simple rule for choosing between sending a prompt and sending a list of ChatML messages:

Choose messages if you want to have OpenRouter apply a recommended instruct template to your prompt, depending on which model serves your request. Available instruct modes include:

  • alpaca: docs
  • llama2: docs
  • airoboros: docs

Choose prompt if you want to send a custom prompt to the model. This is useful if you want to use a custom instruct template or maintain full control over the prompt submitted to the model.

To help with prompts that exceed the maximum context size of a model, OpenRouter supports a custom parameter called transforms:

```js
{
  transforms: ["middle-out"], // Compress prompts > context size. This is the default for all models.
  messages: [...], // "prompt" works as well
  model // Works with any model
}
```

The transforms param is an array of strings that tells OpenRouter to apply a series of transformations to the prompt before sending it to the model. Transformations are applied in order. Available transforms are:

  • middle-out: compress prompts and message chains to the context size. This helps users extend conversations in part because LLMs pay significantly less attention to the middle of sequences anyway. Works by compressing or removing messages in the middle of the prompt.

Note: All OpenRouter models default to using middle-out, unless you exclude this transform by e.g. setting transforms: [] in the request body.
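
To make the opt-out concrete, here is a minimal sketch of a request that disables middle-out compression, assuming the requests library and an OPENROUTER_API_KEY environment variable; the model name is an arbitrary illustrative choice:

```python
import os

import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x7b-instruct",  # hypothetical choice
        "transforms": [],  # empty list: opt out of the default middle-out
        "messages": [{"role": "user", "content": "Who are you?"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```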

More information

Suggested labels

{'label-name': 'prompt-transformations', 'label-description': 'Descriptions of transformations applied to prompts in OpenRouter for AI models', 'gh-repo': 'openrouter/ai-docs', 'confidence': 52.95}

#129: Few-shot and function calling - API - OpenAI Developer Forum

### Details (similarity score: 0.88)

- [ ] [Few-shot and function calling - API - OpenAI Developer Forum](https://community.openai.com/t/few-shot-and-function-calling/265908/10)

The thing to understand here is that function calling introduced a new role for the chat prompt messages ("role": "function"). To use few-shot examples with chat model prompts you provide a series of alternating (possibly 'fake') messages that show how the assistant did / should respond to a given user input. With function calling the principle is the same but rather than providing a series of alternating user-assistant example messages, you provide alternating user-function messages.

e.g.

```python
import openai  # legacy (pre-1.0) SDK, matching the original forum post

schema = {
    "type": "object",
    "properties": {
        "object_type": {"type": "string"},
        "geometry": {
            "type": "array",
            "items": {"type": "number"}
        }
    },
    "required": ["object_type", "geometry"]
}

# Example 'function' responses used as few-shot demonstrations.
example_response_1 = '{"object_type": "point", "geometry": [2.3, 1.0]}'
example_response_2 = '{"object_type": "line", "geometry": [[1.0, 2.0], [3.0, 4.0]]}'

few_shot_function_calling_example = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {"role": "system", "content": "You are a system for returning geometric objects in JSON."},
        {"role": "user", "content": "give me a point"},
        {"role": "function", "name": "example_func", "content": example_response_1},
        {"role": "user", "content": "give me a line"},
        {"role": "function", "name": "example_func", "content": example_response_2},
        {"role": "user", "content": "give me a polygon"}
    ],
    functions=[{"name": "example_func", "parameters": schema}],
    function_call={"name": "example_func"},
    temperature=0
)

print(few_shot_function_calling_example.choices[0].message)
```

This prints:

```json
{
  "content": null,
  "function_call": {
    "arguments": "{\"object_type\": \"polygon\", \"geometry\": [[0, 0], [0, 5], [5, 5], [5, 0]]}",
    "name": "example_func"
  },
  "role": "assistant"
}
```

#631: Docs | OpenRouter

### Details (similarity score: 0.88)

- [ ] [Docs | OpenRouter](https://openrouter.ai/docs#responses)
**TITLE:** Docs | OpenRouter

**DESCRIPTION:** 

URL: [https://openrouter.ai/docs#responses](https://openrouter.ai/docs#responses)

Suggested labels

{'label-name': 'Documentation', 'label-description': 'Resources for OpenRouter documentation', 'confidence': 51.01}

#396: astra-assistants-api: A backend implementation of the OpenAI beta Assistants API

### Details (similarity score: 0.87)

- [ ] [datastax/astra-assistants-api: A backend implementation of the OpenAI beta Assistants API](https://github.com/datastax/astra-assistants-api)

Astra Assistant API Service

A drop-in compatible service for the OpenAI beta Assistants API with support for persistent threads, files, assistants, messages, retrieval, function calling and more using AstraDB (DataStax's db as a service offering powered by Apache Cassandra and jvector).

Compatible with existing OpenAI apps via the OpenAI SDKs by changing a single line of code.

Getting Started

  1. Create an Astra DB Vector database
  2. Replace the following code:

```python
client = OpenAI(
    api_key=OPENAI_API_KEY,
)
```

with:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
    }
)
```

Or, if you have an existing Astra DB, you can pass your db_id in a second header:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key=OPENAI_API_KEY,
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "astra-db-id": ASTRA_DB_ID
    }
)
```

  3. Create an assistant:

```python
assistant = client.beta.assistants.create(
    instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}]
)
```

By default, the service uses AstraDB as the database/vector store and OpenAI for embeddings and chat completion.

Third-party LLM Support

We now support many third-party models for both embeddings and completion, thanks to litellm. Pass your service's API key via the api-key header and the embedding model via the embedding-model header.

For AWS Bedrock, you can pass additional custom headers:

```python
client = OpenAI(
    base_url="https://open-assistant-ai.astra.datastax.com/v1",
    api_key="NONE",
    default_headers={
        "astra-api-token": ASTRA_DB_APPLICATION_TOKEN,
        "embedding-model": "amazon.titan-embed-text-v1",
        "LLM-PARAM-aws-access-key-id": BEDROCK_AWS_ACCESS_KEY_ID,
        "LLM-PARAM-aws-secret-access-key": BEDROCK_AWS_SECRET_ACCESS_KEY,
        "LLM-PARAM-aws-region-name": BEDROCK_AWS_REGION,
    }
)
```

And again, specify the custom model for the assistant:

```python
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model="meta.llama2-13b-chat-v1",
)
```

Additional examples including third party LLMs (bedrock, cohere, perplexity, etc.) can be found under examples.

To run the examples using poetry:

  1. Create a .env file in this directory with your secrets.
  2. Run:

```bash
poetry install
poetry run python examples/completion/basic.py
poetry run python examples/retreival/basic.py
poetry run python examples/function-calling/basic.py
```

Coverage

See our coverage report here.

Roadmap

  • Support for other embedding models and LLMs
  • Function calling
  • Pluggable RAG strategies
  • Streaming support

Suggested labels

{ "key": "llm-function-calling", "value": "Integration of function calling with Large Language Models (LLMs)" }
