
Configuration

Overview

LSP-AI is configured by passing initializationOptions to the language server at startup. The configurable keys are:

  • memory
  • models
  • chat
  • actions
  • completion

Memory

LSP-AI keeps track of the text in all opened files during the editor session and builds prompts using this text. The memory key configures the method LSP-AI uses to track text and build prompts. Currently, an empty file_store object is the only valid option for memory:

{
  "memory": {
    "file_store": {}
  }
}

There will soon be more options that allow the use of vector storage backends such as PostgresML. This will enable more powerful context building for prompts, and future features like semantic search over the codebase.

Models

At server initialization, LSP-AI configures models per the models key specification. These models are then used during textDocument/completion, textDocument/generation and textDocument/codeAction requests.

There are currently five types of configurable models:

  • llama.cpp models
  • Ollama models
  • OpenAI API compatible models
  • Anthropic API compatible models
  • Mistral AI API compatible models

The type of model is specified by setting the type parameter.

llama.cpp

LSP-AI binds directly to the llama.cpp library and runs LLMs locally.

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}

Parameters:

  • repository the HuggingFace repository the model is located in
  • name the name of the model file
  • file_path the path to a gguf file to use (provide either file_path or repository and name; see the sketch after this list)
  • n_ctx the maximum number of tokens the model can process at once
  • n_gpu_layers the number of layers to offload onto the GPU
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
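
For example, here is a minimal sketch that loads a local gguf file instead of downloading one from HuggingFace (the path below is hypothetical; point it at your own file):

{
  "models": {
    "model1": {
      "type": "llama_cpp",
      "file_path": "/path/to/model.Q5_K_M.gguf",
      "n_ctx": 2048,
      "n_gpu_layers": 1000
    }
  }
}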

Ollama Models

LSP-AI uses the Ollama API.

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  }
}

Parameters:

  • model the model to use
  • chat_endpoint the chat endpoint to use. Defaults to http://localhost:11434/api/chat
  • generate_endpoint the generate endpoint to use. Defaults to http://localhost:11434/api/generate. Both endpoints can be overridden, as shown in the sketch after this list
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
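
As a hedged sketch, pointing LSP-AI at an Ollama instance running on another machine and rate limiting requests might look like this (the host and rate shown are hypothetical):

{
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder",
      "chat_endpoint": "http://192.168.1.10:11434/api/chat",
      "generate_endpoint": "http://192.168.1.10:11434/api/generate",
      "max_requests_per_second": 1
    }
  }
}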

OpenAI Compatible APIs

LSP-AI works with any OpenAI compatible API. This means LSP-AI will work with OpenAI and any model hosted behind a compatible API. We recommend considering Groq, OpenRouter, or Fireworks AI for hosted model inference, though there are certainly other good providers out there.

Using an API provider means parts of your code may be sent to the provider in the form of a prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
      "model": "llama3-70b-8192",
      "auth_token_env_var_name": "GROQ_API_KEY"
    }
  }
}

Parameters:

  • completions_endpoint is the endpoint for text completion
  • chat_endpoint is the chat endpoint to use
  • model specifies which model to use
  • auth_token_env_var_name is the name of the environment variable to read the authentication token from. See auth_token for another authentication option
  • auth_token is the authentication token to use directly, in place of auth_token_env_var_name (see the sketch after this list)
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)
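
For example, a minimal sketch that supplies the token inline rather than through an environment variable (the token value is a placeholder; keeping secrets in plain configuration is generally less safe than using an environment variable):

{
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.groq.com/openai/v1/chat/completions",
      "model": "llama3-70b-8192",
      "auth_token": "<your-api-token>",
      "max_requests_per_second": 1
    }
  }
}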

Anthropic Compatible APIs

LSP-AI works with any Anthropic compatible API. This means LSP-AI will work with Anthropic and any model hosted behind a compatible API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  }
}

Parameters:

  • chat_endpoint is the chat endpoint to use
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

Mistral FIM Compatible APIs

LSP-AI works with any Mistral AI FIM compatible API. This means LSP-AI will work with Mistral FIM API models and any other models that use the same FIM API.

Using an API provider means parts of your code may be sent to the provider in the form of an LLM prompt. If you do not want to potentially expose your code to third parties, we recommend using the llama.cpp backend.

{
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completion",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  }
}

Parameters:

  • fim_endpoint is the endpoint for FIM
  • model specifies which model to use
  • auth_token_env_var_name is the environment variable name to get the authentication token from. See auth_token for more authentication options
  • auth_token is the authentication token to use. This can be used in place of auth_token_env_var_name
  • max_requests_per_second rate limits requests (some editors like to send a lot of completion requests)

In-Editor Chatting

LSP-AI supports chatting directly with models in your editor via code actions.

To enable this, provide the chat key in the initializationOptions. Ensure that the model(s) you specify are instruction tuned.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-5-sonnet-20240620",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "chat": [
    {
      "trigger": "!C",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "system": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately"
      }
    },
    {
      "trigger": "!CC",
      "action_display_name": "Chat with context",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "system": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately given the code context:\n\n{CONTEXT}"
      }
    }
  ]
}

In this example we initialize two new code actions that perform chat. The first, "Chat", has a simple prompt that does not include any context from the code base. The second, "Chat with context", has a slightly more complex prompt that includes context from the code base.

Note that to actually have context in the "Chat with context" action you need to use a memory backend that provides context (not the file_store backend).

See the In-Editor Chatting wiki page for more information on how in-editor chatting works.

Parameters:

  • trigger is a string that must be present in the document for the chat code action to be available. It can be left blank, in which case the code action can be used anywhere (see the sketch after this list)
  • action_display_name is the text displayed in the menu by your editor
  • model is the model to use for chatting. Ensure this model is instruction tuned
  • parameters are the model's parameters. Refer to the specific model backend you are using for a detailed list of available parameters
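
As a small sketch, a chat action with a blank trigger (so the action is offered everywhere) could be configured like this; the system prompt shown is just an example:

{
  "chat": [
    {
      "trigger": "",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "system": "You are a code assistant chatbot."
      }
    }
  ]
}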

Custom Actions

LSP-AI enables the creation of custom code actions. To enable this, provide the actions key in the initializationOptions.

Here is an example of two custom actions: one that performs completion and one that performs refactoring, both using chain of thought prompting.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-5-sonnet-20240620",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "actions": [
    {
      "action_display_name": "Complete",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 4096,
        "system": "You are an AI coding assistant. Your task is to complete code snippets. The user's cursor position is marked by \"<CURSOR>\". Follow these steps:\n\n1. Analyze the code context and the cursor position.\n2. Provide your chain of thought reasoning, wrapped in <reasoning> tags. Include thoughts about the cursor position, what needs to be completed, and any necessary formatting.\n3. Determine the appropriate code to complete the current thought, including finishing partial words or lines.\n4. Replace \"<CURSOR>\" with the necessary code, ensuring proper formatting and line breaks.\n5. Wrap your code solution in <answer> tags.\n\nYour response should always include both the reasoning and the answer. Pay special attention to completing partial words or lines before adding new lines of code.\n\n<examples>\n<example>\nUser input:\n--main.py--\n# A function that reads in user inpu<CURSOR>\n\nResponse:\n<reasoning>\n1. The cursor is positioned after \"inpu\" in a comment describing a function that reads user input.\n2. We need to complete the word \"input\" in the comment first.\n3. After completing the comment, we should add a new line before defining the function.\n4. The function should use Python's built-in `input()` function to read user input.\n5. We'll name the function descriptively and include a return statement.\n</reasoning>\n\n<answer>t\ndef read_user_input():\n    user_input = input(\"Enter your input: \")\n    return user_input\n</answer>\n</example>\n\n<example>\nUser input:\n--main.py--\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n        re<CURSOR>\n\n\nResponse:\n<reasoning>\n1. The cursor is positioned after \"re\" in the 'else' clause of a recursive Fibonacci function.\n2. We need to complete the return statement for the recursive case.\n3. The \"re\" already present likely stands for \"return\", so we'll continue from there.\n4. The Fibonacci sequence is the sum of the two preceding numbers.\n5. We should return the sum of fibonacci(n-1) and fibonacci(n-2).\n</reasoning>\n\n<answer>turn fibonacci(n-1) + fibonacci(n-2)</answer>\n</example>\n</examples>",
        "messages": [
          {
            "role": "user",
            "content": "{CODE}"
          }
        ]
      },
      "post_process": {
        "extractor": "(?s)<answer>(.*?)</answer>"
      }
    },
    {
      "action_display_name": "Refactor",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 4096,
        "system": "You are an AI coding assistant specializing in code refactoring. Your task is to analyze the given code snippet and provide a refactored version. Follow these steps:\n\n1. Analyze the code context and structure.\n2. Identify areas for improvement, such as code efficiency, readability, or adherence to best practices.\n3. Provide your chain of thought reasoning, wrapped in <reasoning> tags. Include your analysis of the current code and explain your refactoring decisions.\n4. Rewrite the entire code snippet with your refactoring applied.\n5. Wrap your refactored code solution in <answer> tags.\n\nYour response should always include both the reasoning and the refactored code.\n\n<examples>\n<example>\nUser input:\ndef calculate_total(items):\n    total = 0\n    for item in items:\n        total = total + item['price'] * item['quantity']\n    return total\n\n\nResponse:\n<reasoning>\n1. The function calculates the total cost of items based on price and quantity.\n2. We can improve readability and efficiency by:\n   a. Using a more descriptive variable name for the total.\n   b. Utilizing the sum() function with a generator expression.\n   c. Using augmented assignment (+=) if we keep the for loop.\n3. We'll implement the sum() function approach for conciseness.\n4. We'll add a type hint for better code documentation.\n</reasoning>\n<answer>\nfrom typing import List, Dict\n\ndef calculate_total(items: List[Dict[str, float]]) -> float:\n    return sum(item['price'] * item['quantity'] for item in items)\n</answer>\n</example>\n\n<example>\nUser input:\ndef is_prime(n):\n    if n < 2:\n        return False\n    for i in range(2, n):\n        if n % i == 0:\n            return False\n    return True\n\n\nResponse:\n<reasoning>\n1. This function checks if a number is prime, but it's not efficient for large numbers.\n2. We can improve it by:\n   a. Adding an early return for 2, the only even prime number.\n   b. Checking only odd numbers up to the square root of n.\n   c. Using a more efficient range (start at 3, step by 2).\n3. We'll also add a type hint for better documentation.\n4. The refactored version will be more efficient for larger numbers.\n</reasoning>\n<answer>\nimport math\n\ndef is_prime(n: int) -> bool:\n    if n < 2:\n        return False\n    if n == 2:\n        return True\n    if n % 2 == 0:\n        return False\n    \n    for i in range(3, int(math.sqrt(n)) + 1, 2):\n        if n % i == 0:\n            return False\n    return True\n</answer>\n</example>\n</examples>",
        "messages": [
          {
            "role": "user",
            "content": "{SELECTED_TEXT}"
          }
        ]
      },
      "post_process": {
        "extractor": "(?s)<answer>(.*?)</answer>"
      }
    }
  ]
}

Notice that both of these prompts use chain of thought prompting, with extractors that pull out the text between <answer> and </answer>.

Also notice the use of {SELECTED_TEXT} in the Refactor code action. {SELECTED_TEXT} is a prompt key that is replaced with the user's selected text. More on prompting is available at the Prompting wiki page.

Note: If text is selected when calling an action, the output of the action will replace the currently selected text.

Parameters:

  • action_display_name is the text displayed in the menu by your editor
  • model is the model to use for the action
  • parameters are the model's parameters. Refer to the specific model backend you are using for a detailed list of available parameters
  • post_process are the parameters for post processing. See the Post Processing section for more

Code Completion

Using code completions is not recommended. They work well, but we highly recommend using actions from the section above instead. While it is easier to get started with completions, writing custom actions provides a better, faster, and more powerful developer experience, and you can write actions that perform code completion.

LSP-AI is a language server that provides support for completions. To use this feature, provide a completion key in the initializationOptions. You can disable completions by leaving out the completion key.

{
  "completion": {
    "model": "model1",
    "parameters": {}
  }
}

The model key specifies which model to use during a completion request. The value of model must be a key specified in the models key. Notice we specify model1, the key used in all of the examples above. The choice of model1 is arbitrary; it can be any valid string.

The available keys in the parameters object depend on the type of model specified and which features you want enabled for the model.

Instruction

Instruction is enabled by supplying the messages key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "messages": [
        {
          "role": "system",
          "content": "Test"
        },
        {
          "role": "user",
          "content": "Test {CONTEXT} - {CODE}"
        }
      ],
      "max_context": 1024
    }
  }
}

Note that {CONTEXT} and {CODE} are replaced by LSP-AI with the context and code supplied by the memory backend. The file_store backend leaves {CONTEXT} blank and fills {CODE} with the text around the cursor, limited by max_context. To see the prompts LSP-AI generates, enable debugging.

If the messages key is provided and you are using an OpenAI compatible API, be sure to provide the chat_endpoint and ensure that the model is instruction tuned.

FIM

FIM is enabled by supplying the fim key in the parameters object.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context": 1024
    }
  }
}

With the file_store backend, this will prepend start, insert middle at the cursor, and append end to the code around the cursor.
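
For illustration, assume the tokens above and a buffer where the cursor sits after "return " inside a small add function (a hypothetical example). The file_store backend would build a prompt shaped roughly like this, and the model generates the text that belongs at the cursor:

<fim_prefix>def add(a, b):
    return <fim_suffix>

print(add(1, 2))<fim_middle>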

Note that Mistral AI FIM API compatible models don't require the fim parameters. Instead, when using mistral_fim models, FIM is enabled automatically and LSP-AI does not augment the code with special FIM tokens, since the API handles this itself.

Text Completion

If both messages and fim are omitted, the model performs text completion. Be sure to provide the completions_endpoint if using an OpenAI compatible API. In this case, the file_store backend will take max_context tokens before the cursor as the prompt.

Parameters:

  • model is the model to use for completion
  • parameters are the model's parameters. Refer to the specific model backend you are using for a detailed list of available parameters
  • post_process are the parameters for post processing. See the Post Processing section for more

Post Processing

LSP-AI can perform some post processing on the responses generated by LLMs. By default, LSP-AI removes duplicate characters from the start and end of the generated text. In other words, it tries to remove characters you have already typed that the LLM echoes back to you. This is primarily useful for code completions.

This can be disabled by setting remove_duplicate_start and remove_duplicate_end to false. By default these are set to true.

{
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1024
    },
    "post_process": {
      "remove_duplicate_start": false,
      "remove_duplicate_end": false
    }
  }
}

The more powerful use case for post processing is extractors. Extractors are regular expressions that run over the LLM's output and extract matches. This is primarily useful for chain of thought prompting or any other technique that causes the LLM to output more text than we want in the editor. For example, we can tell the LLM to output its entire chain of thought but use an extractor to keep only the final answer, so the editor is not filled with the full output.

Here is an example action that completes code after the user's cursor with Anthropic using chain of thought prompting.

{
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-5-sonnet-20240620",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "actions": [
    {
      "action_display_name": "Complete",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 4096,
        "system": "You are an AI coding assistant. Your task is to complete code snippets. The user's cursor position is marked by \"<CURSOR>\". Follow these steps:\n\n1. Analyze the code context and the cursor position.\n2. Provide your chain of thought reasoning, wrapped in <reasoning> tags. Include thoughts about the cursor position, what needs to be completed, and any necessary formatting.\n3. Determine the appropriate code to complete the current thought, including finishing partial words or lines.\n4. Replace \"<CURSOR>\" with the necessary code, ensuring proper formatting and line breaks.\n5. Wrap your code solution in <answer> tags.\n\nYour response should always include both the reasoning and the answer. Pay special attention to completing partial words or lines before adding new lines of code.\n\n<examples>\n<example>\nUser input:\n--main.py--\n# A function that reads in user inpu<CURSOR>\n\nResponse:\n<reasoning>\n1. The cursor is positioned after \"inpu\" in a comment describing a function that reads user input.\n2. We need to complete the word \"input\" in the comment first.\n3. After completing the comment, we should add a new line before defining the function.\n4. The function should use Python's built-in `input()` function to read user input.\n5. We'll name the function descriptively and include a return statement.\n</reasoning>\n\n<answer>t\ndef read_user_input():\n    user_input = input(\"Enter your input: \")\n    return user_input\n</answer>\n</example>\n\n<example>\nUser input:\n--main.py--\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    else:\n        re<CURSOR>\n\n\nResponse:\n<reasoning>\n1. The cursor is positioned after \"re\" in the 'else' clause of a recursive Fibonacci function.\n2. We need to complete the return statement for the recursive case.\n3. The \"re\" already present likely stands for \"return\", so we'll continue from there.\n4. The Fibonacci sequence is the sum of the two preceding numbers.\n5. We should return the sum of fibonacci(n-1) and fibonacci(n-2).\n</reasoning>\n\n<answer>turn fibonacci(n-1) + fibonacci(n-2)</answer>\n</example>\n</examples>",
        "messages": [
          {
            "role": "user",
            "content": "{CODE}"
          }
        ]
      },
      "post_process": {
        "extractor": "(?s)<answer>(.*?)</answer>"
      }
    }
  ]
}

Notice the system prompt tells the LLM to output its final answer between <answer> and </answer>, and the extractor's regex matches anything between those two tags.
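
As an illustration (the model output below is invented), a response such as:

<reasoning>
The cursor sits after "re", so the return statement needs to be finished.
</reasoning>
<answer>turn a + b</answer>

is matched by the extractor (?s)<answer>(.*?)</answer>, and only the text captured between the tags, turn a + b, ends up in the editor.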

Other Model Parameters

The parameters for models are dependent on the backend being used.

llama.cpp:

  • max_tokens restricts the number of tokens the model generates
  • chat_template the Jinja template to use. LSP-AI currently uses MiniJinja as the templating backend.
  • chat_format the chat template format to use. This is forwarded directly to llama.cpp's apply_chat_template function. See the sketch below.
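
As a hedged sketch (assuming chat_format is set alongside the other generation parameters, and that chatml is a format llama.cpp recognizes), forcing a specific chat format for instruction completion might look like:

{
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1024,
      "max_tokens": 32,
      "chat_format": "chatml",
      "messages": [
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}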

Ollama:

OpenAI:

Anthropic:

Mistral FIM:

Example Configurations

llama.cpp

In-Editor Chatting

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "QuantFactory/Meta-Llama-3-70B-Instruct-GGUF-v2",
      "name": "Meta-Llama-3-70B-Instruct-v2.Q5_K_M.gguf",
      "n_ctx": 6000
    }
  },
  "chat": [
    {
      "trigger": "!C",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "messages": [
          {
            "role": "system",
            "content": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately"
          }
        ]
      }
    }
  ]
}

FIM Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "stabilityai/stable-code-3b",
      "name": "stable-code-3b-Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_suffix>",
        "end": "<fim_middle>"
      },
      "max_context": 2000,
      "max_tokens": 32
    }
  }
}

Instruction Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "QuantFactory/Meta-Llama-3-70B-Instruct-GGUF-v2",
      "name": "Meta-Llama-3-70B-Instruct-v2.Q5_K_M.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "max_tokens": 32,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "llama_cpp",
      "repository": "TheBloke/deepseek-coder-6.7B-instruct-GGUF",
      "name": "deepseek-coder-6.7b-instruct.Q5_K_S.gguf",
      "n_ctx": 2048
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "max_tokens": 32
    }
  }
}

Ollama

In-Editor Chatting

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "llama3"
    }
  },
  "chat": [
    {
      "trigger": "!C",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "messages": [
          {
            "role": "system",
            "content": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately"
          }
        ]
      }
    }
  ]
}

FIM Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "fim": {
        "start": "<|fim▁begin|>",
        "middle": "<|fim▁hole|>",
        "end": "<|fim▁end|>"
      },
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

Instruction Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "llama3"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 1800,
      "options": {
        "num_predict": 32
      },
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "ollama",
      "model": "deepseek-coder"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2000,
      "options": {
        "num_predict": 32
      }
    }
  }
}

OpenAI Compatible APIs

In-Editor Chatting

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.openai.com/v1/chat/completions",
      "model": "gpt-4o",
      "auth_token_env_var_name": "OPENAI_API_KEY"
    }
  },
  "chat": [
    {
      "trigger": "!C",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "messages": [
          {
            "role": "system",
            "content": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately"
          }
        ]
      }
    }
  ]
}

Instruction Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "chat_endpoint": "https://api.openai.com/v1/chat/completions",
      "model": "gpt-4o",
      "auth_token_env_var_name": "OPENAI_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_tokens": 128,
      "messages": [
        {
          "role": "system",
          "content": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses."
        },
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

FIM Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_tokens": 128,
      "fim": {
        "start": "<fim_prefix>",
        "middle": "<fim_middle>",
        "end": "<fim_suffix>"
      }
    }
  }
}

Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "open_ai",
      "completions_endpoint": "https://api.fireworks.ai/inference/v1/completions",
      "model": "accounts/fireworks/models/starcoder-16b",
      "auth_token_env_var_name": "FIREWORKS_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_tokens": 128
    }
  }
}

Anthropic Compatible APIs

In-Editor Chatting

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "chat": [
    {
      "trigger": "!C",
      "action_display_name": "Chat",
      "model": "model1",
      "parameters": {
        "max_context": 4096,
        "max_tokens": 1024,
        "system": "You are a code assistant chatbot. The user will ask you for assistance coding and you will do you best to answer succinctly and accurately"
      }
    }
  ]
}

Instruction Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "anthropic",
      "chat_endpoint": "https://api.anthropic.com/v1/messages",
      "model": "claude-3-haiku-20240307",
      "auth_token_env_var_name": "ANTHROPIC_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_context": 2048,
      "max_tokens": 128,
      "system": "Instructions:\n- You are an AI programming assistant.\n- Given a piece of code with the cursor location marked by \"<CURSOR>\", replace \"<CURSOR>\" with the correct code or comment.\n- First, think step-by-step.\n- Describe your plan for what to build in pseudocode, written out in great detail.\n- Then output the code replacing the \"<CURSOR>\"\n- Ensure that your completion fits within the language context of the provided code snippet (e.g., Python, JavaScript, Rust).\n\nRules:\n- Only respond with code or comments.\n- Only replace \"<CURSOR>\"; do not include any previously written code.\n- Never include \"<CURSOR>\" in your response\n- If the cursor is within a comment, complete the comment meaningfully.\n- Handle ambiguous cases by providing the most contextually appropriate completion.\n- Be consistent with your responses.",
      "messages": [
        {
          "role": "user",
          "content": "def greet(name):\n    print(f\"Hello, {<CURSOR>}\")"
        },
        {
          "role": "assistant",
          "content": "name"
        },
        {
          "role": "user",
          "content": "function sum(a, b) {\n    return a + <CURSOR>;\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "fn multiply(a: i32, b: i32) -> i32 {\n    a * <CURSOR>\n}"
        },
        {
          "role": "assistant",
          "content": "b"
        },
        {
          "role": "user",
          "content": "# <CURSOR>\ndef add(a, b):\n    return a + b"
        },
        {
          "role": "assistant",
          "content": "Adds two numbers"
        },
        {
          "role": "user",
          "content": "# This function checks if a number is even\n<CURSOR>"
        },
        {
          "role": "assistant",
          "content": "def is_even(n):\n    return n % 2 == 0"
        },
        {
          "role": "user",
          "content": "{CODE}"
        }
      ]
    }
  }
}

Mistral FIM Compatible APIs

FIM Code Completion

{
  "memory": {
    "file_store": {}
  },
  "models": {
    "model1": {
      "type": "mistral_fim",
      "fim_endpoint": "https://api.mistral.ai/v1/fim/completion",
      "model": "codestral-latest",
      "auth_token_env_var_name": "MISTRAL_API_KEY"
    }
  },
  "completion": {
    "model": "model1",
    "parameters": {
      "max_tokens": 64
    }
  }
}