
Separate providers for inline completion #669

Closed
michaelchia opened this issue Mar 4, 2024 · 4 comments · Fixed by #711
Labels
enhancement New feature or request

Comments

@michaelchia
Collaborator

Problem

I would like to propose having a separate set of providers for inline completion models, similar to the existing separation between embedding and LLM models. Beyond simply letting users choose different models for chat and inline completion, inline completion models are generally specialized for the task, such as starcoder, code-llama, code-gecko, or any of the models from https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard. They also typically have a different interface, taking an optional suffix either through a separate parameter or via a model-specific prompt template. It can also be unsafe to assume that any LLM can reliably produce code suitable for inline completion with standard prompt templates and pre/post-processing.

Proposed Solution

Create a new base completion provider class in which the handling of an InlineCompletionRequest to produce suggestions is implemented per model/provider, since the prompt templates, pre/post-processing, and suffix handling can differ for each provider. A rough sketch of what I have in mind is included below.

LangChain doesn't seem to provide explicit support for these code completion models (unless I am just unaware of it), so it might not be possible to rely on LangChain in the same way as for general LLMs and embeddings. For example, a model like Google's code-gecko takes a separate input for the suffix, while LangChain LLMs can only take a single input.
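
As a rough sketch of the proposal (all names here are hypothetical, not the current jupyter-ai API; minimal stand-in types are defined so the example is self-contained):

```python
# Hypothetical sketch only -- class and method names are illustrative,
# not part of the existing jupyter-ai codebase.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InlineCompletionRequest:
    """Stand-in for the request the frontend sends for an inline completion."""
    prefix: str
    suffix: str
    language: Optional[str] = None
    filename: Optional[str] = None


class BaseCompletionProvider:
    """Base class that each inline completion model/provider would subclass."""

    async def generate_inline_completions(
        self, request: InlineCompletionRequest
    ) -> List[str]:
        """Turn a request into a list of suggestion strings.

        Each provider implements its own prompt template, pre/post-processing,
        and suffix handling here, e.g. passing the suffix as a separate SDK
        parameter instead of interpolating it into a single prompt string.
        """
        raise NotImplementedError
```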

Additional context

I'd be willing to work on a PR for this if you'd like me to.

michaelchia added the enhancement (New feature or request) label on Mar 4, 2024
@krassowski
Member

Short-term I would suggest two steps:

  1. make providers for chat and for inline completion separately configurable
  2. allow tagging a provider as suitable only for completion and not for chat (so that it does not show up in the chat selection list), and vice versa

This is because many chat providers, including SOTA models, work reasonably well as completion providers too.

As for prompt templates, the completion and chat prompt templates are already separate and configurable on a per-provider basis; see:

def get_chat_prompt_template(self) -> PromptTemplate:
    """
    Produce a prompt template optimised for chat conversation.
    The template should take two variables: history and input.
    """

def get_completion_prompt_template(self) -> PromptTemplate:
    """
    Produce a prompt template optimised for inline code or text completion.
    The template should take variables: prefix, suffix, language, filename.
    """

Note that largely arbitrary suffix handling can be applied using the jinja-based prompt template.
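
For illustration, a minimal jinja2-based completion template (not the project's actual default; the markers and layout here are made up and would depend on the target model) could be defined like this:

```python
# Illustrative only: a jinja2-based completion prompt template that
# rearranges prefix/suffix; the tags used here are arbitrary.
from langchain.prompts import PromptTemplate

completion_template = PromptTemplate.from_template(
    "{{filename}} ({{language}}):\n"
    "<prefix>{{prefix}}</prefix>"
    "{% if suffix %}<suffix>{{suffix}}</suffix>{% endif %}",
    template_format="jinja2",
)

prompt = completion_template.format(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))",
    language="python",
    filename="example.py",
)
```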

A larger refactor will likely be desirable at some point, but I would suggest first presenting a good rationale for such a refactor (e.g. what cannot be achieved, or is problematic, with the existing approach) and agreeing on a detailed plan before starting the work.

@michaelchia
Collaborator Author

Thanks for the reply. Could you suggest how I could use the code-gecko model from Google VertexAI with the current implementation? The SDK has a separate param for suffix (see https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/code-completion#code-completion-prompt-python_vertex_ai_sdk). I am not sure how to access that param via langchain.
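
For reference, the SDK call I mean looks roughly like this (a sketch based on the linked docs; exact parameters may vary between SDK versions):

```python
# Rough sketch of the Vertex AI code completion call described in the
# linked docs; parameter names may differ between SDK versions.
from vertexai.language_models import CodeGenerationModel

model = CodeGenerationModel.from_pretrained("code-gecko")
response = model.predict(
    prefix="def reverse_string(s):\n    return ",
    suffix="\n\nprint(reverse_string('hello'))",
)
print(response.text)
```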

@krassowski
Member

Well, a hacky but simple idea is to create a dummy prompt template that looks like:

{prefix}@@@{suffix}

where @@@ is some clever separator which has no chance of occurring in real code (probably not @@@, but let's use it for simplicity), and then in the _call method of the custom LLM you do:

prefix, suffix = prompt.split('@@@')

and call the API/SDK using these two arguments.
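
Putting it together, a minimal sketch of such a custom LLM (assuming the vertexai SDK and a recent LangChain; this is just the workaround above, not a built-in provider) might look like:

```python
# Minimal sketch of the workaround: a custom LangChain LLM that splits the
# templated "{prefix}@@@{suffix}" prompt back apart and calls the SDK.
from typing import Any, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from vertexai.language_models import CodeGenerationModel

SEPARATOR = "@@@"  # stand-in separator; pick something that cannot occur in real code


class CodeGeckoCompletionLLM(LLM):
    """Custom LLM that re-splits the prompt into prefix/suffix for code-gecko."""

    model_name: str = "code-gecko"

    @property
    def _llm_type(self) -> str:
        return "vertexai-code-gecko"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # The completion template produced "{prefix}@@@{suffix}"; undo it here.
        prefix, _, suffix = prompt.partition(SEPARATOR)
        model = CodeGenerationModel.from_pretrained(self.model_name)
        response = model.predict(prefix=prefix, suffix=suffix or None)
        return response.text
```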

@michaelchia
Collaborator Author

Yeah, I was thinking of something like that, but was hoping I wouldn't need to. But sure, that's fine in the meantime. Thanks!

I'll be looking forward to the two enhancements you suggested; they would help a lot in the short run.
