
Separate providers for inline completion #669

Closed
michaelchia opened this issue Mar 4, 2024 · 4 comments · Fixed by #711
Labels
enhancement New feature or request

Comments

@michaelchia
Collaborator

Problem

I would like to propose having a separate set of providers for inline completion models, similar to the existing separation between embedding and LLM models. Beyond simply letting users choose different models for chat and inline completion, inline completion models are generally specialized for the task, such as starcoder, code-llama, code-gecko, or any of the models from https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard. They also typically have a different interface, taking an optional suffix either through a separate parameter or via a model-specific prompt template. It can also be unsafe to assume that any LLM can reliably produce code suitable for inline completion with standard prompt templates and pre/post-processing.

Proposed Solution

Create a new base completion provider class in which the handling of an InlineCompletionRequest to produce suggestions is implemented per model/provider, since the prompt templates, pre/post-processing, and suffix handling can differ for each provider. A rough sketch of what I have in mind is included below.

LangChain doesn't seem to provide explicit support for these code completion models (unless I am just unaware of it), so it might not be possible to rely on LangChain in the same way as for general LLMs and embeddings. For example, a model like Google's code-gecko takes a separate input for the suffix, while LangChain LLMs can only take a single input.
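
As a rough sketch of the proposal (all names here are hypothetical, not the current jupyter-ai API; minimal stand-in types are defined so the example is self-contained):

```python
# Hypothetical sketch only -- class and method names are illustrative,
# not part of the existing jupyter-ai codebase.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InlineCompletionRequest:
    """Stand-in for the request the frontend sends for an inline completion."""
    prefix: str
    suffix: str
    language: Optional[str] = None
    filename: Optional[str] = None


class BaseCompletionProvider:
    """Base class that each inline completion model/provider would subclass."""

    async def generate_inline_completions(
        self, request: InlineCompletionRequest
    ) -> List[str]:
        """Turn a request into a list of suggestion strings.

        Each provider implements its own prompt template, pre/post-processing,
        and suffix handling here, e.g. passing the suffix as a separate SDK
        parameter instead of interpolating it into a single prompt string.
        """
        raise NotImplementedError
```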

Additional context

I'd be willing to work on a PR for this if you'd like me to.

michaelchia added the enhancement (New feature or request) label on Mar 4, 2024
@krassowski
Member

Short-term I would suggest two steps:

  1. make providers for chat and for inline completion separately configurable
  2. allow tagging a provider as suitable only for completion and not for chat (so that it does not show up in the chat selection list), and vice versa

This is because many chat providers, including SOTA models, work reasonably well as completion providers too.

As for prompt templates, the completion and chat prompt templates are already separate and configurable on a per-provider basis; see:

def get_chat_prompt_template(self) -> PromptTemplate:
    """
    Produce a prompt template optimised for chat conversation.
    The template should take two variables: history and input.
    """

def get_completion_prompt_template(self) -> PromptTemplate:
    """
    Produce a prompt template optimised for inline code or text completion.
    The template should take variables: prefix, suffix, language, filename.
    """

Note that largely arbitrary suffix handling can be applied using the jinja-based prompt template.
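
For illustration, a minimal jinja2-based completion template (not the project's actual default; the markers and layout here are made up and would depend on the target model) could be defined like this:

```python
# Illustrative only: a jinja2-based completion prompt template that
# rearranges prefix/suffix; the tags used here are arbitrary.
from langchain.prompts import PromptTemplate

completion_template = PromptTemplate.from_template(
    "{{filename}} ({{language}}):\n"
    "<prefix>{{prefix}}</prefix>"
    "{% if suffix %}<suffix>{{suffix}}</suffix>{% endif %}",
    template_format="jinja2",
)

prompt = completion_template.format(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))",
    language="python",
    filename="example.py",
)
```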

A larger refactor will likely be desirable at some point, but I would suggest first presenting a good rationale for such a refactor (e.g. what cannot be achieved, or is problematic, with the existing approach) and agreeing on a detailed plan before starting the work.

@michaelchia
Collaborator Author

Thanks for the reply. Could you suggest how I could use the code-gecko model from Google VertexAI with the current implementation? The SDK has a separate param for suffix (see https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/code-completion#code-completion-prompt-python_vertex_ai_sdk). I am not sure how to access that param via langchain.
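
For reference, the SDK call I mean looks roughly like this (a sketch based on the linked docs; exact parameters may vary between SDK versions):

```python
# Rough sketch of the Vertex AI code completion call described in the
# linked docs; parameter names may differ between SDK versions.
from vertexai.language_models import CodeGenerationModel

model = CodeGenerationModel.from_pretrained("code-gecko")
response = model.predict(
    prefix="def reverse_string(s):\n    return ",
    suffix="\n\nprint(reverse_string('hello'))",
)
print(response.text)
```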

@krassowski
Member

Well, a hacky but simple idea is to create a dummy prompt template that looks like:

{prefix}@@@{suffix}

where @@@ is some clever separator which has no chance of occurring in real code (probably not @@@, but let's use it for simplicity), and then in the _call method of the custom LLM you do:

prefix, suffix = prompt.split('@@@')

and call the API/SDK using these two arguments.
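
Putting it together, a minimal sketch of such a custom LLM (assuming the vertexai SDK and a recent LangChain; this is just the workaround above, not a built-in provider) might look like:

```python
# Minimal sketch of the workaround: a custom LangChain LLM that splits the
# templated "{prefix}@@@{suffix}" prompt back apart and calls the SDK.
from typing import Any, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from vertexai.language_models import CodeGenerationModel

SEPARATOR = "@@@"  # stand-in separator; pick something that cannot occur in real code


class CodeGeckoCompletionLLM(LLM):
    """Custom LLM that re-splits the prompt into prefix/suffix for code-gecko."""

    model_name: str = "code-gecko"

    @property
    def _llm_type(self) -> str:
        return "vertexai-code-gecko"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # The completion template produced "{prefix}@@@{suffix}"; undo it here.
        prefix, _, suffix = prompt.partition(SEPARATOR)
        model = CodeGenerationModel.from_pretrained(self.model_name)
        response = model.predict(prefix=prefix, suffix=suffix or None)
        return response.text
```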

@michaelchia
Collaborator Author

Yeah, I was thinking of something like that, but was hoping I wouldn't need to. But sure, that's fine in the meantime. Thanks!

I'll be looking forward to the two enhancements you suggested; they would help a lot in the short run.
