interrobangc/codellm
CodeLLM


CodeLLM is an extensible LLM agent pipeline framework for building systems of AI agents, tools, and datastores that interact to solve complex problems. It was inspired by the groundcrew project. It is initially focused on code and codebase analysis but could be extended to other domains.

It is still very much a proof of concept and is not ready for use beyond experimentation.

Architecture

The system is designed to be extensible and pluggable. Its main components are:

  • CLI: A simple command line interface for interacting with the system.
  • Core: The core of the system, responsible for orchestrating the agents and tools.
  • Providers: A standard interface to various LLM providers.
  • Tools: The tools the agent uses to gather data. They take a query and return data; they are not responsible for making requests to the provider.
  • Remix: A web interface for asking questions about the codebase and seeing the results.
  • VectorDbs: Vector databases that store embeddings of code files and other data.
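To make the provider abstraction concrete, here is a minimal sketch of what a pluggable provider interface might look like. The names (`Provider`, `EchoProvider`) are hypothetical and are not the actual CodeLLM API.

```typescript
// Hypothetical sketch of a pluggable provider interface; names are
// illustrative and not the actual CodeLLM API.
interface Provider {
  name: string;
  // Send a prompt to the underlying LLM and return its reply.
  chat(prompt: string): Promise<string>;
}

// A trivial provider used here only to show the registration pattern.
class EchoProvider implements Provider {
  name = 'echo';
  async chat(prompt: string): Promise<string> {
    return `echo: ${prompt}`;
  }
}

// The core could keep a registry and look providers up by name.
const providers = new Map<string, Provider>();
providers.set('echo', new EchoProvider());

providers.get('echo')!.chat('hello').then((reply) => {
  console.log(reply); // → "echo: hello"
});
```

A real provider would wrap an SDK call (ollama, openai, etc.) behind the same `chat` signature, which is what keeps the pipeline pluggable.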

Prerequisites


Provider

The system currently supports ollama, anthropic, mistral, and openai providers. You will need to have ollama running locally or configure an API key for one of the hosted providers.

Quickstart

Setup

nvm use
npm ci

Datastore

The first step is to start the vector db. This will currently start a local instance of chromadb that listens on http://localhost:8000 and persists data to the .chromadb directory in the root of the project.

npm run start:datastore

To stop the datastore:

npm run stop:datastore

To view logs from the datastore:

npm run logs:datastore
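Conceptually, each entry in the vector db pairs a chunk of source with its embedding. The record shape below is illustrative only, not chromadb's or CodeLLM's actual schema:

```typescript
// Illustrative record shape; not the actual chromadb/CodeLLM schema.
type EmbeddingRecord = {
  id: string;          // e.g. the file path of the imported chunk
  document: string;    // the source text that was embedded
  embedding: number[]; // the vector produced by the embedding model
};

const record: EmbeddingRecord = {
  id: 'core/src/index.ts',              // hypothetical path
  document: 'export * from "./agent";', // hypothetical contents
  embedding: [0.12, -0.03, 0.88],       // toy 3-dimensional vector
};

console.log(record.id); // → "core/src/index.ts"
```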

Import embeddings

Use the archived chromadb

If you have git-lfs installed, you can use the archived chromadb to import the embeddings. This is the fastest way to get started.

npm run extract:chromadb

Run the importer

Next, import the embeddings for your codebase. We use a locally running chromadb instance as the vector db. The initial import will take a while; it can be run again to update the embeddings and will only import new and updated files.

By default it will import the TypeScript code from this repository. You can set the CODELLM_IMPORT_PATH environment variable to point to a different codebase, modify the cli/config.yml file, or create a new yaml config file and set the CODELLM_CONFIG environment variable to point to it.

npm run start:import
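Putting those options together, configuration resolution might look roughly like this. Only the environment variable names (CODELLM_IMPORT_PATH, CODELLM_CONFIG) come from the docs above; the `resolveImportConfig` helper and its defaults are illustrative.

```typescript
// Sketch of resolving importer configuration from the environment.
// Only the variable names come from the docs; the helper is illustrative.
function resolveImportConfig(env: Record<string, string | undefined>) {
  return {
    // Codebase to import; defaults to this repository.
    importPath: env.CODELLM_IMPORT_PATH ?? '.',
    // Config file; CODELLM_CONFIG points at an alternate yaml file.
    configFile: env.CODELLM_CONFIG ?? 'cli/config.yml',
  };
}

const cfg = resolveImportConfig({ CODELLM_IMPORT_PATH: '../my-project' });
console.log(cfg.importPath); // → "../my-project"
console.log(cfg.configFile); // → "cli/config.yml"
```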

Run the cli agent

Ollama

This assumes you have ollama running locally on the default port.

npm start

Anthropic (Claude)

This assumes you have an API key for anthropic set as an environment variable: ANTHROPIC_API_KEY.

CODELLM_PROVIDER=anthropic npm start

Mistral

This assumes you have an API key for mistral set as an environment variable: MISTRAL_API_KEY.

CODELLM_PROVIDER=mistral npm start

OpenAI

This assumes you have an API key for openai set as an environment variable: OPENAI_API_KEY.

CODELLM_PROVIDER=openai npm start
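The four provider sections above follow one pattern: pick a backend with CODELLM_PROVIDER and supply its API key (ollama excepted, since it runs locally). A sketch of that mapping, with a hypothetical `requiredKeyVar` helper that is not part of CodeLLM itself:

```typescript
// Illustrative mapping from CODELLM_PROVIDER values to the API key
// environment variables named above; the helper is hypothetical.
const API_KEY_VARS: Record<string, string | undefined> = {
  anthropic: 'ANTHROPIC_API_KEY',
  mistral: 'MISTRAL_API_KEY',
  openai: 'OPENAI_API_KEY',
  ollama: undefined, // runs locally, no key needed
};

function requiredKeyVar(provider: string): string | undefined {
  if (!(provider in API_KEY_VARS)) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return API_KEY_VARS[provider];
}

console.log(requiredKeyVar('anthropic')); // → "ANTHROPIC_API_KEY"
```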

Run the remix app

The remix app is a simple web interface that allows you to ask questions about the codebase and see the results. It is a work in progress.

npm run dev