Alexander Zuev edited this page Oct 19, 2024 · 14 revisions

Kollektiv Roadmap

Roadmap is subject to change.

🚂 v0.2.0 (WIP) - Open-source "Ask AI" for your content

Big idea

v0.2.0 aims to give users the easiest way to chat with web content (primarily documentation) through a simple web UI based on Chainlit. It will be offered in two forms:

  • open-source solution for DIY enthusiasts: free to use and set up on your own
  • a showcase web app with a set of indexed documents focused on AI agents, AI, and LLM frameworks, covering the full lifecycle

Deployment

  • Self-hosted: clone the repo and set it up yourself
  • Web app: hosted solution with 30+ AI / LLM / AI-agent focused libraries to work with

User flows

  1. Index web content via the @docs command

⚙️ Features

  1. Automate ingestion pipeline

    • automate the pipeline from URL to embedding and loading
    • simplify db reset
    • skip documents that are already indexed instead of reloading them
  2. Command support

    • @docs -> parse docs and load them into the db
    • @get -> get, parse and store a single URL
  3. Chunking improvements:

    • Add chunk summary generated by LLM
    • Improve chunking quality (token, header preservation)
    • Try out Unstructured.io
  4. Git sync

    • Allow syncing of a Git repo, à la Claude Enterprise
    • @repo -> set up indexing of a GitHub repo
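
The command support and "skip what is already indexed" items above could be sketched roughly as follows. This is a minimal illustration, not Kollektiv's actual code: `Command`, `parse_command`, and `IndexRegistry` are hypothetical names, and the registry is an in-memory stand-in for whatever the real database would track.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

# Commands named in the roadmap; everything else here is an assumption.
KNOWN_COMMANDS = {"docs", "get", "repo"}


@dataclass
class Command:
    name: str      # "docs", "get", or "repo"
    argument: str  # text after the command, e.g. a URL


def parse_command(message: str) -> Command | None:
    """Return a Command if the message starts with a known @command, else None."""
    head, _, rest = message.strip().partition(" ")
    if head.startswith("@") and head[1:] in KNOWN_COMMANDS:
        return Command(name=head[1:], argument=rest.strip())
    return None


@dataclass
class IndexRegistry:
    """Hypothetical record of indexed content, keyed by URL."""
    seen: dict[str, str] = field(default_factory=dict)

    def needs_indexing(self, url: str, content: str) -> bool:
        """True if the URL is new or its content changed since the last run."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.seen.get(url) == digest:
            return False  # same URL, same content: skip reloading
        self.seen[url] = digest
        return True
```

Hashing the page content rather than checking only the URL means a changed page is re-indexed while an unchanged one is skipped.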

🔎 Search Relevancy

  1. Improve stability of RAG calls

    • LLM should decide in a smart way when RAG is needed
    • Retry search with a reformulated query
  2. Set up a basic eval suite for RAG

    • Scope: (a) Anthropic docs, (b) Supabase docs, (c) LlamaIndex docs
    • Evaluation metrics defined (e.g., accuracy, relevance, speed)
    • Automated test suite implemented
    • Baseline performance established for each doc set
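
One way the "retry search with a different query" idea could look is sketched below. It assumes a hypothetical `search` callable that returns `(chunk, score)` pairs with higher scores meaning more relevant; the `min_score` threshold and the way query variants are produced (for example, by asking the LLM to rephrase) are illustrative assumptions, not part of the roadmap.

```python
from typing import Callable, Iterable, List, Tuple

Hit = Tuple[str, float]  # (chunk text, relevance score; higher is better)


def search_with_retries(
    search: Callable[[str], List[Hit]],
    queries: Iterable[str],
    min_score: float = 0.5,
) -> Tuple[str, List[Hit]]:
    """Try each query in order and stop at the first whose best hit clears min_score.

    If no query clears the threshold, return the attempt with the highest
    best score. If no attempt returns any hits, return ("", []).
    """
    best_query: str = ""
    best_hits: List[Hit] = []
    best_score = float("-inf")
    for query in queries:
        hits = search(query)
        top = max((score for _, score in hits), default=float("-inf"))
        if top > best_score:
            best_query, best_hits, best_score = query, hits, top
        if top >= min_score:
            break  # good enough: do not spend more search calls
    return best_query, best_hits
```

The same function could double as a hook for the eval suite: logging which query variant succeeded gives a simple signal for how often retries are needed on each doc set.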

🖥️ UX Improvements

  1. Output streaming
  2. Tool result streaming / output
  3. Improved terminal output (using colorama)
  4. Chainlit UI implementation

🏗️ Refactoring

  1. Research the need to transition to LangChain [WON'T DO -> LCEL, WHY RE-INVENT THE WHEEL?]
  2. Refactor code to use LlamaIndex for search & retrieval

📊 Research

  1. Set up basic telemetry for the open-source app with public dashboards

Future Releases (Backlog)

  • V0.3.0 - Chat with a GitHub repo via a web interface
  • V0.4.0 - Confluence agent to chat with company docs
  • V0.5.0 - Open-source plugin that can be installed on any website to instantly give it LLM + RAG over site content, securely and privately

Completed Releases

🚂 V0.1.0 - Initial release

Initial release of Kollektiv (called OmniClaude back then) with the following features:

  • crawling of documentation with FireCrawl
  • custom markdown chunking
  • embedding and storage with ChromaDB
  • custom retrieval with multi-query expansion and re-ranking
  • chat with Sonnet 3.5 using a RAG search tool