Chat with your documents

This repo implements a locally hosted chatbot focused on question answering over documents in a variety of formats.

Built with LangChain.

The app leverages LangChain's streaming support and async API to update the page in real time for multiple users.
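The Python asyncio sketch below illustrates the general idea of streaming answers to several users at once. It is not the app's code and does not use LangChain. fake_llm_stream is a hypothetical stand-in for a streaming model call. Appending to a shared list stands in for pushing each token to a user's page.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_stream(answer: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM call: yields one token at a time."""
    for token in answer.split():
        await asyncio.sleep(0.01)  # simulate latency between tokens
        yield token


async def serve_user(user: str, answer: str, transcript: list[str]) -> str:
    """Forward each token as soon as it arrives instead of waiting for the full answer."""
    received: list[str] = []
    async for token in fake_llm_stream(answer):
        received.append(token)
        transcript.append(f"{user}:{token}")  # stands in for updating the user's page
    return " ".join(received)


async def main() -> tuple[list[str], list[str]]:
    transcript: list[str] = []
    # Both users are served concurrently on one event loop.
    results = await asyncio.gather(
        serve_user("alice", "Paris is the capital", transcript),
        serve_user("bob", "The file has three sections", transcript),
    )
    return list(results), transcript


if __name__ == "__main__":
    answers, log = asyncio.run(main())
    print(answers)
    print(log[:4])  # tokens from both users are interleaved
```

Because the two conversations share one event loop, neither user waits for the other's full answer before seeing tokens.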

Getting Started

First, create a new .env file from .env.example and add your OpenAI API key, which you can find in your OpenAI account settings.

cp .env.example .env
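You may want to check the .env file before going further. The sketch below parses simple KEY=VALUE lines and confirms that an API key is set. The variable name OPENAI_API_KEY is an assumption: it is the name the official OpenAI client libraries read by default, so check .env.example for the name this repo actually expects. The parser is deliberately simplified and does not replace a dotenv library.

```python
from pathlib import Path


def parse_env_file(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, strips surrounding quotes.

    Simplified sketch: no `export` prefix, escapes, or multi-line values.
    """
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_api_key(path: Path, key_name: str = "OPENAI_API_KEY") -> bool:
    """True if the (assumed) key variable is present and non-empty."""
    return bool(parse_env_file(path).get(key_name))


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run: cp .env.example .env")
    elif check_api_key(env_path):
        print("API key is set")
    else:
        print("API key missing: edit .env and add your OpenAI key")
```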

Prerequisites

  • Node.js (v16 or higher)
  • Yarn
  • wget (on macOS, you can install this with brew install wget)
  • Python and pip (used by the ingestion script)

Next, we'll need to load our data source.

Data Ingestion

Data ingestion happens in two steps.

First, save your documents in the source_documents folder.

The following formats are supported:

  • csv
  • doc/docx
  • enex (Evernote)
  • eml (e-mail)
  • epub
  • html
  • md
  • odt
  • pdf
  • ppt/pptx
  • txt
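Each supported format maps onto one or more file extensions. The sketch below finds files in source_documents that fall outside the list. The extension table and the recursive scan are assumptions for illustration and are not taken from ingest_docs.py.

```python
from pathlib import Path

# Illustrative mapping from the formats listed above to file extensions (assumption).
SUPPORTED_EXTENSIONS = {
    ".csv": "CSV",
    ".doc": "Word", ".docx": "Word",
    ".enex": "Evernote",
    ".eml": "e-mail",
    ".epub": "EPUB",
    ".html": "HTML",
    ".md": "Markdown",
    ".odt": "OpenDocument text",
    ".pdf": "PDF",
    ".ppt": "PowerPoint", ".pptx": "PowerPoint",
    ".txt": "plain text",
}


def partition_documents(folder: Path) -> tuple[list[Path], list[Path]]:
    """Split files under folder (recursively) into supported and unsupported, by extension."""
    supported: list[Path] = []
    unsupported: list[Path] = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        # Compare case-insensitively so report.PDF counts as a PDF.
        target = supported if path.suffix.lower() in SUPPORTED_EXTENSIONS else unsupported
        target.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = partition_documents(Path("source_documents"))
    print(f"{len(ok)} supported file(s)")
    for path in skipped:
        print(f"not in the supported format list: {path}")
```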

Next, install dependencies and run the ingestion script:

yarn
cd ingest
pip install -r requirements.txt
python ingest_docs.py

Note: If you are on Node v16, use NODE_OPTIONS='--experimental-fetch' yarn ingest.

This parses the documents, splits the text into chunks, creates embeddings, stores them in a vector store, and saves the vector store to the db/ directory.
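The "split text" step breaks each document into chunks small enough to embed. Below is a minimal fixed-window sketch with overlap. The chunk_size and chunk_overlap defaults are illustrative assumptions, not values from ingest_docs.py. LangChain's text splitters, such as RecursiveCharacterTextSplitter, go further and try to break on paragraph and sentence boundaries rather than at fixed character offsets.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters.

    Overlap keeps content that straddles a boundary visible in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
    # ['abcd', 'defg', 'ghij']
```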

We save it to a directory because we only want to run the (expensive) data ingestion process once.
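The "ingest once, reuse afterwards" pattern can be sketched as follows. toy_embed is a hypothetical letter-frequency embedding that exists only so the example runs offline. store.json is a hypothetical file name; the real db/ directory holds whatever format the repo's vector store writes. What matters is the control flow: the expensive embedding step runs only when no saved store exists.

```python
import json
import math
from pathlib import Path
from typing import Callable


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding: normalised letter counts (a real pipeline calls a model here)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]  # unit length, so a dot product is cosine similarity


def build_store(chunks: list[str], embed: Callable[[str], list[float]]) -> list[dict]:
    return [{"text": c, "vector": embed(c)} for c in chunks]


def load_or_build(db_dir: Path, chunks: list[str], embed=toy_embed) -> tuple[list[dict], bool]:
    """Load the store from db_dir if saved; otherwise embed once and save it.

    Returns (store, built), where built is True only when the expensive step ran.
    """
    index_file = db_dir / "store.json"  # hypothetical file name
    if index_file.exists():
        return json.loads(index_file.read_text(encoding="utf-8")), False
    store = build_store(chunks, embed)
    db_dir.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(store), encoding="utf-8")
    return store, True


def search(store: list[dict], query: str, embed=toy_embed, k: int = 1) -> list[str]:
    """Return the k chunk texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda row: -sum(a * b for a, b in zip(q, row["vector"])))
    return [row["text"] for row in ranked[:k]]
```

On a second call, load_or_build returns the saved store without embedding anything, which is why the ingestion step only needs to run once.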

The Next.js server relies on the db/ directory being present, so make sure you have run the ingestion step before moving on.

Running the Server

Then, run the development server:

yarn dev

Open http://localhost:3000 with your browser to see the result.

Deploying the Server

The production version of this repo is hosted on Fly. To deploy your own server on Fly, you can use the provided fly.toml and Dockerfile as a starting point.

Inspirations

This repo borrows heavily from