This is an app that lets you ask questions about any data source by leveraging embeddings, vector databases, large language models and, last but not least, LangChain
- Upload any `file` or enter any `path` or `url`
- The data source is detected and loaded into text documents
- The text documents are embedded using OpenAI embeddings
- The embeddings are stored as a vector dataset in a data lake
- A LangChain chain is created, consisting of an LLM (`gpt-3.5-turbo` by default) and the embedding database index as retriever
- When you send a question to the bot, this chain is used to retrieve context and answer it
- Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
- By default, this git repository is used as context, so you can start asking questions about its functionality right away without choosing your own data source
- To run locally or deploy somewhere, execute `cp .env.template .env` and set the necessary keys in the newly created secrets file. Alternatively, set the environment variables manually
- Yes, Chad in `DataChad` refers to the well-known meme
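
The embed, store, retrieve and answer steps listed above can be illustrated with a small, dependency-free sketch. This is not the app's actual code: `embed`, `ToyVectorStore` and `build_prompt` are hypothetical names, the bag-of-words vector stands in for OpenAI embeddings, the in-memory list stands in for the vector dataset, and the final prompt string is what would be handed to the LLM.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for the stored vector dataset."""

    def __init__(self) -> None:
        self.rows = []  # list of (document text, embedding) pairs

    def add(self, documents: list[str]) -> None:
        # Embed each document once and keep the vector next to its text.
        for doc in documents:
            self.rows.append((doc, embed(doc)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Rank stored documents by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(store: ToyVectorStore, question: str,
                 history: list[tuple[str, str]], k: int = 2) -> str:
    """Combine retrieved context and cached chat history into one prompt."""
    context = "\n".join(store.retrieve(question, k))
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add([
    "Embeddings are stored as a vector dataset.",
    "The chat history is cached locally.",
    "Bananas are yellow.",
])
history = []  # (question, answer) pairs from earlier turns
prompt = build_prompt(store, "Where is the chat history cached?", history, k=1)
print(prompt)
```

In a real deployment the toy pieces would be replaced by an embedding model, a persistent vector store and an LLM call, but the flow of data stays the same: documents are embedded once, each question is embedded at query time, and only the closest documents are passed to the model as context.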