A simple llamaindex-based project that installs a local chatbot to answer questions about local files (PDFs or text files).
A local LLM is served via ollama. RAG (Retrieval Augmented Generation) is provided by llamaindex, which requires a minimum of 32GB RAM.
privacy: The project does NOT use any remote services such as OpenAI or Amazon Bedrock, so your documents stay private.
performance: an NVidia GPU can help speed things up (see the GPU support section at the end)
=== === === [1] Loading LLM... [llama3] === === ===
=== === === [2] Setting up vector store on redis... === === ===
19:48:14 redisvl.index.index INFO Index already exists, not overwriting.
=== === === [3] Reading documents from ./data === === ===
Ingested 56 Nodes
=== === === [4] Starting query loop... === === ===
How can I help? [to exit, type 'bye' and press ENTER] [for a summary, say 'summary'] >>what data is available for March 2024?
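Under the hood, the query loop above is just llamaindex talking to the local ollama server. The following is a minimal sketch of that flow, not the project's actual src/main.py; it assumes a recent llama-index release with the ollama LLM and embedding integrations installed:

# Minimal RAG sketch: llamaindex + a local ollama server (assumed setup, not the project's code).
# Requires llama3 and nomic-embed-text to be pulled and `ollama serve` to be running.
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("./data").load_data()   # PDFs and text files
index = VectorStoreIndex.from_documents(documents)        # builds the vector index

query_engine = index.as_query_engine()
print(query_engine.query("what data is available for March 2024?"))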
note: on Windows, it is recommended to run Ollama directly (on WSL it seems slower, even with CUDA for the GPU).
- install python 3.12
- install ollama
ref: https://ollama.com/download
Windows:
- download and run the Ollama installer
Unix:
curl -fsSL https://ollama.com/install.sh | sh
note: this also installs the graphics card driver (at least for NVidia)
For Mac see https://github.com/ollama/ollama
Download the LLM and the embedding model:
ollama pull llama3
ollama pull nomic-embed-text
You may need to start ollama:
ollama serve
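To confirm that ollama is actually up before going further, you can hit its local HTTP endpoint; by default ollama listens on port 11434 and replies to the root path with a short status message (a quick check, assuming the default port):

# Quick check that the local ollama server is reachable (default port 11434).
import urllib.request

with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())   # typically prints a short "Ollama is running" style message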
- install pipenv:
python -m pip install pipenv
pipenv clean
pipenv install --dev
Put your text and PDF files under the data folder.
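If you want a rough idea of how many nodes your files will produce (the "Ingested 56 Nodes" line in the sample output above), here is a small, hypothetical check using llamaindex's default sentence splitter; the project's own chunking settings may differ:

# Rough preview of how many nodes the files under ./data produce.
# Hypothetical snippet; the project's actual chunking configuration may differ.
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./data").load_data()
nodes = SentenceSplitter().get_nodes_from_documents(documents)
print(f"Ingested {len(nodes)} Nodes")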
Adding redis as the vector store improves performance.
- Install Docker
- Run redis (in Docker)
- First time run:
./run_redis_first.sh
- Subsequent runs:
./run_redis_again.sh
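Once the container is up, you can confirm Redis is reachable before starting the chatbot (a small check using the redis Python client; it assumes the container publishes the default port 6379 on localhost):

# Confirm the Redis container is reachable (assumes the default port 6379 on localhost).
import redis

client = redis.Redis(host="localhost", port=6379)
print(client.ping())   # prints True if Redis is up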
WITH redis:
./go-redis.sh
WITHOUT redis:
./go.sh
or
pipenv run python -W "ignore" -m src.main
You can set breakpoints by adding Python code:
import pdb
pdb.set_trace()
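For example, a breakpoint can be dropped just before a suspect call; this is an illustrative placement in hypothetical code, not the project's actual source (on Python 3.7+ the built-in breakpoint() does the same thing):

# Illustrative placement of a breakpoint (hypothetical function, not the project's code).
def answer(query_engine, question):
    import pdb; pdb.set_trace()   # execution pauses here; inspect variables, then press 'c' to continue
    return query_engine.query(question)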
- cannot run via pipenv: try with or without this prefix:
python -m
note: the version of Python must match that in the Pipfile.
- llamaindex raises an exception with 404: check whether that model was already pulled via ollama:
ollama ls
If the model is missing, then download it via ollama pull <model name>
- cannot see ollama logs:
  - on Windows, try killing the ollama app, and instead run it via the command line:
ollama serve
    Also see the ollama docs.
  - on Linux, try:
journalctl -e -u ollama
    Also see the ollama docs.
- LLM is too slow:
  - use redis as the vector store (see above)
  - to reduce the amount of work done, try editing config.py and set IS_SUMMARY_ENABLED to False (IS_VECTOR_ENABLED should be fast); see the sketch of these flags just below.
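For reference, the two flags mentioned above live in config.py; the sketch below is a guess at the layout (only the flag names come from this README):

# config.py (assumed layout -- only the flag names are taken from this README)
IS_SUMMARY_ENABLED = False   # skip the (slow) summary step
IS_VECTOR_ENABLED = True     # keep the vector index -- this part should be fast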
Ollama supports many LLM models - see https://github.com/ollama/ollama?tab=readme-ov-file#model-library
GPU support is not strictly required with smaller LLMs, but if you have an NVidia GPU then it can improve performance.
With ollama, GPU support should be taken care of automatically. However, if you use the LLM 'directly', for example via HuggingFace packages, then you may need to set up CUDA.
- Install CUDA [on Ubuntu]
ref https://www.cherryservers.com/blog/install-cuda-ubuntu#step-6-install-gcc
assumption: installing ollama has already installed the NVidia graphics driver.
nvidia-smi
This will output the graphics driver details - note the version of CUDA.
Install gcc:
sudo apt install gcc
gcc -v
Install the CUDA toolkit on Ubuntu - use NVidia's website to generate the install script, selecting the options that match your system (for example Ubuntu 22.04).
Adjust the end of the script to use your matching version of CUDA.
Run the script.
Reboot:
sudo reboot now
Edit your bashrc file:
nano ~/.bashrc
Add environment variables (adjust for your version of CUDA)
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-12.4
Press CTRL+O to save, then CTRL+X to exit.
Execute your bashrc to get the new environment variables:
. ~/.bashrc
Test that CUDA is working:
nvcc -V
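If you go the 'direct' HuggingFace route mentioned earlier, you can also confirm that PyTorch sees the GPU once CUDA is set up (this assumes a CUDA-enabled torch build is installed):

# Confirm PyTorch can see the GPU (requires a CUDA-enabled torch build).
import torch

print(torch.cuda.is_available())          # True if CUDA is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the NVidia card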
- Install flash-attn
note: this can take a long time!
sudo apt-get install python3.12-dev
pip uninstall -y ninja && pip install ninja
python -m pip install flash-attn --no-build-isolation
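A quick way to confirm the build succeeded is to import the module and print its version (assuming flash-attn exposes __version__, which recent releases do):

# Check that flash-attn built and imports cleanly.
import flash_attn

print(flash_attn.__version__)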