A collection of LLM agent and RAG demos built with crew-ai, langchain & llama-index. They are not intended to be robust and may not always work. They demonstrate the capabilities of the tools and provide a starting point for further development.
- Clone this repository.
- Create a virtualenv with python3.11 (or use your favourite way of doing it):
virtualenv -p python3.11 venv
- Activate the python environment:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Copy config_example.yml to config.yml
- Update config.yml with values appropriate to your machine. See the config_example.yml & steps 9/10 for more details.
- Point the environment variable KLEIN_CONFIG to the config file:
export KLEIN_CONFIG=/a/path/config.yml
- Create an API_KEY from OpenAI.
- Add the API_KEY to the config.yml file. You might need to export the key as well:
export OPENAI_API_KEY=sgfkjgkjfhkdjfhb
- For the llama-index & open-api examples, add the config for the data import folder, persistence folder & yaml folder.
- There is a sample OpenAPI yaml file in the data folder that you can use to test the open-api examples. For the llama-index tests you will need to put some document files in a folder. Note that the open-api example doesn't work very well. It is based on the Transcriptomics API, with some changes made to persuade the code to run.
- Make sure you have sqlite installed (brew install sqlite should do it on a mac) and Ollama downloaded on your machine and running, before running any of the commands.
- After downloading Ollama, download the llama2 model used in the ai-insights-agent and the medicines-discovery-agents llm_sql_agent by running this command:
ollama run llama2
You can use this llm model in any of the examples but you would need to change the code.
- To run each application, run one of the following commands in the root directory of the repository:
python -m src.app.crewai_examples.ai_insights_agent
- uses openAI. Uses crew-ai agents to converse with each other and gather facts around AI-based topics.
python -m src.app.crewai_examples.medicines_discovery_agents
- uses openAI. Uses crew-ai agents to converse with each other and gather facts around drug discovery based topics.
python -m src.app.llama_examples.llama_index
- uses openAI. Vectorises documents into a local file-based store and runs a query over the stored documents. If it complains about embedding token size then delete the existing data persistence dir so it can index from the start.
python -m src.app.llama_examples.llama_index_open
- uses llama2. Uses an open source model to vectorise documents and runs a query over the stored documents. If it complains about embedding token size then delete the existing data persistence dir so it can index from the start.
python -m src.app.llama_examples.llama_pg_vector --method index
- or --method query --query "Ask a question about the indexed docs". Vectorises docs using pgvector and queries over them. Needs a running pg db with pgvector installed. See section below.
python -m src.app.langchain_examples.llm_sql_agent
- uses openAI. Trains an llm about a specific sql schema that you can then query over in free text.
python -m src.app.langchain_examples.langchain_agent conversation
- uses openAI. Demonstrates a chain of prompts enhanced with a web page as RAG.
python -m src.app.langchain_examples.langchain_agent retrieval
- uses openAI. Demonstrates a single prompt enhanced with a web page as RAG.
python -m src.app.langchain_examples.langchain_agent agent
- uses openAI. Demonstrates using a tool (DuckDuckGo) along with a prompt to answer some simple questions.
python -m src.app.langchain_examples.open_api_langchain
- uses openAI. Uses the DSP atlas openAPI to query the database. This is a bit of a mess and doesn't really work.
python -m src.app.langchain_examples.using_tools_langchain_agent
- uses openAI. Demonstrates how to use a free text query along with an sql agent to fetch information from an sql database. The code trains the LLM with example sql queries against the coshh db. Ask SE for login details.
Note: If using the using_tools_langchain_agent to query quantities with something like "What is the total quantity of Acetone", make sure that the chemical name matches one in the db, including capital letters in the correct place.
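The capitalisation note above comes down to how SQL compares strings. As a minimal sketch, assuming a hypothetical `chemical` table with `name` and `quantity` columns in an in-memory SQLite database (the real coshh schema and database engine may differ), an exact `=` match is case-sensitive by default, so `acetone` finds nothing while `Acetone` does:

```python
import sqlite3


def total_quantity(conn, name, ignore_case=False):
    """Sum the quantity of every row whose name equals `name`.

    The table and column names are hypothetical stand-ins, not the coshh schema.
    """
    # COLLATE NOCASE makes the comparison ignore ASCII case in SQLite.
    collation = " COLLATE NOCASE" if ignore_case else ""
    row = conn.execute(
        f"SELECT SUM(quantity) FROM chemical WHERE name = ?{collation}",
        (name,),
    ).fetchone()
    return row[0]  # None when no rows match, because SUM over no rows is NULL


# Build a small in-memory database to demonstrate the behaviour.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chemical (name TEXT, quantity REAL)")
conn.executemany(
    "INSERT INTO chemical VALUES (?, ?)",
    [("Acetone", 2.5), ("Acetone", 1.5), ("Ethanol", 4.0)],
)

print(total_quantity(conn, "Acetone"))                    # 4.0
print(total_quantity(conn, "acetone"))                    # None
print(total_quantity(conn, "acetone", ignore_case=True))  # 4.0
```

Whether the agent generates an exact or a case-insensitive comparison depends on the SQL the LLM produces, so matching the stored capitalisation in the question is the safer option.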
- Execute docker compose:
cd postgres_files
docker-compose up --build
P.S. If you have run this before, you may need to remove the existing volumes before running the command above, using either docker-compose down -v, or docker volume ls followed by docker volume rm the-volume-name.
- Create a table in psql:
docker exec -it postgres_files-postgres-1 /bin/bash
psql -U postgres
Then:
CREATE TABLE cro_vector_db (
id bigserial PRIMARY KEY,
id_cro VARCHAR(50),
cap_description TEXT,
capabilities_vector vector(768) -- number of dimensions
);
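- Manually create the extension in psql if it doesn't already exist. The vector column type is provided by this extension, so if the CREATE TABLE statement above fails because the vector type does not exist, run this first and then create the table:
CREATE EXTENSION IF NOT EXISTS vector;
Then exit the psql shell and the docker container.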
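Rows written to cro_vector_db must carry exactly 768 values in capabilities_vector, to match the vector(768) column. The sketch below is an illustration rather than code from this repository: it formats a Python list in pgvector's text representation (`[v1,v2,...]`) and rejects embeddings of the wrong length before they reach the database. The names `to_pgvector_literal`, `build_insert_params` and `INSERT_SQL` are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
from typing import Sequence, Tuple

EMBEDDING_DIMS = 768  # must match vector(768) in the CREATE TABLE statement

# Hypothetical parameterised insert; %s placeholders assume a psycopg-style driver.
INSERT_SQL = (
    "INSERT INTO cro_vector_db (id_cro, cap_description, capabilities_vector) "
    "VALUES (%s, %s, %s::vector)"
)


def to_pgvector_literal(embedding: Sequence[float], dims: int = EMBEDDING_DIMS) -> str:
    """Format an embedding as pgvector text input, e.g. '[0.1,0.2,0.3]'."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(embedding)}")
    return "[" + ",".join(str(float(v)) for v in embedding) + "]"


def build_insert_params(id_cro: str, description: str,
                        embedding: Sequence[float]) -> Tuple[str, str, str]:
    """Return the parameter tuple to pass alongside INSERT_SQL."""
    return (id_cro, description, to_pgvector_literal(embedding))


# Example with a dummy all-zero embedding; a real one would come from the model.
params = build_insert_params("cro-1", "example capability", [0.0] * EMBEDDING_DIMS)
print(len(params))
```

With a live connection, the tuple would be passed as the second argument to the cursor's execute call together with INSERT_SQL. If the embedding model is changed to one with a different output size, the column definition and EMBEDDING_DIMS both need to change.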
- Is there a chemical called something like Ethylene what lab and cupboard is it stored in and how much is in each cupboard
- Is there a chemical like Nitrocellulose in the lab and what is the storage temperature
- How many chemicals are there in total excluding expired ones
- Is there a chemical like Nitrocellulose and what is the actual name
- What is the name lab and cupboard of chemicals expiring in the next week
- Are there any chemicals that have expired but have not been archived and what are their names lab and cupboard
- Were any chemicals updated today and list the actual names lab location and cupboard
This project is licensed under the terms of the Apache 2 license, which can be found in the repository as LICENSE.txt