Knowledge graph retrieval to improve multi-hop Q&A performance, optimized with GNN + LLM models.
This repo contains experiments for combining Knowledge Graph Retrieval with GNN+LLM models to improve RAG. Currently leveraging Neo4j, G-Retriever, and the STaRK-Prime dataset for benchmarking.
- RAG on large knowledge graphs that require multi-hop retrieval and reasoning, beyond node classification and link prediction.
- General, extensible 2-part architecture: KG Retrieval & GNN+LLM.
- Efficient, stable inference time and output for real-world use cases.
Install the Neo4j database (and relevant JDK) by following official instructions. You'll also need the Neo4j GenAI plugin.
With the database installed and running, you can load the STaRK-Prime dataset by running the Python notebook data-loading/stark_prime_neo4j_loading.ipynb.
Alternatively, download a database dump for database version 5.23 from AWS S3.
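After loading the dataset or restoring the dump, a quick sanity check is to count the nodes in the database. The following is a minimal sketch, assuming the official `neo4j` Python driver is installed. The URI, user name, and password are placeholders, and the helper names `count_nodes` and `check_database` are hypothetical (they are not part of this repo).

```python
# Sanity check after loading: count the nodes in the database.
# Helper names are illustrative and not part of this repo.

NODE_COUNT_QUERY = "MATCH (n) RETURN count(n) AS n"


def count_nodes(session):
    """Return the node count using any object that exposes Session.run()."""
    record = session.run(NODE_COUNT_QUERY).single()
    return record["n"]


def check_database(uri="bolt://localhost:7687", user="neo4j", password="<your-password>"):
    """Connect with the official driver and return the node count.

    The URI and credentials are placeholders; adjust them to your setup.
    """
    # Imported lazily so count_nodes() has no hard dependency on the driver.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        driver.verify_connectivity()  # raises if the server is unreachable
        with driver.session() as session:
            return count_nodes(session)
```

A count of zero would suggest that the load did not complete or that the driver is pointed at a different database.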
Install all required libraries in requirements.txt.
Additionally, make sure huggingface-cli authentication is set up so that you can use the relevant models (Llama2, Llama3).
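To confirm that a Hugging Face token is visible before starting a long run, a small standard-library check can help. This is a sketch under stated assumptions: it looks for the `HF_TOKEN` (or legacy `HUGGING_FACE_HUB_TOKEN`) environment variable, then for the token file that `huggingface-cli login` is assumed to write under `HF_HOME` (default `~/.cache/huggingface`). The function name `find_hf_token` is hypothetical.

```python
import os
from pathlib import Path


def find_hf_token(env=None, home=None):
    """Return a Hugging Face token from the environment or the CLI token file, else None."""
    env = os.environ if env is None else env

    # Environment variables take precedence over the cached login.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        if env.get(name):
            return env[name]

    # Assumed default location of the cached token: $HF_HOME/token.
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"])
    else:
        base = Path(home) if home is not None else Path.home()
        hf_home = base / ".cache" / "huggingface"

    token_file = hf_home / "token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None
```

If `find_hf_token()` returns `None`, running `huggingface-cli login` first is likely needed.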
- To train a model with default configurations, run the following command:
python train.py --checkpointing --llama_version llama3.1-8b --retrieval_config_version 0 --algo_config_version 0 --g_retriever_config_version 0 --eval_batch_size 4
- To get results for Pipeline, run eval_pcst_ordering.ipynb using the intermediate dataset and G-Retriever model.
- To exactly reproduce the results in the table below, use the stanford-workshop-2024 branch. The main branch contains new incremental changes and improvements.
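When sweeping over settings, the default training command shown above can also be assembled and launched from Python. The sketch below uses only the flags that appear in that command. The helper names `build_train_command` and `run_training` are hypothetical, and it assumes the script is run from the repo root.

```python
import subprocess
import sys


def build_train_command(
    llama_version="llama3.1-8b",
    retrieval_config_version=0,
    algo_config_version=0,
    g_retriever_config_version=0,
    eval_batch_size=4,
):
    """Assemble the train.py invocation as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "train.py",
        "--checkpointing",
        "--llama_version", llama_version,
        "--retrieval_config_version", str(retrieval_config_version),
        "--algo_config_version", str(algo_config_version),
        "--g_retriever_config_version", str(g_retriever_config_version),
        "--eval_batch_size", str(eval_batch_size),
    ]


def run_training(**kwargs):
    """Run train.py with the given overrides; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_train_command(**kwargs), check=True)
```

For example, `run_training(eval_batch_size=2)` would launch the same default configuration with a smaller evaluation batch.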
- For a high-level overview of Neo4j & GenAI, have a look at neo4j.com/genai.
- To learn how to get started using LLMs with Neo4j, see this online Graph Academy course. It is one of many Neo4j GenAI courses covering topics that range from KG construction to graph+vector search and building GenAI chatbot applications.
- Pick your GenAI framework of choice to start building your own GenAI applications with Neo4j.
- Check out Neo4j GenAI technical blogs for other worked examples and integrations.