TilmanLudewigtHaufe/GraphAugmented-Legal-RAG: Create a knowledge graph from unstructured legal text, then use that knowledge graph in a graph-augmented retrieval-augmented generation (RAG) pipeline

Introduction

This is a legal tech research project in which I combine methods from natural language processing (NLP), network theory, and machine learning to analyze German legal texts. The project has two main components: generating a domain-specific knowledge graph, and using that graph in a Retrieval-Augmented Generation (RAG) system to improve language model responses.

Objectives

Knowledge Graph Creation: The first phase processes legal texts into a knowledge graph that captures key concepts and the connections between them, giving a detailed map of the source texts.
Context-Enriched RAG System: The end goal is to use the knowledge graph in a RAG system. The graph enriches the context passed to the language model so that responses are grounded in domain-specific legal content, with the aim of more accurate and context-aware answers in legal text analysis.

Key Features

Semantic text splitting for in-depth analysis of legal texts.
Interactive knowledge graph construction from legal documents.
Integration with OpenAI's GPT-3.5 and GPT-4 models for language processing.
Network graph analysis and visualization for elucidating textual relationships.
Use of vector space modeling for analyzing text and graph components.
Contextual analysis techniques, including TF-IDF scoring and cosine similarity, for enhanced understanding.
Application of the knowledge graph in a RAG system for context-rich language model responses.
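
To make the TF-IDF and cosine similarity feature more concrete, here is a minimal sketch using only the Python standard library. It is not taken from the project's scripts: the function names `tokenize`, `tfidf_vectors`, and `cosine_similarity`, the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (illustrative; a real pipeline
    would likely use a proper tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "der mieter zahlt die miete",
    "der vermieter erhält die miete",
    "das gericht entscheidet",
]
vectors = tfidf_vectors(docs)
```

With these example sentences, the first two documents share several terms and therefore score higher against each other than either does against the third, which shares none.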

Installation

Clone the Repository:

git clone https://github.com/TilmanLudewigtHaufe/Graph_Augmented_RAG.git
cd Graph_Augmented_RAG

Set Up Environment:
- Ensure Python 3.8+ is installed.
- Install required packages:
```
pip install -r requirements.txt
```
Environment Variables:
- Create a .env file in the root directory.
- Add OPENAI_API_KEY=your_key_here.
Data Preparation:
- Place your text data (.txt) in the data_input directory.
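
Before running the scripts, it can help to verify these two steps. The following is a minimal sketch of such a pre-flight check, assuming a plain `KEY=value` format for the .env file. The helpers `load_env_file` and `check_setup` are hypothetical and not part of the repository; the project itself may load the key differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines into os.environ.
    Values already present in the environment are left unchanged."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def check_setup(env_path=".env", data_dir="data_input"):
    """Return a list of human-readable problems; an empty list means ready."""
    load_env_file(env_path)
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    data = Path(data_dir)
    if not data.is_dir() or not any(data.glob("*.txt")):
        problems.append(f"no .txt files found in {data_dir}")
    return problems
```

Calling `check_setup()` from the repository root returns an empty list once the key is set and at least one .txt file is in data_input.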

Usage

Configuration

Before generating the knowledge graph or running graph-augmented retrieval-augmented generation, you need to choose and set the splitter.

The splitter is a crucial component that determines how the text data is divided into distinct concepts for the Knowledge Graph. Similarly, in the Graph-Augmented Retrieval-Augmented Generation, the splitter plays a key role in breaking down user queries into manageable chunks for processing.

You can set the splitter in the KG_Creation.py and Graph_RAG_advanced_RAG.py scripts.

The choice of splitter can significantly affect the performance and results of the system, so choose one that suits your data and use case. The respective code sections provide three splitters with different approaches (simple, complex, and custom), which you can adjust to your needs.
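
As a rough idea of what a simple splitter does, the sketch below greedily packs sentences into chunks of bounded size. It is an illustration only and not one of the splitters shipped in the scripts: the function name `split_text`, the `max_chars` parameter, and the naive sentence-boundary rule are assumptions.

```python
import re


def split_text(text, max_chars=500):
    """Greedily pack sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars is kept whole as its own chunk."""
    # Naive boundary: '.', '!' or '?' followed by whitespace. This also splits
    # after abbreviations such as "Abs." or "Nr.", which are common in German
    # legal text, so a real splitter needs a more careful rule.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Smaller `max_chars` values yield more, narrower chunks (and therefore more fine-grained concepts); larger values keep more context together per chunk.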

Generating the Knowledge Graph

Creating the Knowledge Graph:
- Run the KG_Creation.py script to generate the knowledge graph from your text data.
```
python KG_Creation.py
```
- This process analyzes the text, constructs a graph of interconnected concepts, and saves the graph data.
Visualizing the Knowledge Graph:
- After running KG_Creation.py, open docs/index.html in a web browser.
- This file contains an interactive visualization of the knowledge graph, allowing you to explore the relationships and structures within your data.
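
The graph-construction step can be pictured with the following minimal sketch. It assumes that concepts have already been extracted for each text chunk (in the project this presumably involves the language model; here it is simply given as input) and links concepts that occur in the same chunk. The function names, the co-occurrence weighting, and the JSON edge-list format are hypothetical and are not confirmed to match what KG_Creation.py writes.

```python
import json
from collections import defaultdict
from itertools import combinations


def build_concept_graph(chunk_concepts):
    """Build an undirected weighted graph as a dict {(a, b): weight}, where
    weight is the number of chunks in which concepts a and b co-occur."""
    weights = defaultdict(int)
    for concepts in chunk_concepts:
        # Sort so each undirected edge has one canonical (a, b) key.
        for a, b in combinations(sorted(set(concepts)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def save_graph(weights, path):
    """Write the edge list as JSON so a later step can reload it."""
    edges = [
        {"source": a, "target": b, "weight": w}
        for (a, b), w in sorted(weights.items())
    ]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(edges, fh, ensure_ascii=False, indent=2)


# Illustrative input: concepts found in two chunks of text.
chunk_concepts = [
    ["Mieter", "Miete"],
    ["Mieter", "Miete", "Vermieter"],
]
graph = build_concept_graph(chunk_concepts)
```

Edges that recur across many chunks receive higher weights, which a visualization can show as thicker or closer links.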

Running Graph-Augmented Retrieval-Augmented Generation

Executing the Main Script:
- To engage with the graph-augmented retrieval and generation capabilities, run the Graph_RAG_advanced_RAG.py script.
```
python Graph_RAG_advanced_RAG.py
```
- This script uses the knowledge graph to enhance text retrieval and generation, providing deeper insights and context for user queries.
Interacting with the System:
- Enter queries at the prompt within the Graph_RAG_advanced_RAG.py script interface.
- Use commands like quit, q, or exit to end the interactive session.
Analyzing Outputs:
- The application provides responses based on the knowledge graph and the augmented retrieval mechanism, offering a rich and contextual understanding of the query topics.
- Check the output on the console and any generated files in the data_output directory for detailed insights.
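
The interactive session described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the script's actual code: `related_concepts` stands in for the graph-augmented retrieval step by looking up concepts that are direct neighbours of query terms, `run_session` is a hypothetical loop that stops on quit, q, or exit, and treating those commands case-insensitively is also an assumption.

```python
EXIT_COMMANDS = {"quit", "q", "exit"}


def related_concepts(query, edges):
    """Return concepts mentioned in the query plus their direct neighbours.
    Matching is by lowercase whitespace token, so punctuation attached to a
    word (e.g. "Mieter?") prevents a match in this simplified version."""
    terms = set(query.lower().split())
    found = set()
    for a, b in edges:
        if a.lower() in terms or b.lower() in terms:
            found.update((a, b))
    return sorted(found)


def run_session(answer, read=input, write=print):
    """Prompt for queries until the user enters an exit command or input ends.
    `answer` is any callable that maps a query string to a response."""
    while True:
        try:
            query = read("Query> ").strip()
        except EOFError:
            break
        if query.lower() in EXIT_COMMANDS:
            break
        if query:
            write(answer(query))
```

In a full pipeline, the `answer` callable would pass the related concepts and their source text to the language model as additional context; here it can be as simple as `lambda q: related_concepts(q, edges)` for a list of `(concept, concept)` pairs.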

Note

Ensure that the KG_Creation.py script is executed before running Graph_RAG_advanced_RAG.py, as the latter depends on the knowledge graph generated by the former.

License

This project is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This allows for non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

For more details, see the full text of the license.
