
Create a knowledge graph from unstructured legal text and use it in a graph-augmented retrieval-augmented generation (RAG) pipeline


TilmanLudewigtHaufe/GraphAugmented-Legal-RAG


Introduction

This project represents a focused effort in the field of legal tech research, where I combine methodologies from natural language processing (NLP), network theory, and machine learning to analyze German legal texts. At its core, the project is structured into two principal components: the generation of a domain-specific knowledge graph and the application of this graph within a Retrieval-Augmented Generation (RAG) system to enhance language model responses.

Objectives

  1. Knowledge Graph Creation: The initial phase involves processing legal texts to develop a comprehensive knowledge graph. This graph visualizes key concepts and their interconnections, offering a detailed map of the legal textual environment.

  2. Context-Enriched RAG System: The ultimate goal of the project is to use the knowledge graph in a RAG system. The graph enriches the context passed to the language model, so that responses are grounded in the domain-specific legal texts. The aim is more accurate and context-aware responses in legal text analysis.

Key Features

  • Semantic text splitting for in-depth analysis of legal texts.
  • Interactive knowledge graph construction from legal documents.
  • Integration with OpenAI's GPT-3.5 and GPT-4 for cutting-edge NLP capabilities.
  • Network graph analysis and visualization for elucidating textual relationships.
  • Use of vector space modeling for analyzing text and graph components.
  • Contextual analysis techniques, including TF-IDF scoring and cosine similarity, for enhanced understanding.
  • Application of the knowledge graph in a RAG system for context-rich language model responses.
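The TF-IDF scoring and cosine similarity mentioned above can be illustrated with a minimal sketch. It uses only the standard library and a smoothed IDF formula; the function names, the whitespace tokenizer, and the sample sentences are assumptions made for illustration and are not taken from the project's code.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real pipeline would use a proper tokenizer."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps weights positive even for terms found in every document.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample sentences, not project data.
docs = [
    "der mieter zahlt die miete",
    "der vermieter erhält die miete",
    "das gericht entscheidet über die klage",
]
vectors = tfidf_vectors(docs)
sim_rental = cosine_similarity(vectors[0], vectors[1])
sim_court = cosine_similarity(vectors[0], vectors[2])
```

In this toy example the two sentences about rent share more weighted terms than the rent sentence and the court sentence, so `sim_rental` comes out higher than `sim_court`.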

Installation

  1. Clone the Repository:

    git clone https://github.com/TilmanLudewigtHaufe/Graph_Augmented_RAG.git
    cd Graph_Augmented_RAG
    
  2. Set Up Environment:

    • Ensure Python 3.8+ is installed.
    • Install required packages:
      pip install -r requirements.txt
      
  3. Environment Variables:

    • Create a .env file in the root directory.
    • Add OPENAI_API_KEY=your_key_here.
  4. Data Preparation:

    • Place your text data (.txt) in the data_input directory.
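Before running the scripts, the setup steps above can be checked with a small preflight script. This is a sketch under stated assumptions: `read_env_file` and `preflight` are hypothetical helpers that are not part of the repository, and the `.env` parsing handles only simple `KEY=VALUE` lines.

```python
import sys
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(root="."):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(root)
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    env_path = root / ".env"
    if not env_path.is_file():
        problems.append(".env file is missing")
    elif not read_env_file(env_path).get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in .env")
    data_dir = root / "data_input"
    if not data_dir.is_dir() or not any(data_dir.glob("*.txt")):
        problems.append("data_input contains no .txt files")
    return problems
```

Calling `preflight()` from the repository root and printing each returned entry shows which of the steps still needs attention.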

Usage

Configuration

Before generating the knowledge graph or running the graph-augmented RAG script, you need to choose and configure a splitter.

The splitter determines how the text data is divided into distinct chunks from which the concepts of the knowledge graph are derived. In the graph-augmented RAG script, the splitter likewise breaks user queries down into manageable chunks for processing.

You can set the splitter in the KG_Creation.py and Graph_RAG_advanced_RAG.py scripts.

The choice of splitter can significantly affect the performance and results of the system, so choose one that suits your data and use case. The respective code sections provide three splitters with different approaches (ranging from simple to complex and custom), which you can adjust to your needs.
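To make the idea of a splitter concrete, here is a minimal sketch of two approaches: a simple size-based splitter and a custom one that cuts at section markers. Both functions, their names, and the sample text are hypothetical illustrations and are not the splitters shipped in KG_Creation.py or Graph_RAG_advanced_RAG.py.

```python
import re


def split_fixed(text, max_chars=200):
    """Simple splitter: pack whole words into chunks of at most max_chars."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def split_by_section(text):
    """Custom splitter: start a new chunk at each section marker such as '§ 535'.

    Limitation: a cross-reference like 'see § 536' inside a section also
    triggers a split, so real legal texts need a stricter pattern.
    """
    # The lookahead keeps the marker at the start of the chunk it introduces.
    parts = re.split(r"(?=§\s*\d+)", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample text, not project data.
sample = "§ 535 Der Vermieter überlässt die Mietsache. § 536 Der Mieter kann die Miete mindern."
sections = split_by_section(sample)
small_chunks = split_fixed(sample, max_chars=40)
```

A section-based splitter keeps each provision intact, which tends to suit statutes, while the size-based one works on any text but may cut a provision in half.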

Generating the Knowledge Graph

  1. Creating the Knowledge Graph:

    • Run the KG_Creation.py script to generate the knowledge graph from your text data.
      python KG_Creation.py
      
    • This process analyzes the text, constructs a graph of interconnected concepts, and saves the graph data.
  2. Visualizing the Knowledge Graph:

    • After running KG_Creation.py, open docs/index.html in a web browser.
    • This file contains an interactive visualization of the knowledge graph, allowing you to explore the relationships and structures within your data.
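The step of constructing a graph of interconnected concepts can be sketched as follows, assuming that concepts have already been extracted for each text chunk. Linking concepts that occur in the same chunk is one common approach and is used here purely as an illustration; the function names and sample data are hypothetical, and the actual logic in KG_Creation.py may differ.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(chunk_concepts):
    """Map each undirected concept pair to the list of chunk ids where both occur."""
    edges = defaultdict(list)
    for chunk_id, concepts in chunk_concepts.items():
        # Sorting makes (a, b) and (b, a) collapse into one undirected edge key.
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)].append(chunk_id)
    return dict(edges)


def to_adjacency(edges):
    """Turn the edge map into concept -> {neighbor: weight}; weight = shared chunk count."""
    adjacency = defaultdict(dict)
    for (a, b), chunk_ids in edges.items():
        adjacency[a][b] = len(chunk_ids)
        adjacency[b][a] = len(chunk_ids)
    return dict(adjacency)


# Hypothetical extraction result: chunk id -> concepts found in that chunk.
chunk_concepts = {
    "chunk-1": ["Mieter", "Miete", "Vermieter"],
    "chunk-2": ["Mieter", "Mietminderung", "Miete"],
}
edges = build_cooccurrence_graph(chunk_concepts)
graph = to_adjacency(edges)
```

Keeping the chunk ids on each edge makes it possible to trace a relationship in the graph back to the passages that support it.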

Running Graph-Augmented Retrieval-Augmented Generation

  1. Executing the Main Script:

    • To engage with the graph-augmented retrieval and generation capabilities, run the Graph_RAG_advanced_RAG.py script.
      python Graph_RAG_advanced_RAG.py
      
    • This script uses the knowledge graph to enhance text retrieval and generation, providing deeper insights and context for user queries.
  2. Interacting with the System:

    • Enter queries at the prompt within the Graph_RAG_advanced_RAG.py script interface.
    • Use commands like quit, q, or exit to end the interactive session.
  3. Analyzing Outputs:

    • The application provides responses based on the knowledge graph and the augmented retrieval mechanism, offering a rich and contextual understanding of the query topics.
    • Check the output on the console and any generated files in the data_output directory for detailed insights.
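How a knowledge graph can enrich retrieval is sketched below: the concepts found in a query are expanded with their strongest graph neighbors, and the chunks mentioning those concepts are collected as context for the prompt. All function names, data structures, and sample values are assumptions for illustration; this is not the implementation in Graph_RAG_advanced_RAG.py.

```python
def expand_concepts(query_concepts, graph, max_neighbors=3):
    """Return the query concepts plus their strongest graph neighbors."""
    expanded = list(dict.fromkeys(query_concepts))  # de-duplicate, keep order
    for concept in query_concepts:
        neighbors = graph.get(concept, {})
        # Highest edge weight first; ties are broken alphabetically.
        strongest = sorted(neighbors, key=lambda n: (-neighbors[n], n))[:max_neighbors]
        for neighbor in strongest:
            if neighbor not in expanded:
                expanded.append(neighbor)
    return expanded


def collect_context(concepts, concept_to_chunks, chunks, limit=3):
    """Rank chunks by how many of the concepts they mention; return the top texts."""
    scores = {}
    for concept in concepts:
        for chunk_id in concept_to_chunks.get(concept, []):
            scores[chunk_id] = scores.get(chunk_id, 0) + 1
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return [chunks[cid] for cid in ranked[:limit]]


def build_prompt(question, context_chunks):
    """Assemble a plain-text prompt; the wording here is illustrative only."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical graph (concept -> {neighbor: weight}) and chunk index.
graph = {
    "Mietminderung": {"Mangel": 2, "Miete": 1},
    "Mangel": {"Mietminderung": 2},
    "Miete": {"Mietminderung": 1},
}
concept_to_chunks = {"Mietminderung": ["c1"], "Mangel": ["c1", "c2"], "Miete": ["c3"]}
chunks = {
    "c1": "Bei einem Mangel kann die Miete gemindert werden.",
    "c2": "Der Mangel ist dem Vermieter anzuzeigen.",
    "c3": "Die Miete ist monatlich zu entrichten.",
}

concepts = expand_concepts(["Mietminderung"], graph, max_neighbors=1)
context = collect_context(concepts, concept_to_chunks, chunks)
prompt = build_prompt("Wann ist eine Mietminderung möglich?", context)
```

The resulting `prompt` string would then be sent to the language model; that call is omitted here because it depends on the API client and model the project is configured with.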

Note

  • Ensure that the KG_Creation.py script is executed before running Graph_RAG_advanced_RAG.py, as the latter depends on the knowledge graph generated by the former.

License

This project is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This allows for non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

For more details, see the full text of the license.

