RAG Annotation Tool for Ground Truth Creation

The annotation tool can be used to simplify the ground truth creation process for the evaluation of retrieval systems.

It allows for selecting a set of texts that are relevant to a given text / description and supports the following features:

Interactive selection of relevant texts
Metadata display and highlighting of keywords and phrases
Built in BM25 search capabilities to locate and include additional texts from the corpus

Demo

Installation

Clone the repository

git clone https://github.com/makefinks/rag-annotator.git
cd rag-annotator

Install requirements (uv is recommended)

uv sync

alternatively use: pip install -r requirements.txt

Activate the virtual environment

source .venv/bin/activate

How it works

Input preperation

The usage of this tool requires a specific input format (JSON). The detailed format can be seen in utils/ground_truth_schema.json.

Ground Truth Schema Explanation

The input JSON must contain the following main fields:

points: An array of objects, each representing an evaluation point. Each point contains:
- id: Integer identifier for the point.
- title: Title string for the point.
- description: Description or query string.
- keywords (optional): Array of keywords relevant to the points description. Highlighted inside retrieved texts.
- fetched_texts: Array of text objects fetched for this point. Each has:
  - id: Integer identifier for the text.
  - text: The text content.
  - source: The source string.
  - metadata (optional): Additional metadata (e.g., description).
  - highlights (optional): Array of strings to highlight in the text.
- selected_texts: Array of selected text objects (same structure as fetched_texts).
- evaluated: Boolean indicating if the point has been evaluated.
all_texts: An array of all possible text objects in the dataset, each with:
- id: Integer identifier.
- text: The text content.

Refer to app/utils/ground_truth_schema.json for the complete and up-to-date schema.

Usage of the tool

python annotation_tool.py

Upon selecting the file in the prepared format the tool will load the data and display the GUI:

Top Panel: The current index of the evaluation object / point, a title dropdown, and the description / text of the object.
Left Panel: The list of texts that were fetched and are supposed to be selected as relevant.
Right Panel: A BM25 seach bar and result display, that allows you to search for a specific texts for all texts in the dataset.
Bottom Panel: Buttons for Navigation and saving the current state of the annotation.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
app		app
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
annotate_tool.py		annotate_tool.py
example_input.json		example_input.json
package-lock.json		package-lock.json
package.json		package.json
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RAG Annotation Tool for Ground Truth Creation

Demo

Installation

Clone the repository

Install requirements (uv is recommended)

Activate the virtual environment

How it works

Input preperation

Ground Truth Schema Explanation

Usage of the tool

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

makefinks/rag-annotator

Folders and files

Latest commit

History

Repository files navigation

RAG Annotation Tool for Ground Truth Creation

Demo

Installation

Clone the repository

Install requirements (uv is recommended)

Activate the virtual environment

How it works

Input preperation

Ground Truth Schema Explanation

Usage of the tool

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages