JuDDGES

JuDDGES stands for Judicial Decision Data Gathering, Encoding, and Sharing.

The JuDDGES project aims to revolutionize the accessibility and analysis of judicial decisions across varied legal systems using advanced Natural Language Processing and Human-In-The-Loop technologies. It focuses on criminal court records from jurisdictions with diverse legal constitutions, including Poland and England & Wales. By overcoming barriers related to resources, language, data, and format inhomogeneity, the project facilitates the development and testing of theories on judicial decision-making and informs judicial policy and practice. Open software and tools produced by the project will enable extensive, flexible meta-annotation of legal texts, benefiting researchers and public legal institutions alike. This initiative not only advances empirical legal research by adopting Open Science principles but also creates the most comprehensive legal research repository in Europe, fostering cross-disciplinary and cross-jurisdictional collaboration.

Usage

Installation

The project requires Python 3.11 or later. Install the dependencies in one of the following ways:

- To install the necessary dependencies, use the provided Makefile: make install
- To run evaluation and fine-tuning with unsloth, run the following command inside a conda environment: make install_unsloth

Dataset creation

The specific details of dataset creation are available in scripts/README.md.

Inference, fine-tuning and evaluation

All commands for running inference, fine-tuning, and evaluation are declared as stages in the dvc.yaml file (see DVC docs for details). Some stages are set up as a matrix, meaning they run once for each combination of parameters (e.g. models and random seeds). Moreover, some scripts are configured with the hydra tool. Simpler scripts, such as n-gram-based evaluation, take command-line arguments instead of a hydra configuration. Below, we provide commands to reproduce each of the stages and point to the appropriate configuration files.

Note

To run the following commands, you'll need all dependencies installed and a system with a GPU that has at least 40GB VRAM.

Tip

To introduce a new model, either from hf-hub or a local model/adapter, add its configuration to the configs/model directory.

Tip

To run a stage for a single combination of parameters from the DVC matrix, simply run it with its full name, e.g., predict@Bielik-7B-Instruct-v0.1-42 (check for names with dvc stage list <stage_name>).
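As a rough illustration of how matrix stage names are composed, the sketch below builds names of the form stage@model-seed, matching the predict@Bielik-7B-Instruct-v0.1-42 example above. The function name and the second seed value are hypothetical; the actual matrix values are defined in dvc.yaml, and dvc stage list remains the authoritative source of names.

```python
from itertools import product


def matrix_stage_names(stage: str, models: list[str], seeds: list[int]) -> list[str]:
    """Build names of the form '<stage>@<model>-<seed>' for every combination."""
    return [f"{stage}@{model}-{seed}" for model, seed in product(models, seeds)]


# Hypothetical matrix values for illustration; the real ones live in dvc.yaml.
names = matrix_stage_names("predict", ["Bielik-7B-Instruct-v0.1"], [42, 7])
for name in names:
    print(name)
```

Any of the printed names can then be passed to dvc repro to run that single combination.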

Inference

Configuration file: configs/predict.yaml
Platform-specific environment variables:
- CUDA_VISIBLE_DEVICES: GPU device ID
- NUM_PROC: Number of processes to run in parallel

Command:

CUDA_VISIBLE_DEVICES=0 NUM_PROC=10 dvc repro predict

Outputs: LLM predictions (Information extracted by an LLM)
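If you prefer to launch a stage from Python rather than the shell, a minimal sketch could look like the following. The helper names build_repro and run_stage are hypothetical and not part of the project; the sketch only sets the two environment variables listed above and calls dvc repro.

```python
import os
import subprocess


def build_repro(stage: str, gpu_id: int = 0, num_proc: int = 10):
    """Return the dvc command and the environment needed to reproduce a stage."""
    # Copy the current environment and add the platform-specific variables.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id), NUM_PROC=str(num_proc))
    cmd = ["dvc", "repro", stage]
    return cmd, env


def run_stage(stage: str, gpu_id: int = 0, num_proc: int = 10) -> None:
    """Run the stage; raises CalledProcessError if dvc exits with an error."""
    cmd, env = build_repro(stage, gpu_id, num_proc)
    subprocess.run(cmd, env=env, check=True)


# Only build and print the command here; call run_stage("predict") to execute it.
cmd, env = build_repro("predict")
print(" ".join(cmd))
```

The same helper can be pointed at any other stage name declared in dvc.yaml.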

Fine-tuning

Configuration file: configs/fine_tuning.yaml
Platform-specific environment variables:
- CUDA_VISIBLE_DEVICES: GPU device ID
- NUM_PROC: Number of processes to run in parallel

Command:

CUDA_VISIBLE_DEVICES=0 NUM_PROC=10 dvc repro sft_unsloth

Outputs: Trained LLM adapter

Evaluation

N-gram-based evaluation
- Configuration file: n/a (command-line arguments as config)
- Command:
```
dvc repro evaluate
```
- Inputs: Information extracted by an LLM (see Inference section)
- Outputs: Metrics
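
To give a feel for what an n-gram-based comparison measures, here is a minimal sketch of a word-level n-gram F1 score between a predicted and a reference string. This is an illustrative assumption, not the project's actual metric; the metrics computed by the evaluate stage are defined in the repository's scripts. The function names and example strings are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(prediction: str, reference: str, n: int = 2) -> float:
    """F1 over n-gram overlap between whitespace-tokenized strings."""
    pred = ngrams(prediction.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical extracted value compared against a hypothetical gold annotation.
score = ngram_f1("the court imposed a fine", "the court imposed a suspended sentence")
print(round(score, 3))
```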

LLM-as-judge evaluation
- Configuration file: configs/llm_judge.yaml
- Command:
```
dvc repro evaluate_llm_as_judge
```
- Inputs: Information extracted by an LLM (see Inference section)
- Outputs: Metrics

Project details

The JuDDGES project encompasses several Work Packages (WPs) designed to cover all aspects of its objectives, from project management to open science practices and the engagement of early career researchers. Below is an overview of the project’s WPs:

WP1: Project Management

Duration: 24 Months

Main Aim: To ensure the project’s successful completion on time and within budget. This includes administrative management, scientific and technological management, quality innovation and risk management, ethical and legal considerations, and facilitating open science.

WP2: Gathering and Human Encoding of Judicial Decision Data

Duration: 22 Months

Main Aim: To establish the data foundation for developing and testing the project’s tools. This involves gathering legal case records and judgments, developing a coding scheme, training human coders, making human-coded data available for WP3, facilitating human-in-the-loop coding for WP3, and enabling WP4 to make data open and reusable beyond the project team.

WP3: NLP and HITL Machine Learning Methodological Development

Duration: 24 Months

Main Aim: To create a bridge between machine learning (led by WUST and MUHEC) and Open Science facilitation (by ELICO), focusing on the development and deployment of annotation methodologies. This includes baseline information extraction, intelligent inference methods for legal corpus data, and constructing an annotation tool through active learning and human-in-the-loop annotation methods.

WP4: Open Science Practices & Engaging Early Career Researchers

Duration: 12 Months

Main Aim: To implement the Open Science policy of the call and engage with relevant early career researchers (ECRs). Objectives include providing open access to publication data and software, disseminating/exploiting project results, and promoting the project and its findings.

Each WP includes specific tasks aimed at achieving its goals, involving collaboration among project partners and contributing to the overarching aim of the JuDDGES project.

Acknowledgements

The universities involved in the JuDDGES project are:

- Wroclaw University of Science and Technology (Poland)
- Middlesex University London (UK)
- University of Lyon 1 (France)
