rater-metric-calculator is a Python tool for calculating inter-rater reliability metrics on multi-rater datasets. It provides a comprehensive set of metrics for evaluating agreement among raters and writes the results to a well-formatted Markdown report. This tool is ideal for researchers working with subjective ratings or annotations.
- Calculate various inter-rater reliability metrics (see the sketch after this list):
- Krippendorff's Alpha
- Fleiss' Kappa
- Kendall's W (Coefficient of Concordance)
- Overall Percent Agreement
- Mean Absolute Difference
- Generate pairwise comparisons between raters
- Calculate rating distributions
- Output results in a well-formatted Markdown report
- Generate an AI-powered interpretation report using Claude 3.5 Sonnet (optional)
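For reference, the first two metrics can be computed directly with the `krippendorff` and `statsmodels` packages this project builds on. The following is a minimal, hypothetical sketch; the sample data and variable names are illustrative, not the tool's actual code:

```python
import numpy as np
import krippendorff
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Illustrative ratings matrix: each row is an item, each column a rater.
ratings = np.array([
    [1, 2, 1],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
])

# krippendorff.alpha expects data as (raters, items), so transpose.
alpha = krippendorff.alpha(reliability_data=ratings.T,
                           level_of_measurement="ordinal")

# fleiss_kappa expects an items-by-categories count table;
# aggregate_raters builds it from the raw (items, raters) matrix.
table, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(table)

print(f"Krippendorff's alpha: {alpha:.3f}")
print(f"Fleiss' kappa: {kappa:.3f}")
```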
To install the Rater Metric Calculator, you'll need Python 3.10 or higher. We recommend using Poetry for dependency management.
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/rater-metric-calculator.git
   cd rater-metric-calculator
   ```

2. Install dependencies using Poetry:

   ```bash
   poetry install
   ```
To use the Rater Metric Calculator, run the following command:
```bash
poetry run python rater_metric_calculator/calculate_rater_metrics.py <input_csv_file> <output_md_file> [--use-llm] [--llm-report-file <llm_report_file>] [--prompt-file <prompt_file>]
```
Replace `<input_csv_file>` with the path to your CSV file containing rating data, and `<output_md_file>` with the desired path for the output Markdown report.
To generate an AI-powered interpretation report using Claude 3.5 Sonnet, add the `--use-llm` flag to the command. When using this flag, you must also specify the `--llm-report-file` option with the desired path for the LLM report output file.
When the `--use-llm` flag is used, the script will:

- Generate the standard metrics report
- Use the Claude 3.5 Sonnet model to analyze the report
- Save the AI-generated interpretation to the file specified by `--llm-report-file`
Example command with LLM report generation:
```bash
poetry run python rater_metric_calculator/calculate_rater_metrics.py input_data.csv output_report.md --use-llm --llm-report-file llm_interpretation.md
```
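Under the hood, this step amounts to sending the generated Markdown report to the Anthropic API. Here is a rough, hypothetical sketch of such a call using the `anthropic` Python SDK; the model identifier, prompt wording, and file paths are illustrative assumptions, not the tool's actual implementation:

```python
import anthropic

# Read the metrics report produced in the previous step (path is illustrative).
with open("output_report.md") as f:
    report = f.read()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model identifier
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": f"Interpret these inter-rater reliability metrics:\n\n{report}",
    }],
)

# Save the AI-generated interpretation (path is illustrative).
with open("llm_interpretation.md", "w") as f:
    f.write(message.content[0].text)
```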
You can also specify a custom prompt file for the LLM analysis using the `--prompt-file` option:

```bash
poetry run python rater_metric_calculator/calculate_rater_metrics.py input_data.csv output_report.md --use-llm --llm-report-file llm_interpretation.md --prompt-file custom_prompt.txt
```
If no prompt file is specified, the script will use a default prompt for the analysis.
Note: To use the LLM report generation feature, you need to set the `ANTHROPIC_API_KEY` environment variable to your Anthropic API key.
The input CSV file should have the following format:
- Each column represents a rater
- Each row represents an item being rated
- Ratings should be integers between 1 and 5 (inclusive)
Example:

```csv
Rater1,Rater2,Rater3
1,2,1
3,3,2
5,4,5
2,2,3
```
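A quick way to sanity-check an input file against this format is a few lines of pandas. This is a hypothetical snippet, not necessarily part of the tool itself, and the file path is illustrative:

```python
import pandas as pd

df = pd.read_csv("input_data.csv")  # illustrative path

# Every column is a rater; every value must be between 1 and 5.
assert df.shape[1] >= 2, "need at least two raters"
assert df.apply(lambda col: col.between(1, 5)).all().all(), \
    "all ratings must be between 1 and 5"
print(f"{df.shape[0]} items rated by {df.shape[1]} raters")
```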
The script generates a Markdown file containing:
- Overall metrics (Krippendorff's Alpha, Fleiss' Kappa, etc.)
- Rating distribution for each rater
- Pairwise comparisons between raters (see the sketch below)
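For intuition, pairwise percent agreement and mean absolute difference between two raters reduce to simple column arithmetic. A hypothetical sketch, with illustrative column names and file path:

```python
import pandas as pd

df = pd.read_csv("input_data.csv")  # illustrative path
a, b = df["Rater1"], df["Rater2"]

percent_agreement = (a == b).mean() * 100  # share of items rated identically
mean_abs_diff = (a - b).abs().mean()       # average distance between ratings

print(f"Agreement: {percent_agreement:.1f}%  MAD: {mean_abs_diff:.2f}")
```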
You can optionally generate an AI-powered interpretation report using Claude 3.5 Sonnet, which provides insights and analysis based on the calculated metrics.
To run the test suite, use the following command:
```bash
poetry run pytest
```
Contributions to the Rater Metric Calculator are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
- This project uses several scientific Python libraries, including pandas, numpy, scipy, and statsmodels.
- Special thanks to the developers of the krippendorff and mdutils packages.