Evaluation and agreement scripts for the DISCOSUMO project. Each evaluation script takes both manual annotations and automatic summarization output as input. The formatting of these files is highly project-specific. However, the evaluation functions for precision, recall, ROUGE, Jaccard, Cohen's kappa and Fleiss' kappa may be applicable to other domains too.
quica is a tool to run inter-coder agreement pipelines in an easy and effective way. Multiple measures are run and the results are collected in a single table that can be easily exported to LaTeX.
Replication package for the Archetypal Analysis conducted in the paper "Evaluating the Agreement among Technical Debt Measurement Tools: Building an Empirical Benchmark of Technical Debt Liabilities", accepted at Springer's EMSE Journal.
Python tool for calculating inter-rater reliability metrics and generating comprehensive reports for multi-rater datasets. Optionally have an LLM create an interpretation report.
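Several of the projects listed here compute Cohen's kappa (two raters) and Fleiss' kappa (a fixed number of raters per item). As a rough illustration of what these measures compute, here is a minimal sketch in plain Python. It is not taken from any of the listed repositories; the function names `cohens_kappa` and `fleiss_kappa` and the input layouts are assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's own label distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` is a list of items, each a list of labels,
    with the same number of raters (at least two) per item (assumed layout)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        pairs = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += pairs / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_items
    total = n_items * n_raters
    # Chance agreement from the pooled label distribution.
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1.0:
        return 1.0  # only one category was ever used
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))
    print(fleiss_kappa([["a", "a", "b"], ["a", "b", "b"]]))
```

For real analyses, an established implementation is usually the safer choice; this sketch only makes the observed-versus-chance agreement structure of both statistics explicit.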