privacy-glue

This repository documents PrivacyGLUE, an NLP benchmark consisting of legal-privacy related tasks.

Dependencies 🔍

This repository's code was tested with Python 3.8.13. To sync dependencies, we recommend creating a virtual environment with the same Python version and installing the relevant packages with Poetry:
```
$ poetry install
```
Alternatively, install dependencies in the virtual environment using pip:
```
$ pip install -r requirements.txt
```
This repository requires a working installation of Git LFS to access upstream task data. We used version 3.2.0 in our implementation.
Optional: If you intend to develop this repository further, we recommend installing pre-commit so that local pre-commit hooks can run various code checks.
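
Before continuing, it can help to confirm that the prerequisites named above are in place. The following is a minimal sketch, not part of this repository: the function name `check_prerequisites` is hypothetical, and it assumes that Git LFS and pre-commit are exposed on `PATH` as `git-lfs` and `pre-commit`. It only checks that the tools are present and that the interpreter matches the tested Python 3.8 series; it does not verify the Git LFS version.

```python
import shutil
import sys

# Python series this README reports testing with (3.8.13).
# Other versions may work but are untested.
TESTED_PYTHON = (3, 8)


def check_prerequisites(which=shutil.which, version_info=sys.version_info):
    """Return a dict mapping each prerequisite to True (ok) or False.

    `which` and `version_info` are injectable so the check can be
    exercised without depending on the local machine.
    """
    return {
        "python": tuple(version_info[:2]) == TESTED_PYTHON,
        "git": which("git") is not None,
        # Assumption: Git LFS is installed as the `git-lfs` executable.
        "git-lfs": which("git-lfs") is not None,
        "pre-commit (optional)": which("pre-commit") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or untested version'}")
```

A `False` entry for `python` only means the interpreter differs from the tested series, not that the code is known to fail.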

To prepare the necessary Git submodules and data, run:
```
$ bash scripts/prepare.sh
```
Optional: If you intend to further develop this repository, execute the following to initialize pre-commit hooks:
```
$ pre-commit install
```

| Task | Type | Study |
| --- | --- | --- |
| OPP-115 | Multi-label^* sequence classification | Wilson et al. (2016)^*** |
| PI-Extract | Joint multi-class^** sequence tagging | Duc et al. (2021) |
| Policy-Detection | Binary sequence classification | Amos et al. (2021) |
| PolicyIE-A | Multi-class^** sequence classification | Ahmad et al. (2021) |
| PolicyIE-B | Joint multi-class^** sequence tagging | Ahmad et al. (2021) |
| PolicyQA | Reading comprehension | Ahmad et al. (2021) |
| PrivacyQA | Binary sequence classification | Ravichander et al. (2019) |

^*Multi-label means that each example can have more than one gold-standard label.

^**Multi-class means that each example has exactly one gold-standard label out of multiple choices.

^***Data splits were not defined in Wilson et al. (2016) and were instead taken from Mousavi et al. (2020).
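
The difference between multi-label and multi-class targets can be illustrated with a small encoding sketch. This is not the repository's preprocessing code: the helper names `multi_hot` and `class_index` and the label names in `LABEL_SET` are hypothetical placeholders chosen for illustration.

```python
from typing import List, Sequence


def multi_hot(labels: Sequence[str], label_set: Sequence[str]) -> List[int]:
    """Multi-label target: a 0/1 vector with one slot per possible label."""
    unknown = set(labels) - set(label_set)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in label_set]


def class_index(label: str, label_set: Sequence[str]) -> int:
    """Multi-class target: the index of the single gold label."""
    return list(label_set).index(label)


# Hypothetical label names, used only for illustration.
LABEL_SET = ["label_a", "label_b", "label_c"]

# Multi-label: an example may carry several gold labels at once.
print(multi_hot(["label_a", "label_c"], LABEL_SET))  # [1, 0, 1]

# Multi-class: an example carries exactly one gold label.
print(class_index("label_b", LABEL_SET))  # 1
```

In this framing, a multi-label model is typically trained with an independent binary decision per label, while a multi-class model picks one label from the set.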