This repository contains annotated data on inappropriate language in online discussions, generated through a combination of expert annotation, crowd-sourcing, and ChatGPT-based methods.

BaranBarbarestani/InappropriateLanguageDetection

Task description:

This repository contains annotated data on inappropriate language in online discussions, generated through a combination of expert annotation, crowd-sourcing, and ChatGPT-based methods.

annotations:

ChatGPT_explicit: This subfolder contains annotations of explicit inappropriate language identified by ChatGPT.
ExplicitlyInappropriateLanguageInContext: This subfolder contains both crowd and expert annotations marking instances of explicitly inappropriate language.

codes:

Contains the scripts used for data processing and analysis.

data:

Holds the raw and processed data used for annotation and analysis. This includes input data in various formats and intermediate data sets generated during processing.

LingoTurk files:

Contains files related to the LingoTurk platform, which was used for collecting annotations. This includes task configurations and instructions.

statistics:

Includes statistical reports and summaries derived from the data set.

the analysis of annotations:

Contains detailed analyses of annotation results, including comparisons between different annotation methods, inter-annotator agreements, error analysis, and insights into annotation discrepancies.
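
As an illustration of the kind of inter-annotator agreement measure mentioned above, the following is a minimal sketch of Cohen's kappa between two annotation sources. It is not taken from the scripts in this repository: the function name `cohen_kappa`, the binary label encoding (1 = inappropriate, 0 = not), and the toy label lists are all assumptions made for the example, and the repository's own analysis may use a different metric or data layout.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels.

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each source's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If both sources always use the same single label, kappa is undefined;
    # returning 1.0 here is a convention chosen for this sketch.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (invented for this example): 1 = inappropriate, 0 = not.
expert = [1, 0, 1, 1, 0, 0, 1, 0]
crowd = [1, 0, 0, 1, 0, 1, 1, 0]

kappa = cohen_kappa(expert, crowd)
print(f"Cohen's kappa (expert vs. crowd): {kappa:.2f}")
```

The same function could be applied to any pair of sources (for example expert vs. ChatGPT labels), provided the labels are aligned item by item.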

Usage:

Researchers and developers interested in content moderation, natural language processing, and online discourse analysis can benefit from this data set and associated resources.

Citation:

If you use this data set or findings from this repository in your research or projects, please consider citing this repository and our paper.
Citing the paper:
Citing the repository: https://github.com/cltl/InappropriateLanguageDetection

Contact

If you have any questions, please contact me at b[dot]barbarestani[at]vu[dot]nl.
