CounterGeDi is a pipeline that aims at controlling the counter speech generated to make it emotional, polite and detoxified. Paper accepted at IJCAI 2022.

hate-alert/CounterGEDI

🕹️ CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech [Accepted at IJCAI 2022: AI for Good (Special Track)]

For more details, see our paper:

Punyajoy Saha, Kanishk Singh, Adarsh Kumar, Binny Mathew and Animesh Mukherjee: "CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech"

Arxiv Paper Link

Abstract

Recently, many studies have tried to create generation models to assist counter speakers by providing counterspeech suggestions for combating the explosive proliferation of online hate. However, since these suggestions are from a vanilla generation model, they might not include the appropriate properties required to counter a particular hate speech instance. In this paper, we propose CounterGeDi - an ensemble of generative discriminators (GeDi) to guide the generation of a DialoGPT model toward more polite, detoxified, and emotionally laden counterspeech. We generate counterspeech using three datasets and observe significant improvement across different attribute scores. The politeness and detoxification scores increased by around 15% and 6% respectively, while the emotion in the counterspeech increased by at least 10% across all the datasets. We also experiment with triple-attribute control and observe significant improvement over single attribute results when combining complementing attributes, e.g., politeness, joyfulness and detoxification. In all these experiments, the relevancy of the generated text does not deteriorate due to the application of these controls.

WARNING: The repository contains content that is offensive and/or hateful in nature.

Please cite our paper in any published work that uses any of these resources.

@misc{https://doi.org/10.48550/arxiv.2205.04304,
  doi = {10.48550/ARXIV.2205.04304}, 
  url = {https://arxiv.org/abs/2205.04304},
  author = {Saha, Punyajoy and Singh, Kanishk and Kumar, Adarsh and Mathew, Binny and Mukherjee, Animesh},
  keywords = {Computation and Language (cs.CL), Computers and Society (cs.CY), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Folder Description 📂


./Discriminator       --> Contains the code for the discriminators used in the GeDi model
./Generation          --> Contains the code for generating results with our proposed model
./Utils               --> Contains utility functions such as preprocessing and data loading

Usage instructions

BaseModel Training for Counterspeech

To train the base model for counterspeech generation, run Generation_training.py after updating the task name and the save-related parameters as required (see the comments in the file for the path variables that need updating).

Generation

To generate results, run Generation_gedi.py. To produce the required result file, adjust the parameters in the params dictionary in that file as required. For example:

# To generate sentences controlled for emotion joy + Politeness:
params = {
     ...
     ...
     'disc_weight':[0.5, 0.5],
     ...
     ...
     'task_name':[('Emotion', 'joy'), ('Politeness', 'polite')],
     ...
}

Similarly, you can tweak the other parameters to change the results as required.
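As a rough illustration of what per-attribute weights such as 'disc_weight' can mean in GeDi-style guided decoding, the toy sketch below reweights a base next-token distribution by a weighted product of attribute probabilities. It is not the code in Generation_gedi.py: the function name combine_next_token_probs, the omega strength parameter, the three-word vocabulary and all probabilities are made up for illustration, and the assumption that each weight pairs positionally with an entry in 'task_name' is inferred only from the example above.

```python
import math


def combine_next_token_probs(base_probs, disc_probs, disc_weight, omega=1.0):
    """Reweight a base next-token distribution with weighted attribute scores.

    base_probs:  {token: P_LM(token | context)} from the base generator
    disc_probs:  one dict per attribute, {token: P(desired attribute | context + token)}
    disc_weight: one weight per attribute, in the same order as disc_probs
    omega:       hypothetical overall control strength (an assumption of this sketch)
    """
    if len(disc_probs) != len(disc_weight):
        raise ValueError("need exactly one weight per discriminator")

    # Work in log space: log P_LM + omega * sum_i w_i * log P_i(desired | token).
    scores = {}
    for token, p in base_probs.items():
        log_score = math.log(p)
        for probs, weight in zip(disc_probs, disc_weight):
            log_score += omega * weight * math.log(probs[token])
        scores[token] = log_score

    # Normalise the scores back into a probability distribution (softmax).
    max_score = max(scores.values())
    exp_scores = {t: math.exp(s - max_score) for t, s in scores.items()}
    total = sum(exp_scores.values())
    return {t: v / total for t, v in exp_scores.items()}


# Toy numbers, invented for this example.
base = {"thanks": 0.2, "stupid": 0.5, "friend": 0.3}
joy = {"thanks": 0.8, "stupid": 0.1, "friend": 0.7}
polite = {"thanks": 0.9, "stupid": 0.05, "friend": 0.6}

guided = combine_next_token_probs(base, [joy, polite], disc_weight=[0.5, 0.5], omega=2.0)
print(guided)
```

In this toy setting the token that both attribute models dislike loses almost all of its probability mass, while tokens favoured by both gain mass. Changing the relative weights shifts which attribute dominates.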


Evaluation instructions

For Generation Metrics:

  • We evaluate the generated responses on a variety of metrics, including BLEU, METEOR, diversity, and novelty.
  • The methods for computing these scores are described in Evaluation notebook.ipynb
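For readers unfamiliar with the diversity and novelty metrics, the sketch below shows one common way to compute them with the standard library only. The definitions used here (distinct-n for diversity, and one minus the maximum Jaccard similarity to the training texts for novelty) are assumptions for illustration. The function names are hypothetical, and the notebook's actual formulas and tokenisation may differ.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses, n=2):
    """Diversity as distinct-n: unique n-grams divided by total n-grams."""
    all_ngrams = []
    for response in responses:
        all_ngrams.extend(ngrams(response.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def novelty(response, training_texts):
    """Novelty as 1 minus the highest Jaccard similarity to any training text."""
    response_tokens = set(response.lower().split())
    best = 0.0
    for text in training_texts:
        reference_tokens = set(text.lower().split())
        union = response_tokens | reference_tokens
        if union:
            best = max(best, len(response_tokens & reference_tokens) / len(union))
    return 1.0 - best


# Toy data, invented for this example.
generated = ["please be kind to others", "please be kind to everyone"]
training = ["be kind"]

print(distinct_n(generated, n=2))
print(novelty(generated[0], training))
```

Whitespace tokenisation keeps the sketch dependency-free. A real evaluation would normally use the same tokeniser as the rest of the pipeline.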

For Emotions Evaluation:

  • Clone the GoEmotions-pytorch repository: git clone https://github.com/monologg/GoEmotions-pytorch
  • Then move the Evaluation notebook-Emotion into the GoEmotions-pytorch folder and set the file paths accordingly before running the evaluation

For Toxicity Evaluation:

For Grammatical Coherence Evaluation:

  • To evaluate whether the responses are grammatically correct, we use a pretrained model trained on the Corpus of Linguistic Acceptability (CoLA) and report CoLA scores.
  • The Colab notebook can be accessed here: CounterGedi_COLA_eval.ipynb

Todos

  • Add arxiv paper link.
  • Add link to Proceedings paper.
  • Usage Instruction General
  • Add Evaluation Instruction
  • Remove Redundant Files
  • Add generated result files
👍 The repo is still in active development. Feel free to create an issue! 👍
