KnowMAN

KnowMAN: Weakly Supervised Multinomial Adversarial Networks

This repository contains the code used in our paper:
KnowMAN: Weakly Supervised Multinomial Adversarial Networks, published at EMNLP 2021. 🎉
by Luisa März, Ehsaneddin Asgari, Fabienne Braune, Franziska Zimmermann and Benjamin Roth.

For any questions, please get in touch.

What is KnowMAN about? 🤓

The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reliance on the signals captured by the labeling functions and hinder models from exploiting other signals or from generalizing well.
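To make the notion of labeling functions concrete, here is a minimal sketch for a spam-like task. It is not taken from this repository: the keyword rules, the `ABSTAIN` convention and the majority-vote aggregation are assumptions chosen for illustration only.

```python
from collections import Counter

ABSTAIN = -1       # assumed convention: the labeling function did not fire
HAM, SPAM = 0, 1

# Hypothetical keyword-based labeling functions (not the ones used in the paper).
def lf_subscribe(text):
    return SPAM if "subscribe" in text.lower() else ABSTAIN

def lf_check_out(text):
    return SPAM if "check out" in text.lower() else ABSTAIN

def lf_song(text):
    return HAM if "song" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_subscribe, lf_check_out, lf_song]

def apply_lfs(texts):
    """Return one row of labeling-function votes per text."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]

def majority_vote(votes):
    """Aggregate non-abstaining votes; ABSTAIN if nothing fired or on a tie."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

texts = [
    "Please subscribe to my channel and check out my videos",
    "I love this song",
    "Nice weather today",
]
votes = apply_lfs(texts)
weak_labels = [majority_vote(v) for v in votes]
```

A model trained directly on `weak_labels` can simply learn that "subscribe" means spam, which is the over-reliance on labeling-function signals described above.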

KnowMAN is an adversarial scheme that makes it possible to control the influence of signals associated with specific labeling functions. KnowMAN forces the network to learn representations that are invariant to those signals and to pick up other signals that are more generally associated with an output label. KnowMAN strongly improves results compared to direct weakly supervised learning with a pre-trained transformer language model and a feature-based baseline.
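The idea of learning representations that are invariant to labeling-function signals can be illustrated with a generic adversarial objective. The sketch below is a simplified assumption and not necessarily the exact loss used in the paper or in this repository: a discriminator tries to predict which labeling function matched a sample, and the feature extractor is rewarded when the discriminator fails. The function names and the weight `lam` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logits, target):
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])

def feature_extractor_objective(class_logits, label, lf_logits, lf_index, lam=1.0):
    """Classifier loss minus lam times the discriminator loss (assumed form).

    Minimizing this value pushes the features to be useful for the task label
    while making the matching labeling function hard to identify.
    """
    clf_loss = nll(class_logits, label)
    disc_loss = nll(lf_logits, lf_index)
    return clf_loss - lam * disc_loss

# Same classifier output, two different discriminator outputs over 3 labeling functions.
confident = feature_extractor_objective([2.0, 0.0], 0, [5.0, 0.0, 0.0], 0)
confused = feature_extractor_objective([2.0, 0.0], 0, [0.0, 0.0, 0.0], 0)
```

In this toy setup `confused` is lower than `confident`, so the feature extractor prefers representations from which the discriminator cannot tell which labeling function fired. Larger values of `lam` put more weight on that invariance.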

Usage 🚀

The experiments described in our paper can be found in the experiments folder. To run them, execute the respective file.
Please make sure that you have downloaded the data files in advance (see the Datasets section) and adjusted the data file path in the yaml files!

For example, to run the IMDb tf-idf training:

python ./experiments/imdb/train_tfidf_imdb.py

For example, to run the spam DistilBERT training:

python ./experiments/spam/train_transformers_spam.py

If you want to change hyperparameters, edit the yaml files in the experiments folder.

Baselines can be found in the baselines folder. To run them, pass the yaml file of the experiment you want to try as an argument.

For example, to run the spouse Snorkel training:

python ./baselines/snorkel_training_knodle.py ./experiments/spouse/spouse_tfidf.yaml

Please note that the baselines are only implemented for tf-idf encoding here. The results for DistilBERT baselines can be reproduced by using Knodle.

Datasets 📚

Datasets used in our work:

Spam Dataset - a dataset based on the YouTube comments dataset from Alberto et al. (2015). The task is to classify whether a text is relevant to the video or is spam, such as advertising.
Spouse Dataset - a relation extraction dataset based on the Signal Media One-Million News Articles Dataset from Corney et al. (2016).
IMDb Dataset - a dataset that consists of short movie reviews. The task is to determine whether a review expresses a positive or negative sentiment.

All datasets are part of the Knodle framework and can be downloaded here.

Citation 📑

When using our work, please cite our paper from the ACL Anthology:

@inproceedings{marz-etal-2021-knowman,
    title = "{K}now{MAN}: Weakly Supervised Multinomial Adversarial Networks",
    author = {M{\"a}rz, Luisa  and
      Asgari, Ehsaneddin  and
      Braune, Fabienne  and
      Zimmermann, Franziska  and
      Roth, Benjamin},
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.751",
    pages = "9549--9557",
    abstract = "The absence of labeled data for training neural models is often addressed by leveraging knowledge about the specific task, resulting in heuristic but noisy labels. The knowledge is captured in labeling functions, which detect certain regularities or patterns in the training samples and annotate corresponding labels for training. This process of weakly supervised training may result in an over-reliance on the signals captured by the labeling functions and hinder models to exploit other signals or to generalize well. We propose KnowMAN, an adversarial scheme that enables to control influence of signals associated with specific labeling functions. KnowMAN forces the network to learn representations that are invariant to those signals and to pick up other signals that are more generally associated with an output label. KnowMAN strongly improves results compared to direct weakly supervised learning with a pre-trained transformer language model and a feature-based baseline.",
}

Acknowledgments 💎

This research was funded by the WWTF through the project “Knowledge-infused Deep Learning for Natural Language Processing” (WWTF Vienna Research Group VRG19-008).
