We use embedding techniques such as MUSE, LASER, XLM, MultiBPEmb, and fastText to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. More information about the methods tried can be found here.
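To make the transfer idea concrete, here is a minimal, self-contained sketch: a classifier is fitted on monolingual (English) sentences only and then applied to code-mixed sentences, relying on a shared embedding space. Everything in it is an assumption for illustration: the 2-d vectors in `TOY_VECTORS` are made up, and `embed`, `fit_centroids`, and `predict` are hypothetical helpers, not functions from this repository. The actual experiments use the pretrained embeddings listed above and neural models rather than a nearest-centroid rule.

```python
import math

# Made-up 2-d "cross-lingual" word vectors: words with similar meaning sit
# close together regardless of language (English or romanized Hindi).
TOY_VECTORS = {
    "good": (0.9, 0.1), "great": (0.8, 0.2), "accha": (0.85, 0.15),
    "bad": (0.1, 0.9), "terrible": (0.2, 0.8), "bura": (0.15, 0.85),
    "movie": (0.5, 0.5), "film": (0.5, 0.5),
}


def embed(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def fit_centroids(labelled_sentences):
    """Compute one mean sentence vector per label."""
    by_label = {}
    for text, label in labelled_sentences:
        by_label.setdefault(label, []).append(embed(text))
    return {
        label: tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
        for label, vecs in by_label.items()
    }


def predict(centroids, sentence):
    """Return the label whose centroid is most similar to the sentence."""
    vec = embed(sentence)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Train on monolingual English data only.
english_train = [
    ("good movie", "positive"), ("great film", "positive"),
    ("bad movie", "negative"), ("terrible film", "negative"),
]
centroids = fit_centroids(english_train)

# Apply the same classifier, unchanged, to code-mixed sentences.
print(predict(centroids, "movie accha hai"))
print(predict(centroids, "film bura hai"))
```

The code-mixed sentences are classified without any code-mixed training labels, because the Hindi words land near their English counterparts in the shared space.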
All the dependencies of the code are listed in requirements.txt.
pip install -r requirements.txt
PYTHONIOENCODING=utf-8 python -m laserembeddings download-models
# build the image
docker build -t unsacmt .
# run the container
nvidia-docker run -v $PWD:/app -p 8989:8989 unsacmt
# launch a jupyter notebook
jupyter notebook --ip 0.0.0.0 --port 8989 --allow-root
The sentiment analysis data is in data/cm/.
The custom fastText embedding is provided here. # TODO
The aligned MUSE embedding is provided here. # TODO
notebooks/archive/*.ipynb: old notebooks with many more experiments than mentioned in the paper.
notebooks/Results.ipynb: a notebook with all the experiments.
src/utills.py: code for reading raw data and computing the F1 score.
src/trainer.py: code for running the training curriculum given the model and data.
src/models.py: code for the simple neural network models used by us.
src/data_prep.py: code for applying different kinds of embeddings to the sentiment analysis dataset.
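For readers unfamiliar with the F1 metric mentioned above, the following is a minimal sketch of a macro-averaged F1 score in plain Python. The function name `macro_f1` and the choice of macro averaging are assumptions made for illustration; the file list does not say which averaging the repository's own implementation uses, so treat this as a reference for the metric rather than a copy of the project's code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); a class with no support and no
        # predictions cannot occur here because labels come from the data.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(round(macro_f1(gold, pred), 3))
```

Macro averaging weights every class equally, which matters when one sentiment class is much rarer than the others.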
@misc{yadav2020unsupervised,
title={Unsupervised Sentiment Analysis for Code-mixed Data},
author={Siddharth Yadav and Tanmoy Chakraborty},
year={2020},
eprint={2001.11384},
archivePrefix={arXiv},
primaryClass={cs.CL}
}