Unsupervised Sentiment Analysis for Code-mixed Data

We use embeddings techniques like MUSE, LASER, XLM, MutltiBPEemd, fasttext to efficiently transfer knowledge from monolingual test to code-mix text for sentiment analysis of code-mixed text. More information about the methods tried here can be found in here.

Environment

All the dependencies of the code are listed in requirements.txt.

pip

    pip install -r requirements.txt
    PYTHONIOENCODING=utf-8 python -m laserembeddings download-models

docker

    # build the image 
    docker build -t unsacmt .
    
    # run the container
    nvidia-docker run -v $PWD:/app -p 8989:8989 unsacmt
    
    # launch a jupyter notebook
    jupyter notebook --ip 0.0.0.0 --port 8989 --allow-root

Data

The Sentiment Analysis data is present is data/cm/.
The custom fastText embedding is provided here. # TODO
The aligned MUSE embedding is provided here. # TODO

Files Description

notebooks/archive/*.ipynb: old notebooks with many more experiments than mentioned in the paper.
notebooks/Results.ipynb: a notebook with all the experiments
src/utills.py: code for reading raw data and f1 score
src/trainer.py: code for following training curriculum given the model and data
src/models.py: code for simple neural network models used by use
src/data_prep.py: code for applying different kinds of embeddings on sentiment analysis dataset

Citation

@misc{yadav2020unsupervised,
    title={Unsupervised Sentiment Analysis for Code-mixed Data},
    author={Siddharth Yadav and Tanmoy Chakraborty},
    year={2020},
    eprint={2001.11384},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
data/cm		data/cm
notebooks		notebooks
src		src
Dockerfile		Dockerfile
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unsupervised Sentiment Analysis for Code-mixed Data

Environment

pip

docker

Data

Files Description

Citation

About

Releases

Packages

Languages

sedflix/unsacmt

Folders and files

Latest commit

History

Repository files navigation

Unsupervised Sentiment Analysis for Code-mixed Data

Environment

pip

docker

Data

Files Description

Citation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages