CoDi is an accessible and user-friendly REST microservice that automates the disentanglement of a set of instant messages into conversations by leveraging state-of-the-art machine learning algorithms.
To run the webserver and correctly disentangle conversations, you will need the following:
- Python (v3.10)
- OCaml (v4.12.0) (optional; only needed to compile MEGAM, see below)
This project was developed in a custom Conda environment. To recreate it, execute the following commands:

```shell
conda env create -f ./environment.yml
conda activate codi
```
To deactivate and delete the environment, execute the following commands:

```shell
conda deactivate
conda remove --name codi --all
```
To run the webserver locally, you first need to generate a Django secret key, which can be done with the following command (from the root of this repository):

```shell
python -c 'from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())'
```
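If Django is not yet installed in your active environment, a key of the same shape can be produced with the standard library alone. This is a hedged sketch, not Django's own helper: it assumes the convention of sampling 50 characters from lowercase letters, digits, and a small punctuation set.

```python
import secrets
import string

# Character set assumed to mirror Django's get_random_secret_key convention:
# lowercase letters, digits, and a small punctuation set.
ALPHABET = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"

def generate_secret_key(length: int = 50) -> str:
    """Return a cryptographically random key of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_secret_key())
```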
Once the key has been generated, create a `.env` file in the repository's root. The `.env` file must be structured as follows:

```
DJANGO_DEBUG=True
DJANGO_SECRET_KEY="your-key"
```
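Both values arrive in the settings module as plain strings. The sketch below shows how such variables are typically consumed; how CoDi's `settings.py` actually reads them, and the mechanism that loads `.env` (e.g., python-dotenv), are assumptions, not part of this repository's documented setup.

```python
import os

def env_bool(name: str, default: str = "False") -> bool:
    """Environment variables are plain strings, so the string 'False'
    would be truthy under bool(); compare to the literal 'True' instead."""
    return os.environ.get(name, default) == "True"

# Variable names taken from the .env file above.
SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "")
DEBUG = env_bool("DJANGO_DEBUG")
```

The explicit string comparison matters when `DJANGO_DEBUG` is set to `False`, which must actually disable debug mode rather than evaluate as a truthy non-empty string.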
We include the latest version of the MEGAM Max Entropy Classifier by Hal Daumé III. To compile it, run the following commands:

```shell
cd codi/api/utils/megam_0.92
make clean
make depend
make
```
You can start a local instance of CoDi with the provided `run_server.sh` script:

```shell
cd scripts
chmod +x ./run_server.sh
./run_server.sh
```
Run and debug configurations are available in the `.run` directory for users who have PyCharm Professional. To use them, open this repository in PyCharm; it will automatically import the configurations for you.
First, navigate to the Django configuration "Run server" > Environment > Environment variables. Here you need to add an environment variable with key `DJANGO_SECRET_KEY` and value `your-key` (the same key you generated earlier), and another with key `DJANGO_DEBUG` and value `True`.
The compound configuration "Run server" will compile the MEGAM binary and run the server. If you only need to run the server, you can use the Django configuration "Start server".
We also offer a Docker image. The Dockerfile and docker-compose file can be found in the project's root directory. The image can also be built using the Docker configuration "Compose" (for PyCharm Professional users). Before running this configuration, ensure that you have a file named `.env.production` in the repo's root directory. The file needs to have the same structure as the `.env` file described earlier. In this case, we recommend setting the `DJANGO_DEBUG` variable to `False`.
The `datasets` directory contains some example datasets (in ANNOT or JSON format) taken from previously published papers. `datasets/annot/from_previous_papers` and `datasets/json/from_previous_papers` include datasets previously published in:
- Elsner, M., & Charniak, E. (2010). Disentangling chat. Computational Linguistics, 36(3), pp. 389-409, ACL, 2010.
- Chatterjee, P., Damevski, K., Kraft, N. A., & Pollock, L. (2020). Software-related Slack chats with disentangled conversations. In Proceedings of MSR 2020 (International Conference on Mining Software Repositories), pp. 588-592, ACM, 2020.
- Subash, K. M., Kumar, L. P., Vadlamani, S. L., Chatterjee, P., & Baysal, O. (2022). DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research. In Proceedings of MSR 2022 (International Conference on Mining Software Repositories), ACM, 2022.
On the Homepage of CoDi you can train the model from scratch, then disentangle and visualize an example dataset:
- Check that the Train operation is selected (it is selected by default)
- Select Slack as the type of platform you want to train on
- Leave all the features enabled
- Drag and drop the training set `datasets/annot/from_previous_papers/training.annot` into the page
- Wait for the operation to complete (console debug information will inform you about its progress)
- When the Train operation has completed, you can select a Predict operation and the Discord platform, then drag and drop one of the Discord datasets to perform a disentanglement prediction (e.g., `datasets/annot/from_previous_papers/clojure_Feb2020-Apr2020.annot`)
- After a successful validation (try, for example, `datasets/json/from_previous_papers/validation.json`), you can also check the Statistics box for information about disentanglement performance (e.g., Accuracy, F1-score)
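As a reminder of what the reported metrics mean, the F1-score is the harmonic mean of precision and recall. The counts below are purely illustrative and not taken from CoDi's output:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from raw counts of
    true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 8 correct predictions, 2 spurious, 4 missed.
print(round(f1_score(8, 2, 4), 3))  # → 0.727
```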
CoDi was presented and used in the following scientific research papers:
- Riggio, E., Raglianti, M., & Lanza, M. (2023). Conversation Disentanglement As-a-Service. Proceedings of ICPC 2023 (International Conference on Program Comprehension), in press, IEEE.
Copyright (c) 2023 Edoardo Riggio, Marco Raglianti, Michele Lanza, REVEAL @ Software Institute - USI, Lugano, Switzerland
Distributed under the MIT License. See LICENSE for more information.
- REVEAL - https://reveal.si.usi.ch