iyuge2/M-SENA-Backend

Python 3.6 Torch 1.2 Flask 1.1.2 License

This project is the backend of the M-SENA Platform.

Installation

Docker

We provide a docker image of our platform. See the main repo for instructions.

From Source

1. Clone this Repository

$ git clone https://github.com/iyuge2/M-SENA-Backend.git
$ cd M-SENA-Backend

2. Install Requirements

  • Install system requirements
$ apt install mysql-server default-libmysqlclient-dev libsndfile1 ffmpeg
  • Install python requirements
$ conda create --name sena python=3.8
$ source activate sena
$ pip install -r requirements.txt

3. Configure MySQL

  • Log in to MySQL as root
$ mysql -u root -p
  • Create a database for M-SENA
mysql> CREATE DATABASE sena;
  • Create a user for M-SENA and grant privileges
mysql> CREATE USER sena IDENTIFIED BY 'MyPassword';
mysql> GRANT ALL PRIVILEGES ON sena.* TO sena@`%`;
mysql> FLUSH PRIVILEGES;

4. Configs

  • Edit constants.py. Set DATASET_ROOT_DIR, DATASET_SERVER_IP, OPENFACE_FEATURE_PATH, MM_CODES_PATH, MODEL_TMP_SAVE, AL_CODES_PATH and LIVE_TMP_PATH to match your environment.
  • Edit config.sh. Find DATABASE_URL and set it to match your database settings.
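
A quick check of the path-like settings can catch typos before the server starts. The sketch below is only an illustration: it assumes the listed settings (other than DATASET_SERVER_IP) hold filesystem paths, and the helper name find_missing_paths is hypothetical, not part of the repository.

```python
import os

# Names of the path-like settings listed above. DATASET_SERVER_IP is left out
# because it is an address, not a filesystem path (assumption).
PATH_SETTINGS = (
    "DATASET_ROOT_DIR",
    "OPENFACE_FEATURE_PATH",
    "MM_CODES_PATH",
    "MODEL_TMP_SAVE",
    "AL_CODES_PATH",
    "LIVE_TMP_PATH",
)


def find_missing_paths(settings):
    """Return the names of path settings that are unset or do not exist on disk.

    `settings` is any mapping from setting name to value (hypothetical helper).
    """
    missing = []
    for name in PATH_SETTINGS:
        value = settings.get(name)
        if not value or not os.path.exists(value):
            missing.append(name)
    return missing
```

If constants.py defines these names as module-level variables, which this README suggests but does not spell out, the check could be run as find_missing_paths(vars(constants)) after importing the module.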

5. Datasets

  • Download the datasets and place them under the DATASET_ROOT_DIR specified in constants.py.
  • Register each new dataset by adding its information to DATASET_ROOT_DIR/config.json.
  • Format the datasets with MM-Codes/data/DataPre.py, passing either cn or en to --language:
$ python MM-Codes/data/DataPre.py --working_dir $PATH_TO_DATASET --openface2Path $PATH_TO_OPENFACE2_FeatureExtraction_TOOL --language cn/en
  • For datasets that need labeling, the config file is located in the AL-Codes directory.
  • The structure of the DATASET_ROOT_DIR directory is described in the Dataset Structure section below.
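
When several datasets need formatting, the DataPre.py invocation above can be assembled programmatically. This is a minimal sketch that uses only the three flags shown in the command; the function name build_datapre_command is hypothetical, and it assumes --language accepts exactly cn or en.

```python
import sys

# Language codes taken from the "cn/en" hint in the command above (assumption).
SUPPORTED_LANGUAGES = ("cn", "en")


def build_datapre_command(working_dir, openface2_path, language,
                          script="MM-Codes/data/DataPre.py"):
    """Build the argument list for one DataPre.py run (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"language must be one of {SUPPORTED_LANGUAGES}, got {language!r}"
        )
    return [
        sys.executable, script,
        "--working_dir", str(working_dir),
        "--openface2Path", str(openface2_path),
        "--language", language,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root. Building a list instead of a shell string avoids quoting problems when paths contain spaces.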

6. Run

$ source config.sh
$ flask run --host=0.0.0.0

Reference

Dataset Structure

The structure of the root dataset directory should look like this:

.
├── config.json
├── MOSEI
│   ├── label.csv
│   ├── Processed
│   └── Raw
├── MOSI
│   ├── label.csv
│   ├── Processed
│   └── Raw
└── SIMS
    ├── label.csv
    ├── Processed
    └── Raw
  • config.json: records the necessary information for all datasets, such as language, label_path and features. It is only read when scanning and updating datasets.
  • **/label.csv: stores detailed information for each video clip in the ** dataset, including video_id, clip_id, normal text, label value (Float), annotation (String) and mode (training attributes). In addition, the label_by field indicates the label type, which is required for labeling based on active learning.

  • **/Processed: holds the feature files. Processed features are stored with pickle in the following structure. These files are used by MM-Codes.
{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [], # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
        "regression_labels": []
    },
    "valid": {***}, # same as the "train"
    "test": {***}, # same as the "train"
}
  • **/Raw: holds the raw videos. The path of each clip should be consistent with label.csv.
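
The feature-file layout described above can be sanity-checked before training. The sketch below takes the split and field names from the structure shown. The helper names are hypothetical, and returning the class as a string is an assumption, since the README does not say how classification_labels are encoded.

```python
import pickle

SPLITS = ("train", "valid", "test")
FIELDS = (
    "raw_text", "audio", "vision", "id", "text", "text_bert",
    "audio_lengths", "vision_lengths", "annotations",
    "classification_labels", "regression_labels",
)


def load_features(path):
    """Load a processed feature file stored with pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


def find_missing_fields(features):
    """Return {split: [missing field names]} for every incomplete split."""
    problems = {}
    for split in SPLITS:
        section = features.get(split)
        if section is None:
            problems[split] = list(FIELDS)
            continue
        missing = [name for name in FIELDS if name not in section]
        if missing:
            problems[split] = missing
    return problems


def split_sample_id(sample_id):
    """Split an id of the form 'video_id$_$clip_id' into its two parts."""
    video_id, clip_id = sample_id.rsplit("$_$", 1)
    return video_id, clip_id


def sentiment_class(regression_label):
    """Map a regression label to Negative (< 0), Neutral (0) or Positive (> 0)."""
    if regression_label < 0:
        return "Negative"
    if regression_label > 0:
        return "Positive"
    return "Neutral"
```

An empty result from find_missing_fields(load_features(path)) means all three splits carry every listed field. Because pickle can execute arbitrary code while loading, only open feature files from sources you trust.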

We provide a download link for the preprocessed SIMS dataset (code: 4aa6, md5: 3befed5d2f6ea63a8402f5875ecb220d), which follows the above requirements. More datasets are available from CMU-MultimodalSDK.
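
After downloading, the md5 quoted above can be compared against the file on disk. This sketch assumes the checksum refers to the downloaded file as a whole; the function names are hypothetical, and the README does not give the file name.

```python
import hashlib

# Checksum quoted in the text above for the preprocessed SIMS download.
SIMS_MD5 = "3befed5d2f6ea63a8402f5875ecb220d"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected=SIMS_MD5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return file_md5(path).lower() == expected.lower()
```

Reading in chunks keeps memory use flat for large archives. A False result from matches_md5 usually points to an incomplete or corrupted download.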

Code Structure

The source code is organized as follows:

.
├── AL-Codes                # Active learning codes
├── MM-Codes                # MSA algorithm codes
├── app.py                  # Flask main codes
├── config.py               # Basic config
├── config.sh               # Basic config
├── constants.py            # Global variable definition
├── database.py             # Database definition & initialization
├── httpServer.py           # Dataset server (for video previews)
└── requirements.txt        # Python requirements
  • MM-Codes

MSA Code Framework

This code is based on MMSA. All model and dataset parameters are saved in MM-Codes/config.json.

  • AL-Codes

Labeling based on Active Learning Code Framework

This code is based on MMSA. All model and dataset parameters are saved in AL-Codes/config.json.
