This project is the backend of the M-SENA Platform.
We provide a Docker image of the platform. See the main repository for instructions.
$ git clone https://github.com/iyuge2/M-SENA-Backend.git
$ cd M-SENA-Backend
- Install system requirements
$ apt install mysql-server default-libmysqlclient-dev libsndfile1 ffmpeg
- Install python requirements
$ conda create --name sena python=3.8
$ conda activate sena
$ pip install -r requirements.txt
- Download BERT-Base, Chinese from Google-BERT, then convert the TensorFlow checkpoint to PyTorch using transformers-cli. Place the converted model under the MM-Codes/pretrained_model directory.
- Install the OpenFace toolkit.
- Log in to MySQL as root
$ mysql -u root -p
- Create a database for M-SENA
mysql> CREATE DATABASE sena;
- Create a user for M-SENA and grant privileges
mysql> CREATE USER sena IDENTIFIED BY 'MyPassword';
mysql> GRANT ALL PRIVILEGES ON sena.* TO sena@`%`;
mysql> FLUSH PRIVILEGES;
- Edit constants.py. Alter DATASET_ROOT_DIR, DATASET_SERVER_IP, OPENFACE_FEATURE_PATH, MM_CODES_PATH, MODEL_TMP_SAVE, AL_CODES_PATH and LIVE_TMP_PATH to fit your settings.
- Edit config.sh. Look for DATABASE_URL and change it to fit your database settings.
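If you are unsure how DATABASE_URL should look, here is a minimal Python sketch, assuming it uses the SQLAlchemy-style format mysql://user:password@host:port/database. Check the existing value in config.sh first: the scheme, host and port defaults below are assumptions, and build_database_url is a hypothetical helper, not part of the backend. It percent-encodes the user name and password so characters such as @, : or / cannot break the URL.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, database: str,
                       host: str = "localhost", port: int = 3306,
                       scheme: str = "mysql") -> str:
    """Assemble a SQLAlchemy-style database URL (hypothetical helper).

    The user name and password are percent-encoded (safe="" also encodes
    '/'), so reserved characters cannot corrupt the URL structure.
    """
    return "{}://{}:{}@{}:{}/{}".format(
        scheme,
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        database,
    )


if __name__ == "__main__":
    # Uses the MySQL user, password and database created in the steps above.
    print(build_database_url("sena", "MyPassword", "sena"))
```

The printed value can then be pasted into config.sh as the DATABASE_URL value, assuming the format matches what the backend expects.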
- Download datasets and place them under the DATASET_ROOT_DIR specified in constants.py.
- Register each new dataset by adding its information to the DATASET_ROOT_DIR/config.json file.
- Format datasets with MM-Codes/data/DataPre.py.
- For datasets that need labeling, the config file is located in the AL-Codes directory.
$ python MM-Codes/data/DataPre.py --working_dir $PATH_TO_DATASET --openface2Path $PATH_TO_OPENFACE2_FeatureExtraction_TOOL --language cn/en
- The structure of the DATASET_ROOT_DIR directory is introduced in the next section.
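The registration step (adding information to DATASET_ROOT_DIR/config.json) can be sketched as below. This assumes config.json is a JSON object keyed by dataset name; register_dataset is a hypothetical helper, and the real set of fields per dataset (language, label_path, features, etc.) should be copied from an existing entry rather than invented.

```python
import json
import os


def register_dataset(root_dir: str, name: str, entry: dict) -> dict:
    """Add or overwrite one dataset entry in <root_dir>/config.json.

    Assumption: config.json is a JSON object keyed by dataset name.
    A missing config.json is treated as an empty object.
    Returns the full, updated configuration.
    """
    config_path = os.path.join(root_dir, "config.json")
    config = {}
    if os.path.exists(config_path):
        with open(config_path, "r", encoding="utf-8") as f:
            config = json.load(f)
    config[name] = entry  # overwrites an existing entry with the same name
    with open(config_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII values (e.g. Chinese) readable
        json.dump(config, f, indent=4, ensure_ascii=False)
    return config
```

For example, register_dataset("/path/to/DATASET_ROOT_DIR", "SIMS", {"language": "cn"}) would add or replace the SIMS entry; the path and field values here are placeholders.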
- Start the backend server:
$ source config.sh
$ flask run --host=0.0.0.0
The structure of the root dataset directory should look like this:
.
├── config.json
├── MOSEI
│ ├── label.csv
│ ├── Processed
│ └── Raw
├── MOSI
│ ├── label.csv
│ ├── Processed
│ └── Raw
└── SIMS
  ├── label.csv
  ├── Processed
  └── Raw
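Before scanning datasets, it can help to confirm that every dataset directory matches the layout above. The sketch below is a hypothetical checker (check_dataset_root is not part of the backend). It treats each sub-directory of DATASET_ROOT_DIR as a dataset and reports which of label.csv, Processed and Raw are missing.

```python
import os
from typing import Dict, List

# Entry name -> predicate it must satisfy, following the tree above.
REQUIRED_ENTRIES = {
    "label.csv": os.path.isfile,
    "Processed": os.path.isdir,
    "Raw": os.path.isdir,
}


def check_dataset_root(root_dir: str) -> Dict[str, List[str]]:
    """Return {dataset_name: [missing entries]} for each dataset directory.

    Top-level files such as config.json are skipped; an empty list means
    the dataset matches the expected layout.
    """
    report = {}
    for name in sorted(os.listdir(root_dir)):
        dataset_dir = os.path.join(root_dir, name)
        if not os.path.isdir(dataset_dir):
            continue
        report[name] = [
            entry
            for entry, exists in REQUIRED_ENTRIES.items()
            if not exists(os.path.join(dataset_dir, entry))
        ]
    return report
```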
- config.json: states necessary information for all datasets, for example language, label_path, features, etc. It is only used when scanning and updating datasets.
- **/label.csv: stores detailed information for each video clip in that dataset, including video_id, clip_id, normal text, label value (Float), annotation (String) and mode (training attributes). In addition, we define a field label_by to indicate the label type, which is necessary for labeling based on active learning.
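A minimal sketch of reading label.csv with Python's csv module, assuming the header row uses the column names video_id, clip_id, text, label, annotation, mode and label_by. The exact header names are an assumption, so check your file. Clips that have not been labeled yet are assumed to have an empty label cell, which is mapped to None. Both function names are hypothetical.

```python
import csv
from collections import Counter
from typing import List


def load_labels(label_path: str) -> List[dict]:
    """Read label.csv into a list of row dicts.

    The 'label' column is converted to float; an empty cell (assumed to
    mean "not labeled yet") becomes None.
    """
    with open(label_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["label"] = float(row["label"]) if row["label"] != "" else None
    return rows


def count_by_mode(rows: List[dict]) -> Counter:
    """Count clips per mode value (e.g. train / valid / test)."""
    return Counter(row["mode"] for row in rows)
```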
- **/Processed: stores feature files. We use pickle to store processed features, which are organized in the following structure. These files are used in MM-Codes.
{
"train": {
"raw_text": [],
"audio": [],
"vision": [],
"id": [], # [video_id$_$clip_id, ..., ...]
"text": [],
"text_bert": [],
"audio_lengths": [],
"vision_lengths": [],
"annotations": [],
"classification_labels": [], # Negative(< 0), Neutral(0), Positive(> 0)
"regression_labels": []
},
"valid": {***}, # same as the "train"
"test": {***}, # same as the "train"
}
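The loader below is a sketch for sanity-checking a processed feature file against the structure above. It confirms that the train, valid and test splits exist and that every per-sample list in a split has as many entries as its id list. The function names are hypothetical. Only unpickle files you trust, since pickle.load can execute arbitrary code.

```python
import pickle
from typing import Tuple

SPLITS = ("train", "valid", "test")


def load_features(pkl_path: str) -> dict:
    """Load a processed-feature pickle and validate its split layout.

    Raises KeyError if a split is missing and ValueError if any field in a
    split has a different length from that split's "id" list.
    """
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    for split in SPLITS:
        if split not in data:
            raise KeyError("missing split: {}".format(split))
        n = len(data[split]["id"])
        for key, values in data[split].items():
            # len() also works for NumPy arrays (size of the first axis).
            if len(values) != n:
                raise ValueError("{}/{} has {} entries, expected {}".format(
                    split, key, len(values), n))
    return data


def split_sample_id(sample_id: str) -> Tuple[str, str]:
    """Split an id of the form video_id$_$clip_id into (video_id, clip_id)."""
    video_id, clip_id = sample_id.rsplit("$_$", 1)
    return video_id, clip_id
```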
- **/Raw: stores raw videos. The path of each clip should be consistent with label.csv.
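To check that raw clips and label.csv agree, the sketch below assumes each clip is stored as Raw/<video_id>/<clip_id>.mp4. This path pattern and the .mp4 extension are assumptions, so adjust them to your dataset (for example, if file names are zero-padded). find_missing_clips is a hypothetical helper that lists expected paths that do not exist.

```python
import csv
import os
from typing import List


def find_missing_clips(dataset_dir: str, ext: str = ".mp4") -> List[str]:
    """Return expected raw-clip paths that are absent on disk.

    Assumption: each row of <dataset_dir>/label.csv maps to
    <dataset_dir>/Raw/<video_id>/<clip_id><ext>.
    """
    missing = []
    label_path = os.path.join(dataset_dir, "label.csv")
    with open(label_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            path = os.path.join(dataset_dir, "Raw", row["video_id"],
                                row["clip_id"] + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```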
We provide a download link for the preprocessed SIMS dataset (extraction code: 4aa6, md5: 3befed5d2f6ea63a8402f5875ecb220d), which follows the above requirements. You can get more datasets from CMU-MultimodalSDK.
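After downloading the archive, you can compare its checksum with the md5 value above. The sketch reads the file in chunks with hashlib so large archives do not need to fit in memory. The helper names are hypothetical and the archive path is whatever you saved it as. On Linux, md5sum reports the same digest.

```python
import hashlib

# md5 of the preprocessed SIMS archive, as listed above.
EXPECTED_MD5 = "3befed5d2f6ea63a8402f5875ecb220d"


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks by default."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches the expected value (case-insensitive)."""
    return file_md5(path) == expected.lower()
```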
The source code is organized as follows:
.
├── AL-Codes # Active learning codes
├── MM-Codes # MSA algorithm codes
├── app.py # Flask main codes
├── config.py # Basic config
├── config.sh # Basic config
├── constants.py # Global variable definition
├── database.py # Database definition & initialization
├── httpServer.py # Dataset server (for video previews)
└── requirements.txt # Python requirements
- MM-Codes: MSA code framework. It is based on MMSA, and all model and dataset parameters are saved in MM-Codes/config.json.
- AL-Codes: code framework for labeling based on active learning. It is also based on MMSA, and all model and dataset parameters are saved in AL-Codes/config.json.