The ENIGMA-51 Dataset

This is the official GitHub repository for the ENIGMA-51 Dataset.

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope). ENIGMA-51 has been annotated with a rich set of annotations which allows the study of a large variety of tasks, especially tasks related to human-object interactions.

You can download the ENIGMA-51 dataset and its annotations from the project web page.

Citing the ENIGMA-51 Dataset

If you find our work useful in your research, please cite it using the following BibTeX entry.

@inproceedings{ragusa2024enigma,
  title={ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios},
  author={Ragusa, Francesco and Leonardi, Rosario and Mazzamuto, Michele and Bonanno, Claudia and Scavo, Rosario and Furnari, Antonino and Farinella, Giovanni Maria},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={4549--4559},
  year={2024}
}

Table of Contents

  • Model Zoo and Baselines:
    • Untrimmed Temporal Detection of Human-Object Interactions
    • Egocentric Human-Object Interaction Detection
    • Short-Term Object Interaction Anticipation
    • NLU of Intents and Entities
  • Visualization Script for Object and Hand Keypoints with Mask Annotations

Introduction

The instructions below will guide you through replicating the baseline for the Untrimmed Temporal Detection of Human-Object Interactions task or training your own model. The baseline is based on ActionFormer; refer to the official repository for more details.

Download Features, Annotations, and other needed files

  • Download enigma_UAD.tar.gz from this link.
  • The file includes features, action annotations in JSON format, the custom dataset file (.py), and three config files, one for each task variant (ht_hr, fc_hd, ht_hr_fc_hd).

Details: The features are extracted with a two-stream network pretrained on ActivityNet. Each video chunk spans 6 frames, with no overlap between adjacent chunks; at a frame rate of 30 fps this gives 5 chunks per second. Appearance features are taken from the Flatten 673 layer of ResNet-200 applied to the central frame of each chunk. Motion features are taken from the global pool layer of BN-Inception applied to the optical flow fields computed from the 6 consecutive frames of each chunk. Motion and appearance features are then concatenated.
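
For illustration, a minimal sketch of how the per-chunk features are assembled; the feature dimensions used here (2048 for appearance, 1024 for motion) are assumptions based on the named layers, not values read from the released files:

import numpy as np

fps, chunk_size = 30, 6
num_frames = 1800                       # e.g., a 60-second clip
num_chunks = num_frames // chunk_size   # 300 chunks, i.e., 5 per second at 30 fps

# Placeholder streams standing in for the extracted features:
# appearance from ResNet-200 ("Flatten 673", assumed 2048-d) on the central frame,
# motion from BN-Inception (global pool, assumed 1024-d) on the 6-frame flow stack.
appearance = np.random.randn(num_chunks, 2048).astype(np.float32)
motion = np.random.randn(num_chunks, 1024).astype(np.float32)

# The two streams are concatenated chunk-wise into the feature matrix fed to ActionFormer.
features = np.concatenate([appearance, motion], axis=1)
print(features.shape)  # (300, 3072) under the assumed dimensions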

Needed steps

  • Features and annotations should be placed under ./data/enigma
  • Config files should be placed under ./configs
  • The custom dataset file should be placed under ./libs/datasets
  • In the libs/datasets/__init__.py file, add an import of the enigma module (the dataset name registered via @register_dataset() in the custom dataset file); see the sketch after this list.
  • In the eval.py file, replace all the instances of "val_split" with "test_split".
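
A minimal sketch of the change to libs/datasets/__init__.py, assuming the file follows ActionFormer's pattern of importing each dataset module so that its @register_dataset() decorator runs at import time (the existing import names below may differ in your checkout):

# libs/datasets/__init__.py (sketch)
from .datasets import make_dataset, make_data_loader  # existing exports (names may differ)
from . import anet, epic_kitchens, thumos14           # existing dataset modules (names may differ)
from . import enigma                                   # new: registers the "enigma" dataset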

The folder structure should look like this:

This folder
│   README.md
│   ...
│
└───configs/
│    └───enigma_ht_hr_fc_hd.yaml
│    └───enigma_ht_hr.yaml
│    └───enigma_fc_hd.yaml
│    └───...
│
└───data/
│    └───enigma/
│    │    └───annotations/
│    │    │    └───enigma_ht_hr_fc_hd.json
│    │    │    └───enigma_ht_hr.json
│    │    │    └───enigma_fc_hd.json
│    │    └───features/
│    └───...
│
└───libs/
│    └───datasets/
│    │    └───enigma.py
│    │    └───...
│    └───...
│   ...

Training and Evaluation

  • Choose the config file for training ActionFormer on ENIGMA-51.
  • Train the ActionFormer network. This will create an experiment folder under ./ckpt that stores training config, logs, and checkpoints.
python ./train.py ./configs/enigma_ht_hr.yaml --output reproduce
  • Save the predictions of the trained model by running this script.
python ./eval.py ./configs/enigma_ht_hr.yaml ./ckpt/enigma_ht_hr_reproduce --saveonly
  • To evaluate the trained model, run the mp_mAP.py file, specifying the path to the prediction file and the path to the testing ground truth file. For more details, please refer to the mp_mAP.py file. A sanity-check sketch for the saved prediction file follows this list.
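
A minimal sketch for inspecting the saved predictions, assuming --saveonly dumps them as a pickled dictionary inside the experiment folder; the file name (eval_results.pkl) and the keys below are assumptions based on ActionFormer's output format and may differ in your version:

import pickle

# Hypothetical path: adjust it to the file your run actually produces.
with open("./ckpt/enigma_ht_hr_reproduce/eval_results.pkl", "rb") as f:
    results = pickle.load(f)

# Assumed layout: a dict of parallel sequences with one entry per detected segment.
for key in ("video-id", "t-start", "t-end", "label", "score"):
    print(key, len(results[key]) if key in results else "missing")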

Evaluating Our Pre-trained Models

We also provide pre-trained models for the three task variants (ht_hr, fc_hd, ht_hr_fc_hd). The models with the corresponding configs can be downloaded from this link. To evaluate a pre-trained model, please follow the steps listed below.

  • Move the config files to the configs folder or specify the right path in the script below.
  • Create a folder ./pretrained, then a folder for each task variant, and move the weight files under them.
  • The folder structure should look like this:

This folder
│   README.md
│   ...
│
└───pretrained/
│    └───enigma/
│    │    └───ht_hr/
│    │    │    └───...
│    │    └───fc_hd/
│    │    │    └───...
│    │    └───ht_hr_fc_hd/
│    │    │    └───...
│    └───...
│
└───libs/
│
│   ...
  • Save the predictions of the pre-trained model by running this script.
python ./eval.py ./configs/enigma_ht_hr.yaml ./pretrained/enigma/ht_hr --saveonly
  • To evaluate the model, run the mp_mAP.py file, specifying the path to the prediction file and the path to the testing ground truth file. For more details, please refer to the mp_mAP.py file.

Egocentric Human-Object Interaction Detection

The instructions below will guide you through replicating the baseline for the Egocentric Human-Object Interaction Detection task. The baseline is based on egoism-hoi; refer to the official repository for more details.

To train the model, enter the following command:

python train.py --config PATH_OF_CFG --train_json PATH_OF_EHOI_TRAIN_ANNS --test_json PATH_OF_EHOI_VAL_ANNS --test_dataset_names enigma_val

To test the model, run the command below:

python test.py --dataset_json PATH_TEST_ANNS --dataset_images PATH_TEST_IMGS --weights_path WEIGHTS_PATH
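
For example, with hypothetical paths for the annotation files, image folder, and weights (replace them with the locations where you extracted the ENIGMA-51 EHOI data; only configs/cfg_ehoi.yaml comes from the table below):

python train.py --config configs/cfg_ehoi.yaml --train_json data/ehoi/train.json --test_json data/ehoi/val.json --test_dataset_names enigma_val
python test.py --dataset_json data/ehoi/test.json --dataset_images data/ehoi/test_images --weights_path weights/egoism_hoi_enigma.pth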

We provide the best model trained on the training set of the ENIGMA-51 Dataset.

architecture model config
egoism-hoi link configs/cfg_ehoi.yaml

Short-Term Object Interaction Anticipation

StillFast model

We provide the best model trained on the training set of the ENIGMA-51 Dataset.

architecture model config
StillFast link configs/STA_config.yaml

Please refer to the official page of StillFast for additional details.

NLU of Intents and Entities

The instructions below will guide you through replicating the baseline for the NLU of Intents and Entities task.

Set up a new Conda environment and a brand new RASA project by running the commands below:

conda create -n rasaenv python=3.7
conda activate rasaenv
pip3 install -U --user pip && pip install rasa==3.4
pip3 install -U spacy
python3 -m spacy download en_core_web_sm
rasa init

In your RASA project folder, overwrite config.yml and domain.yml with our provided configuration and domain files. Inside the data folder, put your training/test data and overwrite rules.yml with our provided rules file. Put our model inside the models folder. The resulting layout is sketched below.
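
For reference, a sketch of where the provided files end up, assuming the default project layout created by rasa init (file names other than config.yml, domain.yml, and rules.yml are placeholders):

your_rasa_project/
│   config.yml              <- our provided configuration file
│   domain.yml              <- our provided domain file
│   ...
└───data/
│    └───rules.yml          <- our provided rules file
│    └───your_training_data.yml
│    └───your_test_data.yml
└───models/
│    └───our provided model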

We provide the best model trained on the training set of the ENIGMA-51 utterance annotations.

architecture model config
DIETClassifier link configs/NLU_config.yml

To train the model, enter the following command:

rasa train nlu --nlu data/your_training_data.yml -c config/nlu_config.yml

To test the model, run the command below:

rasa test nlu --nlu data/your_test_data.yml -c config/nlu_config.yml

Please refer to the official RASA CLI documentation for additional details.

Visualization Script for Object and Hand Keypoints with Mask Annotations

This script is designed to visualize object and hand keypoints with mask annotations using preprocessed data files. It uses libraries such as OpenCV, NumPy, Matplotlib, and PyTorch to load and display the data. The script assumes that you have already downloaded the required JSON and npy files, as it loads them to visualize the annotations.

Example outputs: hand keypoints; object and hand masks.

Prerequisites🛠️

Before running the script, ensure you have the following:

  • Conda (Anaconda or Miniconda) installed.
  • The required JSON and npy annotation files downloaded from the project web page.

Setup🔧

  1. Create a new Conda environment and install the required packages:
conda create -n your_env_name python=3.x
conda activate your_env_name
conda install -c pytorch -c conda-forge matplotlib numpy pytorch torchvision opencv

Replace your_env_name with the desired environment name, and replace 3.x with the desired Python version (e.g., 3.7, 3.8, or 3.9).💻

Usage 🚀

Activate the Conda environment:

conda activate your_env_name

Make sure to replace your_env_name with the environment name you created, then run the script. It will load the JSON and npy files and visualize the annotations on a sample image.📑🔍
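
For example, assuming the visualization script is named visualize_annotations.py (a placeholder; use the actual file name from the repository):

python visualize_annotations.py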

Outputs 🖼️

The script will display a plot containing the following:

  • Blue circles representing hand keypoints on a resized sample image from the dataset. 👉🔵
  • Colored polygons representing object masks with class-specific colors. 🎨🔴🟢🟡🟣

Note 📝

  • The script selects a random key from the dataset to display. If you want to visualize annotations for a specific key, set the "random_key" variable in the script to the desired key. 🎲
  • The class_colors dictionary can be modified to map class IDs to your preferred colors. 🎨🔤
  • Feel free to modify the script as per your requirements, such as customizing colors, filtering keypoints, or adjusting image sizes. Happy visualizing! 🎉🔍 A minimal sketch of the overall flow is given below.
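
The sketch below illustrates the visualization flow under assumed file names and annotation structure (hand_keypoints.json mapping image keys to (x, y) keypoints, object_masks.npy mapping the same keys to polygons with class IDs); these are not the released formats, so adapt the loading code to the actual files:

import json
import random

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file names and structure: adapt them to the released annotation files.
with open("hand_keypoints.json") as f:
    keypoints = json.load(f)  # assumed: {image_key: [[x, y], ...]}
masks = np.load("object_masks.npy", allow_pickle=True).item()  # assumed: {image_key: [{"class_id": c, "polygon": [[x, y], ...]}, ...]}

class_colors = {0: "red", 1: "green", 2: "yellow", 3: "purple"}  # class_id -> color (customize as needed)

random_key = random.choice(list(keypoints.keys()))  # set to a specific key to pin the image
image = cv2.cvtColor(cv2.imread(f"{random_key}.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image path

plt.imshow(image)
for x, y in keypoints[random_key]:  # blue circles for hand keypoints
    plt.scatter(x, y, c="blue", s=20)
for obj in masks.get(random_key, []):  # colored polygons for object masks
    polygon = np.array(obj["polygon"])
    plt.fill(polygon[:, 0], polygon[:, 1], color=class_colors.get(obj["class_id"], "gray"), alpha=0.4)
plt.axis("off")
plt.show()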
