The ENIGMA-51 Dataset

This is the official GitHub repository for the ENIGMA-51 Dataset.

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope). ENIGMA-51 has been annotated with a rich set of annotations which allows the study of a large variety of tasks, especially tasks related to human-object interactions.

You can download the ENIGMA-51 dataset and its annotations from the project web page.

Citing the ENIGMA-51 Dataset

If you find our work useful in your research, please cite it using the following BibTeX entry.

@inproceedings{ragusa2024enigma,
  title={ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios},
  author={Ragusa, Francesco and Leonardi, Rosario and Mazzamuto, Michele and Bonanno, Claudia and Scavo, Rosario and Furnari, Antonino and Farinella, Giovanni Maria},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={4549--4559},
  year={2024}
}

Table of Contents

  • Model Zoo and Baselines:
    • Untrimmed Temporal Detection of Human-Object Interactions
    • Egocentric Human-Object Interaction Detection
    • Short-Term Object Interaction Anticipation
    • NLU of Intents and Entities
  • Visualization Script for Object and Hand Keypoints with Mask Annotations

Introduction

The instructions below will guide you through replicating the baseline for the Untrimmed Temporal Detection of Human-Object Interactions task or training your own model. The baseline is based on ActionFormer; refer to the official repository for more details.

Download Features, Annotations, and other needed files

  • Download enigma_UAD.tar.gz from this link.
  • The file includes features, action annotations in JSON format, the custom dataset file (.py), and three config files, one for each task variant (ht_hr, fc_hd, ht_hr_fc_hd).

Details: The features are extracted with a two-stream network pretrained on ActivityNet. Each video chunk spans 6 frames, with no overlap between adjacent chunks; at a frame rate of 30 fps this gives 5 chunks per second. Appearance features are taken from the Flatten 673 layer of ResNet-200 applied to the central frame of each chunk. Motion features are taken from the global pool layer of BN-Inception applied to the optical flow fields computed from the 6 consecutive frames of each chunk. Motion and appearance features are then concatenated.
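
For illustration, a minimal sketch of how the per-chunk features are assembled; the feature dimensions used here (2048 for appearance, 1024 for motion) are assumptions based on the named layers, not values read from the released files:

import numpy as np

fps, chunk_size = 30, 6
num_frames = 1800                       # e.g., a 60-second clip
num_chunks = num_frames // chunk_size   # 300 chunks, i.e., 5 per second at 30 fps

# Placeholder streams standing in for the extracted features:
# appearance from ResNet-200 ("Flatten 673", assumed 2048-d) on the central frame,
# motion from BN-Inception (global pool, assumed 1024-d) on the 6-frame flow stack.
appearance = np.random.randn(num_chunks, 2048).astype(np.float32)
motion = np.random.randn(num_chunks, 1024).astype(np.float32)

# The two streams are concatenated chunk-wise into the feature matrix fed to ActionFormer.
features = np.concatenate([appearance, motion], axis=1)
print(features.shape)  # (300, 3072) under the assumed dimensions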

Needed steps

  • Features and annotations should be placed under ./data/enigma
  • Config files should be placed under ./configs
  • The custom dataset file should be placed under ./libs/datasets
  • In the libs/datasets/__init__.py file, add an import of the enigma module (the dataset name registered via @register_dataset() in the custom dataset file); see the sketch after this list.
  • In the eval.py file, replace all the instances of "val_split" with "test_split".
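
A minimal sketch of the change to libs/datasets/__init__.py, assuming the file follows ActionFormer's pattern of importing each dataset module so that its @register_dataset() decorator runs at import time (the existing import names below may differ in your checkout):

# libs/datasets/__init__.py (sketch)
from .datasets import make_dataset, make_data_loader  # existing exports (names may differ)
from . import anet, epic_kitchens, thumos14           # existing dataset modules (names may differ)
from . import enigma                                   # new: registers the "enigma" dataset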

The folder structure should look like this:

This folder
│   README.md
│   ...
│
└───configs/
│    └───enigma_ht_hr_fc_hd.yaml
│    └───enigma_ht_hr.yaml
│    └───enigma_fc_hd.yaml
│    └───...
│
└───data/
│    └───enigma/
│    │    └───annotations/
│    │    │    └───enigma_ht_hr_fc_hd.json
│    │    │    └───enigma_ht_hr.json
│    │    │    └───enigma_fc_hd.json
│    │    └───features/
│    └───...
│
└───libs/
│    └───datasets/
│    │    └───enigma.py
│    │    └───...
│    └───...
│   ...

Training and Evaluation

  • Choose the config file for training ActionFormer on ENIGMA-51.
  • Train the ActionFormer network. This will create an experiment folder under ./ckpt that stores training config, logs, and checkpoints.
python ./train.py ./configs/enigma_ht_hr.yaml --output reproduce
  • Save the predictions of the trained model by running this script.
python ./eval.py ./configs/enigma_ht_hr.yaml ./ckpt/enigma_ht_hr_reproduce --saveonly
  • To evaluate the trained model, run the mp_mAP.py file, specifying the path to the prediction file and the path to the testing ground truth file. For more details, please refer to the mp_mAP.py file. A sanity-check sketch for the saved prediction file follows this list.
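
A minimal sketch for inspecting the saved predictions, assuming --saveonly dumps them as a pickled dictionary inside the experiment folder; the file name (eval_results.pkl) and the keys below are assumptions based on ActionFormer's output format and may differ in your version:

import pickle

# Hypothetical path: adjust it to the file your run actually produces.
with open("./ckpt/enigma_ht_hr_reproduce/eval_results.pkl", "rb") as f:
    results = pickle.load(f)

# Assumed layout: a dict of parallel sequences with one entry per detected segment.
for key in ("video-id", "t-start", "t-end", "label", "score"):
    print(key, len(results[key]) if key in results else "missing")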

Evaluating Our Pre-trained Models

We also provide pre-trained models for the three task variants (ht_hr, fc_hd, ht_hr_fc_hd). The models with the corresponding configs can be downloaded from this link. To evaluate a pre-trained model, please follow the steps listed below.

  • Move the config files to the configs folder or specify the right path in the script below.
  • Create a folder ./pretrained, then a folder for each task variant, and move the weight files under them.
  • The folder structure should look like this:

This folder
│   README.md
│   ...
│
└───pretrained/
│    └───enigma/
│    │    └───ht_hr/
│    │    │    └───...
│    │    └───fc_hd/
│    │    │    └───...
│    │    └───ht_hr_fc_hd/
│    │    │    └───...
│    └───...
│
└───libs/
│
│   ...
  • Save the predictions of the pre-trained model by running this script.
python ./eval.py ./configs/enigma_ht_hr.yaml ./pretrained/enigma/ht_hr --saveonly
  • To evaluate the model, run the mp_mAP.py file, specifying the path to the prediction file and the path to the testing ground truth file. For more details, please refer to the mp_mAP.py file.

Egocentric Human-Object Interaction Detection

The instructions below will guide you through replicating the baseline for the Egocentric Human-Object Interaction Detection task. The baseline is based on egoism-hoi; refer to the official repository for more details.

To train the model, enter the following command:

python train.py --config PATH_OF_CFG --train_json PATH_OF_EHOI_TRAIN_ANNS --test_json PATH_OF_EHOI_VAL_ANNS --test_dataset_names enigma_val

To test the model, run the command below:

python test.py --dataset_json PATH_TEST_ANNS --dataset_images PATH_TEST_IMGS --weights_path WEIGHTS_PATH
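
For example, with hypothetical paths for the annotation files, image folder, and weights (replace them with the locations where you extracted the ENIGMA-51 EHOI data; only configs/cfg_ehoi.yaml comes from the table below):

python train.py --config configs/cfg_ehoi.yaml --train_json data/ehoi/train.json --test_json data/ehoi/val.json --test_dataset_names enigma_val
python test.py --dataset_json data/ehoi/test.json --dataset_images data/ehoi/test_images --weights_path weights/egoism_hoi_enigma.pth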

We provide the best model trained on the training set of the ENIGMA-51 Dataset.

architecture model config
egoism-hoi link configs/cfg_ehoi.yaml

Short-Term Object Interaction Anticipation

StillFast model

We provide the best model trained on the training set of the ENIGMA-51 Dataset.

architecture model config
StillFast link configs/STA_config.yaml

Please refer to the official page of StillFast for additional details.

NLU of Intents and Entities

The instructions below will guide you through replicating the baseline for the NLU of Intents and Entities task.

Set up a new Conda environment and a brand new RASA project by running the commands below:

conda create -n rasaenv python=3.7
conda activate rasaenv
pip3 install -U --user pip && pip install rasa==3.4
pip3 install -U spacy
python3 -m spacy download en_core_web_sm
rasa init

In your RASA project folder, overwrite config.yml and domain.yml with our provided configuration and domain files. Inside the data folder, put your training/test data and overwrite rules.yml with our provided rules file. Put our model inside the models folder. The resulting layout is sketched below.
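
For reference, a sketch of where the provided files end up, assuming the default project layout created by rasa init (file names other than config.yml, domain.yml, and rules.yml are placeholders):

your_rasa_project/
│   config.yml              <- our provided configuration file
│   domain.yml              <- our provided domain file
│   ...
└───data/
│    └───rules.yml          <- our provided rules file
│    └───your_training_data.yml
│    └───your_test_data.yml
└───models/
│    └───our provided model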

We provide the best model trained on the training set of the ENIGMA-51 utterance annotations.

architecture model config
DIETClassifier link configs/NLU_config.yml

To train the model, enter the following command:

rasa train nlu --nlu data/your_training_data.yml -c config/nlu_config.yml

To test the model, run the command below:

rasa test nlu --nlu data/your_test_data.yml -c config/nlu_config.yml

Please refer to the official RASA CLI documentation for additional details.

Visualization Script for Object and Hand Keypoints with Mask Annotations

This script is designed to visualize object and hand keypoints with mask annotations using preprocessed data files. It uses libraries such as OpenCV, NumPy, Matplotlib, and PyTorch to load and display the data. The script assumes that you have already downloaded the required JSON and npy files, as it loads them to visualize the annotations.

Example outputs: hand keypoints; object and hand masks.

Prerequisites🛠️

Before running the script, ensure you have the following:

  • Conda (Anaconda or Miniconda) installed.
  • The required JSON and npy annotation files downloaded from the project web page.

Setup🔧

  1. Create a new Conda environment and install the required packages:
conda create -n your_env_name python=3.x
conda activate your_env_name
conda install -c pytorch -c conda-forge matplotlib numpy pytorch torchvision opencv

Replace your_env_name with the desired environment name, and replace 3.x with the desired Python version (e.g., 3.7, 3.8, or 3.9).💻

Usage 🚀

Activate the Conda environment:

conda activate your_env_name

Make sure to replace your_env_name with the environment name you created, then run the script. It will load the JSON and npy files and visualize the annotations on a sample image.📑🔍
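
For example, assuming the visualization script is named visualize_annotations.py (a placeholder; use the actual file name from the repository):

python visualize_annotations.py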

Outputs 🖼️

The script will display a plot containing the following:

  • Blue circles representing hand keypoints on a resized sample image from the dataset. 👉🔵
  • Colored polygons representing object masks with class-specific colors. 🎨🔴🟢🟡🟣

Note 📝

  • The script selects a random key from the dataset to display. If you want to visualize annotations for a specific key, set the "random_key" variable in the script to the desired key. 🎲
  • The class_colors dictionary can be modified to map class IDs to your preferred colors. 🎨🔤
  • Feel free to modify the script as per your requirements, such as customizing colors, filtering keypoints, or adjusting image sizes. Happy visualizing! 🎉🔍 A minimal sketch of the overall flow is given below.
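
The sketch below illustrates the visualization flow under assumed file names and annotation structure (hand_keypoints.json mapping image keys to (x, y) keypoints, object_masks.npy mapping the same keys to polygons with class IDs); these are not the released formats, so adapt the loading code to the actual files:

import json
import random

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file names and structure: adapt them to the released annotation files.
with open("hand_keypoints.json") as f:
    keypoints = json.load(f)  # assumed: {image_key: [[x, y], ...]}
masks = np.load("object_masks.npy", allow_pickle=True).item()  # assumed: {image_key: [{"class_id": c, "polygon": [[x, y], ...]}, ...]}

class_colors = {0: "red", 1: "green", 2: "yellow", 3: "purple"}  # class_id -> color (customize as needed)

random_key = random.choice(list(keypoints.keys()))  # set to a specific key to pin the image
image = cv2.cvtColor(cv2.imread(f"{random_key}.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical image path

plt.imshow(image)
for x, y in keypoints[random_key]:  # blue circles for hand keypoints
    plt.scatter(x, y, c="blue", s=20)
for obj in masks.get(random_key, []):  # colored polygons for object masks
    polygon = np.array(obj["polygon"])
    plt.fill(polygon[:, 0], polygon[:, 1], color=class_colors.get(obj["class_id"], "gray"), alpha=0.4)
plt.axis("off")
plt.show()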
