Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose
Facial landmark detection under large pose and occlusion remains one of the challenging problems in facial analysis. Recently, many works have predicted pose or occlusion together with landmarks in a multi-task learning (MTL) paradigm, attempting to exploit their dependencies and thus alleviate this issue. However, such implicit dependencies are weakly interpretable and inconsistent with the way humans exploit inter-task coupling relations, i.e., by accommodating the explicit effects they induce, and this is one of the essential factors limiting their performance. To this end, in this paper, we propose a Cascaded Iterative Transformer (CIT) to jointly predict facial landmarks, occlusion probability, and head pose. Besides implicitly mining task dependencies in a shared encoder, the proposed CIT employs a cost-effective and easily portable strategy that passes the decoders' predictions back as prior knowledge, exploiting the coupling-induced effects in a human-like manner. Moreover, to the best of our knowledge, no existing dataset contains all of these task annotations simultaneously, so we introduce a new dataset termed MERL-RAV-FLOP based on the MERL-RAV dataset. We conduct extensive experiments on several challenging datasets (300W-LP, AFLW2000-3D, BIWI, COFW, and MERL-RAV-FLOP) and achieve remarkable results.
If you find this work useful, please consider citing:
@article{li2023cascaded,
  title={Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose},
  author={Li, Yaokun and Tan, Guang and Gou, Chao},
  journal={International Journal of Computer Vision},
  pages={1--16},
  year={2023},
  publisher={Springer}
}
In addition, we highly recommend citing MERL-RAV, the AFLW dataset, and face_alignment, on which our work heavily relies.
We have already prepared our dataset. If you need it, please send an email to liyk58@mail2.sysu.edu.cn to obtain the download link.
We build the MERL-RAV-FLOP dataset on top of the MERL-RAV dataset. For efficiency, we adopt a simple semi-automated annotation process, i.e., automatic annotation followed by manual correction, as follows:
First, we follow the MERL-RAV instructions to download the AFLW dataset and prepare the original MERL-RAV dataset. The directory structure of the prepared MERL-RAV dataset is shown below:
|--MERL_RAV_Dataset
|  |--merl_rav_organized
|  |  |--frontal
|  |  |  |--testset
|  |  |  |  |--image00019.jpg
|  |  |  |  |--image00019.pts
|  |  |  |  |--...
|  |  |  |--trainset
|  |  |--left
|  |  |  |--testset
|  |  |  |--trainset
|  |  |--lefthalf
|  |  |  |--testset
|  |  |  |--trainset
|  |  |--right
|  |  |  |--testset
|  |  |  |--trainset
|  |  |--righthalf
|  |  |  |--testset
|  |  |  |--trainset
|  |--merl_rav_labels
|  |--aflw
|  |--common_functions.py
|  |--organize_merl_rav_using_aflw_and_our_labels.py
Next, we use face_alignment and the AFLW dataset to acquire pose and landmark annotations automatically. The idea is to obtain coarse-grained annotations automatically and then refine them manually, which saves time in such a semi-automated way.
Specifically, we match the file_id of each MERL-RAV sample with the faces in the AFLW dataset to select the correct faces together with their pose, and then use face_alignment to predict the same 68 keypoints as MERL-RAV for the selected samples, which coarsely fills in the missing landmarks of the MERL-RAV dataset. This process is detailed in make_files.py (see the downloaded file) and is carried out by executing the following command. Note that before executing this command, you need to put the three pre-processed files Faces, FacePose, and FaceRect for the AFLW dataset into the aflw folder (again, these files are included in the downloaded file).
python make_files.py
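For reference, the automatic landmark step that make_files.py performs with face_alignment is conceptually similar to the minimal sketch below. The image path is illustrative and the exact integration differs; see make_files.py in the downloaded archive for the actual procedure.

import face_alignment
from skimage import io

# Minimal sketch of the public face_alignment API (illustrative, not the exact code in make_files.py).
# In face_alignment >= 1.4 the enum is LandmarksType.TWO_D; older releases use LandmarksType._2D.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D,
                                  flip_input=False, device="cpu")
image = io.imread("merl_rav_organized/frontal/testset/image00019.jpg")
preds = fa.get_landmarks(image)  # list with one (68, 2) array per detected face
print(preds[0].shape)            # (68, 2): the same 68-point scheme as MERL-RAV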
After this step, the merl_rav_organized folder will have the following form. train_files.txt and test_files.txt list the training and testing files of the MERL-RAV-FLOP dataset, and the extra xxx.npy file for each sample in each folder is the corresponding annotation file (the xxx.pts files are no longer used).
|--merl_rav_organized
|  |--frontal
|  |  |--testset
|  |  |  |--image00019.jpg
|  |  |  |--image00019.npy
|  |  |  |--image00019.pts
|  |  |  |--...
|  |  |--trainset
|  |--left
|  |--lefthalf
|  |--right
|  |--righthalf
|  |--train_files.txt
|  |--test_files.txt
Finally, after the automated pre-processing above, we cleaned the data manually by checking and correcting the landmark and visibility annotations that the automatic step got wrong. After these manual modifications, the final train_files.txt, test_files.txt, and .npy files are the ones provided through the download link above. The directories are shown below.
|--MERL_RAV_FLOP
|  |--frontal
|  |  |--testset
|  |  |  |--image00019.jpg
|  |  |  |--image00019.npy
|  |  |  |--...
|  |  |--trainset
|  |--left
|  |--lefthalf
|  |--right
|  |--righthalf
|  |--train_files.txt
|  |--test_files.txt
|--make_files.py
|--FacePose
|--FaceRect
|--Faces
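As a quick sanity check after downloading, the image/annotation pairs in each folder can be counted. This is a hypothetical snippet, not part of the released scripts, and it assumes every view folder mirrors the frontal layout with testset/trainset subfolders as shown above.

import os

root = "MERL_RAV_FLOP"
for view in ["frontal", "left", "lefthalf", "right", "righthalf"]:
    for split in ["trainset", "testset"]:
        folder = os.path.join(root, view, split)
        images = [f for f in os.listdir(folder) if f.endswith(".jpg")]
        labels = [f for f in os.listdir(folder) if f.endswith(".npy")]
        # Every image should come with exactly one .npy annotation file.
        print(f"{view}/{split}: {len(images)} images, {len(labels)} annotations")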
Each .npy file is an array of size (209,). Taking image00019.npy as an illustration, the meaning of each dimension of this array is shown below:
import numpy as np

annotation = np.load("image00019.npy")
x_coordinates = annotation[:68]                   # x coordinates of the 68 landmarks
y_coordinates = annotation[68:136]                # y coordinates of the 68 landmarks
visibility = annotation[-73:-5]                   # 68 visibility (occlusion) labels, one per landmark
pose = annotation[-5:-2]                          # 3 head-pose angles
bbox_w, bbox_h = annotation[-2], annotation[-1]   # face bounding-box width and height
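For convenience, the flat array can also be grouped into per-landmark records, e.g. as in the hypothetical helper below (not part of the released code; the ordering of the three pose angles is defined by the released scripts and the paper).

import numpy as np

# Hypothetical convenience loader: bundle each landmark's (x, y, visibility) into one row.
annotation = np.load("image00019.npy")
landmarks = np.stack(
    [annotation[:68], annotation[68:136], annotation[-73:-5]], axis=1
)  # shape (68, 3): x, y, visibility for each of the 68 landmarks
pose = annotation[-5:-2]                          # 3 head-pose angles
bbox_w, bbox_h = annotation[-2], annotation[-1]   # face bounding-box width and height
print(landmarks.shape, pose, (bbox_w, bbox_h))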
Feel free to contact liyk58@mail2.sysu.edu.cn if you have any doubts or questions.