Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection/Object Detection

by Xubin Zhong, Changxing Ding, Zijian Li and Shaoli Huang.

This repository contains the official implementation of the paper "Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection", which was accepted to ECCV 2022.

To the best of our knowledge, HQM is the first approach that promotes the robustness of DETR-based models from the perspective of hard example mining. Moreover, HQM is plug-and-play and can be readily applied to many DETR-based HOI detection methods.

New performance on CDN !!!

An efficient implementation of GBS on CDN is available at /code_path/CDN/exp/. With GBS added, CDN-S achieves 32.29 mAP within 60 epochs.



Our implementation uses external libraries such as NumPy and PyTorch, and runs on 8 2080Ti GPUs. You can resolve the dependencies with the following commands.

pip install numpy
pip install -r requirements.txt
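After installation, a quick import check can confirm that the two libraries named above are visible to the Python interpreter. This is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and `numpy` and `torch` are the usual import names for NumPy and PyTorch.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Import names for the libraries mentioned in this README.
    missing = missing_packages(["numpy", "torch"])
    print("missing:", missing if missing else "none")
```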



The HICO-DET dataset can be downloaded here. After downloading, unpack the tarball (hico_20160224_det.tar.gz) to the data directory.

Instead of using the original annotation files, we use the annotation files provided by the PPDM authors. The annotation files can be downloaded from here. The downloaded annotation files have to be placed as follows.

 ├─ data
 │   └─ hico_20160224_det
 │       ├─ annotations
 │       │   ├─ trainval_hico.json
 │       │   ├─ test_hico.json
 │       │   └─ corre_hico.npy
 :       :

 ├─ params
 │   └─ detr-r50-pre.pth
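Before launching a run, it can help to verify that the files listed in the layout above are in place. The following is a minimal sketch under the assumption that it is run from the repository root; the function name `find_missing` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the directory layout shown above.
EXPECTED_FILES = [
    "data/hico_20160224_det/annotations/trainval_hico.json",
    "data/hico_20160224_det/annotations/test_hico.json",
    "data/hico_20160224_det/annotations/corre_hico.npy",
    "params/detr-r50-pre.pth",
]


def find_missing(root="."):
    """Return the expected files that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_FILES if not (root / p).is_file()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are present.")
```

An empty result means the annotation files and the pre-trained DETR weights sit where the commands in this README expect them.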

Pre-trained parameters

The annotation files and pre-trained weights can be downloaded here.


python -m torch.distributed.launch \
    --nproc_per_node=8  \
    --use_env \
    --hoi \
    --dataset_file hico_gt \
    --model_name HQM \
    --hoi_path data/hico_20160224_det/ \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet50 \
    --set_cost_bbox 2.5 \
    --set_cost_giou 1 \
    --bbox_loss_coef 2.5 \
    --giou_loss_coef 1 \
    --find_unused_parameters


You can conduct the evaluation with trained parameters as follows. The trained parameters are available here.

python -m torch.distributed.launch \
    --nproc_per_node=8  \
    --use_env \
    --hoi \
    --dataset_file hico_gt \
    --model_name HQM \
    --hoi_path data/hico_20160224_det/ \
    --num_obj_classes 80 \
    --num_verb_classes 117 \
    --backbone resnet50 \
    --set_cost_bbox 2.5 \
    --set_cost_giou 1 \
    --bbox_loss_coef 2.5 \
    --giou_loss_coef 1 \
    --find_unused_parameters \
    --AJL \
    --eval \
    --resume params/checkpoint_best.pth

The output looks like the following:

"test_mAP": 0.313470564574163, "test_mAP rare": 0.26546478777620686, "test_mAP non-rare": 0.32780995244887723

test_mAP, test_mAP rare, and test_mAP non-rare are the results under the default full, rare, and non-rare settings, respectively.
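The values above are fractions, while the tables in this README report mAP in percent. The sketch below parses such a result line and converts it to percentages. It assumes the line is a fragment of a JSON object, as shown above; the helper names `parse_results` and `as_percent` are hypothetical and not part of the repository.

```python
import json


def parse_results(line):
    """Parse a '"key": value, ...' fragment into a dict."""
    text = line.strip().rstrip(",")
    # Assumption: the fragment becomes valid JSON once wrapped in braces.
    if not text.startswith("{"):
        text = "{" + text + "}"
    return json.loads(text)


def as_percent(results):
    """Convert fractional mAP values to percentages with two decimals."""
    return {name: round(100 * value, 2) for name, value in results.items()}


if __name__ == "__main__":
    line = (
        '"test_mAP": 0.313470564574163, '
        '"test_mAP rare": 0.26546478777620686, '
        '"test_mAP non-rare": 0.32780995244887723'
    )
    for name, value in as_percent(parse_results(line)).items():
        print(f"{name}: {value:.2f}")
```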


HOI Detection HICO-DET.

| Method | Full (D) | Rare (D) | Non-rare (D) | Full (KO) | Rare (KO) | Non-rare (KO) |
|---|---|---|---|---|---|---|
| HOTR + HQM (ResNet50) | 25.69 | 24.70 | 25.98 | 28.24 | 27.35 | 28.51 |
| QPIC + HQM (ResNet50) | 31.34 | 26.54 | 32.78 | 34.09 | 29.63 | 35.42 |
| CDN-S + HQM (ResNet50) | 32.47 | 28.15 | 33.76 | 35.17 | 30.73 | 36.50 |

D: Default, KO: Known object

HOI Detection V-COCO.

| Method | Scenario 1 |
|---|---|
| ours (ResNet50) | 63.6 |

Object Detection COCO.

| Method | AP | AP_0.5 | AP_0.75 | AP_S | AP_M | AP_L |
|---|---|---|---|---|---|---|
| SMCA | 35.08 | 56.47 | 35.91 | 15.14 | 38.01 | 54.51 |
| SMCA + HQM | 36.48 | 57.02 | 38.19 | 16.48 | 40.62 | 54.91 |


Please consider citing our papers if they help your research.

author = {Zhong, Xubin and Ding, Changxing and Li, Zijian and Huang, Shaoli},
title = {Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection},
year = {2022},

author = {Qu, Xian and Ding, Changxing and Li, Xingao and Zhong, Xubin and Tao, Dacheng},
title = {Distillation Using Oracle Queries for Transformer-Based Human-Object Interaction Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {19558-19567}

author = {Zhang, Gongjie and Luo, Zhipeng and Yu, Yingchen and Cui, Kaiwen and Lu, Shijian},
title = {Accelerating DETR convergence via semantic-aligned matching},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
