Add a new object detector, Lite-DINO (#2457)
* Initial implementation of Lite DETR

* Update model config for lite dino

* Add norm to intermediate layer of ffn

* Change FFN's norm order and add enc_scale attribute to encoder's layers

* Merge with incremental recipe

* Add model pretrained weight path

* Update model info and add intg tests

* Update docs

* Update CHANGELOG

* Change num iters
jaegukhyun authored Aug 31, 2023
1 parent fc6386c commit 8045480
Showing 13 changed files with 648 additions and 9 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ All notable changes to this project will be documented in this file.
- Add ONNX metadata to detection, instance segmentation, and segmentation models (<https://github.com/openvinotoolkit/training_extensions/pull/2418>)
- Add a new feature to configure input size (<https://github.com/openvinotoolkit/training_extensions/pull/2420>)
- Introduce the OTXSampler and AdaptiveRepeatDataHook to achieve faster training at the small data regime (<https://github.com/openvinotoolkit/training_extensions/pull/2428>)
- Add a new object detector Lite-DINO (<https://github.com/openvinotoolkit/training_extensions/pull/2457>)

### Enhancements

@@ -100,6 +100,8 @@ In addition to these models, we supports experimental models for object detectio
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_DINO <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnet50_dino/template_experimental.yaml>`_ | DINO | 235 | 182.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_Lite_DINO <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnet50_litedino/template_experimental.yaml>`_ | Lite-DINO | 140 | 190.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Custom_Object_Detection_Gen3_ResNeXt101_ATSS <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/resnext101_atss/template_experimental.yaml>`_ | ResNeXt101-ATSS | 434.75 | 344.0 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+
| `Object_Detection_YOLOX_S <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/detection/cspdarknet_yolox_s/template_experimental.yaml>`_ | YOLOX_S | 33.51 | 46.0 |
@@ -110,6 +112,7 @@ In addition to these models, we supports experimental models for object detectio
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+-----------------+

`Deformable_DETR <https://arxiv.org/abs/2010.04159>`_ is based on `DETR <https://arxiv.org/abs/2005.12872>`_ and addresses DETR's slow convergence. `DINO <https://arxiv.org/abs/2203.03605>`_ improves on Deformable-DETR-based methods by denoising anchor boxes. Many current state-of-the-art object detectors are built on DINO.
`Lite-DINO <https://arxiv.org/abs/2303.07335>`_ is an efficient variant of DINO. It reduces the FLOPs of the transformer encoder, which accounts for the largest share of the computational cost, by updating high-level (low-resolution) and low-level (high-resolution) features in an interleaved way.
Although transformer-based models perform notably well on various object detection benchmarks, CNN-based models still achieve good accuracy at practical latency.
Therefore, we added a new experimental CNN-based method, ResNeXt101-ATSS. ATSS still performs well among models based on `RetinaNet <https://arxiv.org/abs/1708.02002>`_. We integrated a large ResNeXt101 backbone into our custom ATSS head, and it shows good transfer learning performance.
In addition, we added YOLOX variants to cover a wider range of user scenarios.
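To see why the encoder dominates cost and why an interleaved update schedule helps, the sketch below counts encoder tokens per pyramid level. It assumes a hypothetical 800x1333 input and a four-level feature pyramid with strides 8, 16, 32 and 64 (common for Deformable-DETR-style models, but not read from this commit's config), and it approximates deformable-attention cost as proportional to the number of query tokens. The schedule parameters `high_updates=3` and `low_updates=1` are illustrative assumptions, not Lite-DINO's configured values.

```python
import math


def pyramid_tokens(height: int, width: int, strides=(8, 16, 32, 64)) -> list[int]:
    """Number of tokens (H*W positions) at each feature level, using ceil division."""
    return [math.ceil(height / s) * math.ceil(width / s) for s in strides]


def relative_encoder_queries(tokens: list[int], high_updates: int = 3, low_updates: int = 1) -> float:
    """Ratio of query updates: interleaved schedule vs. updating every token in every layer.

    Assumptions: tokens[0] is the low-level (highest-resolution) level, the rest are
    high-level, and cost scales with the number of queries being updated.
    """
    low, high = tokens[0], sum(tokens[1:])
    full = (high_updates + low_updates) * (low + high)  # every layer updates all tokens
    interleaved = high_updates * high + low_updates * low  # high-level updated more often
    return interleaved / full


tokens = pyramid_tokens(800, 1333)
print(tokens)                                      # [16700, 4200, 1050, 273]
print(round(tokens[0] / sum(tokens), 3))           # 0.751
print(round(relative_encoder_queries(tokens), 3))  # 0.374
```

Under these assumptions about three quarters of the tokens sit in the stride-8 level, so updating that level less often removes most of the query updates.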
@@ -154,6 +157,8 @@ We trained each model with a single Nvidia GeForce RTX3090.
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| ResNet50-DINO | 49.0 (66.4) | 47.2 | 99.5 | 62.9 | 93.5 | 99.1 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| ResNet50-Lite-DINO | 48.1 (64.4) | 47.0 | 99.0 | 62.5 | 93.6 | 99.4 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| YOLOX_S | 40.3 (59.1) | 37.1 | 93.6 | 54.8 | 92.7 | 98.8 |
+----------------------------+------------------+-----------+-----------+-----------+-----------+--------------+
| YOLOX_L | 49.4 (67.1) | 44.5 | 94.6 | 55.8 | 91.8 | 99.0 |
@@ -78,6 +78,7 @@ def _custom_grid_sample(im: torch.Tensor, grid: torch.Tensor, align_corners: boo
Returns:
torch.Tensor: A tensor with sampled points, shape (N, C, Hg, Wg)
"""
device = im.device
n, c, h, w = im.shape
gn, gh, gw, _ = grid.shape
assert n == gn
@@ -113,14 +114,14 @@ def _custom_grid_sample(im: torch.Tensor, grid: torch.Tensor, align_corners: boo
x0, x1, y0, y1 = x0 + 1, x1 + 1, y0 + 1, y1 + 1

# Clip coordinates to padded image size
x0 = torch.where(x0 < 0, torch.tensor(0), x0)
x0 = torch.where(x0 > padded_w - 1, torch.tensor(padded_w - 1), x0)
x1 = torch.where(x1 < 0, torch.tensor(0), x1)
x1 = torch.where(x1 > padded_w - 1, torch.tensor(padded_w - 1), x1)
y0 = torch.where(y0 < 0, torch.tensor(0), y0)
y0 = torch.where(y0 > padded_h - 1, torch.tensor(padded_h - 1), y0)
y1 = torch.where(y1 < 0, torch.tensor(0), y1)
y1 = torch.where(y1 > padded_h - 1, torch.tensor(padded_h - 1), y1)
x0 = torch.where(x0 < 0, torch.tensor(0).to(device), x0)
x0 = torch.where(x0 > padded_w - 1, torch.tensor(padded_w - 1).to(device), x0)
x1 = torch.where(x1 < 0, torch.tensor(0).to(device), x1)
x1 = torch.where(x1 > padded_w - 1, torch.tensor(padded_w - 1).to(device), x1)
y0 = torch.where(y0 < 0, torch.tensor(0).to(device), y0)
y0 = torch.where(y0 > padded_h - 1, torch.tensor(padded_h - 1).to(device), y0)
y1 = torch.where(y1 < 0, torch.tensor(0).to(device), y1)
y1 = torch.where(y1 > padded_h - 1, torch.tensor(padded_h - 1).to(device), y1)

im_padded = im_padded.view(n, c, -1)

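The clipping step in `_custom_grid_sample` keeps every corner index inside the one-pixel zero padding, so sample points that fall outside the image read zeros instead of indexing out of bounds; the `.to(device)` change keeps the fill-value tensors on the same device as the coordinates. The pure-Python sketch below illustrates only the padding-and-clipping idea for a single point on a 2-D list. It assumes already-unnormalized pixel coordinates, and `bilinear_sample_padded` is a hypothetical helper, not the OTX implementation (which operates on batched tensors).

```python
import math


def bilinear_sample_padded(img: list[list[float]], x: float, y: float) -> float:
    """Bilinearly sample img at pixel coords (x = column, y = row), zeros outside the image."""
    h, w = len(img), len(img[0])
    padded_h, padded_w = h + 2, w + 2
    # Zero-pad by one pixel on every side.
    padded = [[0.0] * padded_w for _ in range(padded_h)]
    for i in range(h):
        for j in range(w):
            padded[i + 1][j + 1] = float(img[i][j])

    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = x0 + 1, y0 + 1
    # Bilinear weights come from the unclipped corner coordinates.
    wa = (x1 - x) * (y1 - y)  # top-left     (x0, y0)
    wb = (x1 - x) * (y - y0)  # bottom-left  (x0, y1)
    wc = (x - x0) * (y1 - y)  # top-right    (x1, y0)
    wd = (x - x0) * (y - y0)  # bottom-right (x1, y1)

    def clip(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    # Shift into padded coordinates (+1), then clip into [0, padded_size - 1].
    x0p, x1p = clip(x0 + 1, padded_w - 1), clip(x1 + 1, padded_w - 1)
    y0p, y1p = clip(y0 + 1, padded_h - 1), clip(y1 + 1, padded_h - 1)

    return (wa * padded[y0p][x0p] + wb * padded[y1p][x0p]
            + wc * padded[y0p][x1p] + wd * padded[y1p][x1p])


img = [[1, 2], [3, 4]]
print(bilinear_sample_padded(img, 0.5, 0.5))    # 2.5
print(bilinear_sample_padded(img, -0.5, 0.0))   # 0.5 (half the weight lands on the zero border)
print(bilinear_sample_padded(img, 10.0, 10.0))  # 0.0 (clipped onto the zero border)
```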
@@ -6,6 +6,7 @@
from .custom_atss_detector import CustomATSS
from .custom_deformable_detr_detector import CustomDeformableDETR
from .custom_dino_detector import CustomDINO
from .custom_lite_dino import CustomLiteDINO
from .custom_maskrcnn_detector import CustomMaskRCNN
from .custom_maskrcnn_tile_optimized import CustomMaskRCNNTileOptimized
from .custom_single_stage_detector import CustomSingleStageDetector
@@ -19,6 +20,7 @@
__all__ = [
"CustomATSS",
"CustomDeformableDETR",
"CustomLiteDINO",
"CustomDINO",
"CustomMaskRCNN",
"CustomSingleStageDetector",
@@ -0,0 +1,21 @@
"""OTX Lite-DINO Class for object detection."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

from mmdet.models.builder import DETECTORS

from otx.algorithms.common.utils.logger import get_logger
from otx.algorithms.detection.adapters.mmdet.models.detectors import CustomDINO

logger = get_logger()


@DETECTORS.register_module()
class CustomLiteDINO(CustomDINO):
"""Custom Lite-DINO <https://arxiv.org/pdf/2303.07335.pdf> for object detection."""

def load_state_dict_pre_hook(self, model_classes, ckpt_classes, ckpt_dict, *args, **kwargs):
"""Modify official lite dino version's weights before weight loading."""
        super(CustomDINO, self).load_state_dict_pre_hook(model_classes, ckpt_classes, ckpt_dict, *args, **kwargs)
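`super(CustomDINO, self)` starts the method lookup after `CustomDINO` in the MRO, so `CustomLiteDINO` runs its grandparent's pre-hook and skips `CustomDINO`'s own override. The sketch below reproduces that lookup with hypothetical stand-in classes (`DeformableDETRLike`, `DINOLike`, `LiteDINOLike`) whose simplified signatures are assumptions, not OTX code. It also shows why extra keyword arguments must be forwarded with `**kwargs`: `*kwargs` unpacks only the dict's keys, as positional arguments.

```python
def describe(*args, **kwargs):
    """Return exactly what a callee receives."""
    return args, kwargs


class DeformableDETRLike:
    def load_state_dict_pre_hook(self, ckpt_dict, *args, **kwargs):
        ckpt_dict.setdefault("hooks", []).append("deformable_detr")


class DINOLike(DeformableDETRLike):
    def load_state_dict_pre_hook(self, ckpt_dict, *args, **kwargs):
        super().load_state_dict_pre_hook(ckpt_dict, *args, **kwargs)
        ckpt_dict["hooks"].append("dino")  # stand-in for DINO-specific weight remapping


class LiteDINOLike(DINOLike):
    def load_state_dict_pre_hook(self, ckpt_dict, *args, **kwargs):
        # Lookup starts after DINOLike in the MRO, so DeformableDETRLike's hook runs.
        super(DINOLike, self).load_state_dict_pre_hook(ckpt_dict, *args, **kwargs)


dino_ckpt, lite_ckpt = {}, {}
DINOLike().load_state_dict_pre_hook(dino_ckpt)
LiteDINOLike().load_state_dict_pre_hook(lite_ckpt)
print(dino_ckpt["hooks"])  # ['deformable_detr', 'dino']
print(lite_ckpt["hooks"])  # ['deformable_detr']

opts = {"strict": False}
print(describe(*opts))   # (('strict',), {})
print(describe(**opts))  # ((), {'strict': False})
```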
@@ -5,5 +5,13 @@

from .dino import CustomDINOTransformer
from .dino_layers import CdnQueryGenerator, DINOTransformerDecoder
from .lite_detr_layers import EfficientTransformerEncoder, EfficientTransformerLayer, SmallExpandFFN

__all__ = ["CustomDINOTransformer", "DINOTransformerDecoder", "CdnQueryGenerator"]
__all__ = [
"CustomDINOTransformer",
"DINOTransformerDecoder",
"CdnQueryGenerator",
"EfficientTransformerEncoder",
"EfficientTransformerLayer",
"SmallExpandFFN",
]
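The exported name `SmallExpandFFN` suggests a feed-forward block with a reduced hidden (expansion) width, but its definition is not part of this diff, so the arithmetic below is only a hedged illustration. It assumes a standard two-layer FFN (`embed_dims -> hidden_dims -> embed_dims`) and counts multiply-accumulates, ignoring biases, activation and normalization; the token count and hidden sizes are hypothetical.

```python
def ffn_macs(num_tokens: int, embed_dims: int, hidden_dims: int) -> int:
    """Multiply-accumulates for a two-layer FFN applied to num_tokens tokens.

    Layer 1 maps embed_dims -> hidden_dims and layer 2 maps hidden_dims -> embed_dims,
    so each token costs 2 * embed_dims * hidden_dims MACs (biases, activation and
    normalization are ignored).
    """
    return num_tokens * 2 * embed_dims * hidden_dims


# Hypothetical sizes: 16,700 high-resolution tokens with 256-d embeddings.
standard = ffn_macs(16_700, 256, 2048)
small = ffn_macs(16_700, 256, 512)
print(standard / small)  # 4.0
```

Because FFN cost is linear in the hidden width, shrinking the expansion on the most numerous tokens is a direct way to cut encoder FLOPs.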