Implement AutoAugment for Detection #6224
Comments
@datumbox @vfdev-5 Based on our last conversation, I suppose it would be best to start with the […]
@datumbox @lezwon If we see AA detection code similar to the classification one, i.e. a single transformation class calling functional ops inside, then […]. By the way, what does Cutout_Only_BBoxes do? Erase data from the target bbox?
@vfdev-5 Yep. Cutout does refer to erasing patches of data from the target box.
@vfdev-5 torchvision doesn’t implement this because there is already a nearly equivalent RandomErasing. The only differences are in the boundary cases, in the randomness of the cutout area in RandomErasing, and in the class arguments, which are quite different as well.
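For illustration, here is a minimal sketch (not from this thread) of how a Cutout_Only_BBoxes-style op could be approximated with the existing RandomErasing transform; the helper name cutout_only_bboxes is hypothetical:

import torch
from torchvision.transforms import RandomErasing

def cutout_only_bboxes(img: torch.Tensor, bboxes: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # bboxes: (N, 4) tensor in XYXY format; erase a random patch inside each box.
    eraser = RandomErasing(p=p, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0)
    out = img.clone()
    for x1, y1, x2, y2 in bboxes.to(torch.int64):
        region = out[..., y1:y2, x1:x2]
        # RandomErasing picks the patch location and size at random within the crop,
        # which is the "randomness of cutout area" difference mentioned above.
        out[..., y1:y2, x1:x2] = eraser(region)
    return out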
@vfdev-5 I'm sorry for the delay. I have been occupied with other tasks and haven't been able to give any time to the implementation. I don't think I can pick this issue up anytime soon either, so it would be best to assign someone else to it.
Any help needed? I'm interested in implementing this.
No problems from my side.
@vfdev-5 @datumbox Could anyone give me some guidance to get started? Should I refer to the TensorFlow implementation? I see that @lezwon previously mentioned the plan is to implement them in […]. Shall I follow this idea?
@ain-soph Please check the AA classes for the classification task implemented with the prototype API:
@vfdev-5 See vision/references/detection/transforms.py, lines 30 to 45 at becaba0. I see TensorFlow does this in https://github.com/tensorflow/tpu/blob/c75705856290a4119d609110956442449d73e0a5/models/official/detection/utils/autoaugment_utils.py#L1030-L1062
@ain-soph That's exactly what we need to do. We should already have all the ops, excluding perhaps vision/torchvision/prototype/transforms/functional/_geometry.py, lines 259 to 268 at becaba0.
I've got a helper function to deal with this, so that all the *_only_bboxes ops can share it. Or is there any way to avoid the for loop? Will […]

from typing import Callable

import torch
from torchvision.prototype import features
from torchvision.prototype.transforms.functional import (
    convert_bounding_box_format,
    horizontal_flip_image_tensor,
)

def _transform_only_bboxes(
    img: torch.Tensor,
    bounding_box: torch.Tensor,
    format: features.BoundingBoxFormat,
    transform: Callable[..., torch.Tensor],
    **kwargs,
) -> torch.Tensor:
    # Convert to XYXY so the boxes can be used as pixel slice coordinates.
    bounding_box = convert_bounding_box_format(
        bounding_box, old_format=format, new_format=features.BoundingBoxFormat.XYXY
    ).view(-1, 4).to(torch.int64)
    new_img = img.clone()
    for bbox in bounding_box:
        # x spans the last dim and y the second-to-last, so slice as [..., y1:y2, x1:x2].
        bbox_crop_img = new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]]
        # copy_ writes the transformed patch back through the view into new_img;
        # fill_ would only accept a scalar.
        bbox_crop_img.copy_(transform(bbox_crop_img, **kwargs))
    return new_img

def horizontal_flip_only_bboxes(
    img: torch.Tensor,
    bounding_box: torch.Tensor,
    format: features.BoundingBoxFormat,
) -> torch.Tensor:
    return _transform_only_bboxes(img, bounding_box, format, transform=horizontal_flip_image_tensor)
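For reference, a quick usage sketch of the helper above (assuming a plain CHW tensor and XYXY boxes; the values are arbitrary):

img = torch.rand(3, 64, 76)
boxes = torch.tensor([[10, 15, 25, 35], [50, 5, 70, 22]])
flipped = horizontal_flip_only_bboxes(img, boxes, format=features.BoundingBoxFormat.XYXY)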
@ain-soph So far the kernels process each input independently, i.e. they don't receive the image and the bounding boxes together. @vfdev-5 What are your thoughts on the above? Any alternative ideas on how these kernels should be structured?
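To make the structural question concrete, here is a sketch of the two signature styles being contrasted (illustrative stubs with made-up names, not actual torchvision kernels):

import torch

def flip_image_kernel(image: torch.Tensor) -> torch.Tensor:
    ...  # current style: each kernel receives a single input

def flip_only_bboxes_kernel(image: torch.Tensor, bounding_box: torch.Tensor) -> torch.Tensor:
    ...  # would need the image and the boxes together in one call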
As it is about AutoAugment, we may not need to put such an op into the low-level ops and can just code a transform:

import torch
from torchvision.prototype.transforms import Transform
from torchvision.prototype.transforms.functional import horizontal_flip
from torchvision.prototype.transforms._utils import query_bounding_box
from torchvision.prototype.features import Image, BoundingBox

class AADet(Transform):
    def _get_params(self, sample):
        bbox = None
        if torch.rand(()) > 0.2:
            bbox = query_bounding_box(sample)
            bbox = bbox.to_format(format="XYXY")
        return dict(bbox=bbox, op="hflip")

    def _transform_image_in_bboxes(self, fn, fn_kwargs, image, bboxes):
        new_img = image.clone()
        for bbox in bboxes.to(torch.int64):
            bbox_img = new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]]
            out_bbox_img = fn(bbox_img, **fn_kwargs)
            new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]] = out_bbox_img
        return new_img

    def _transform(self, inpt, params):
        if isinstance(inpt, Image):
            if params["op"] == "hflip" and params["bbox"] is not None:
                return self._transform_image_in_bboxes(horizontal_flip, {}, inpt, params["bbox"])
        return inpt

Usage:

image_size = (64, 76)
bboxes = [
    [10, 15, 25, 35],
    [50, 5, 70, 22],
    [45, 46, 56, 62],
    [4, 50, 10, 60],
]
labels = [1, 2, 3, 4]

img = torch.zeros(1, 3, *image_size)
for in_box, label in zip(bboxes, labels):
    # Fill each box region with a simple repeating pattern so the flip is visible.
    h, w = in_box[3] - in_box[1], in_box[2] - in_box[0]
    img[..., in_box[1]:in_box[3], in_box[0]:in_box[2]] = \
        (torch.arange(23, 23 + 3 * h * w) % 200).reshape(1, 3, h, w)

img = Image(img)
bboxes = BoundingBox(bboxes, format="XYXY", image_size=image_size)
sample = [img, bboxes]

t = AADet()
out = t(sample)

import matplotlib.pyplot as plt

plt.figure()
plt.subplot(121)
plt.imshow(img[0, ...].permute(1, 2, 0) / 255.0)
plt.subplot(122)
plt.imshow(out[0][0, ...].permute(1, 2, 0) / 255.0)
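As a possible next step (a sketch of ours, not code from the thread), _get_params could sample an (op, probability, magnitude) triple from a detection policy in the spirit of the paper; the policy entries below are placeholders, not the learned values:

import torch

_POLICY = [
    # (op_name, probability, magnitude) -- placeholder values
    ("hflip", 0.6, None),
    ("cutout", 0.8, 8),
]

def sample_op(policy=_POLICY):
    op_name, p, magnitude = policy[int(torch.randint(len(policy), ()))]
    apply = bool(torch.rand(()) < p)
    return dict(op=op_name if apply else None, magnitude=magnitude)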
@ain-soph I was hoping you could confirm that Victor's reply unblocked you and that you are able to continue with the feature. Please let me know if there are more outstanding questions. Thank you!
Yes, I'll open a draft PR this weekend using Victor's format. Just one question about the hyper-parameters: for example, for rotate, where shall we set the […]?
I think it can be the same as AA for classification: https://github.com/pytorch/vision/blob/main/torchvision/prototype/transforms/_auto_augment.py
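For concreteness, the classification file linked above defines per-op magnitude ranges along these lines (a sketch reusing the classification values; detection-specific ranges would need to be checked against the paper and the TF code):

import torch

num_bins = 10
augmentation_space = {
    # (magnitudes, signed): signed=True means the sign is flipped with probability 0.5
    "Rotate": (torch.linspace(0.0, 30.0, num_bins), True),
    "ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
}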
🚀 The feature
Implement “Learning Data Augmentation Strategies for Object Detection”
Refers to: #3817
Motivation, pitch
It would be good to have this augmentation in TorchVision.
Alternatives
No response
Additional context
No response
cc @vfdev-5 @datumbox