Implement AutoAugment for Detection #6224
Comments
@datumbox @vfdev-5 Based on our last conversation, I suppose it would be best to start with the […]
@datumbox @lezwon If we see AA detection code similar to the classification one, i.e. a single transformation class calling functional ops inside, then […]. By the way, what does Cutout_Only_BBoxes do? Erase data from the target bbox?
@vfdev-5 Yep. Cutout does refer to erasing patches of data from the target box.
@vfdev-5 torchvision doesn’t implement this because there is already a nearly equivalent RandomErasing. The only differences are in the boundary cases, in the randomness of the cutout area in RandomErasing, and in the class arguments, which are quite different as well.
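For illustration, here is a minimal sketch (not from this thread) of how a Cutout_Only_BBoxes-style op could be approximated with the existing RandomErasing transform; the helper name cutout_only_bboxes is hypothetical:

import torch
from torchvision.transforms import RandomErasing

def cutout_only_bboxes(img: torch.Tensor, bboxes: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    # bboxes: (N, 4) tensor in XYXY format; erase a random patch inside each box.
    eraser = RandomErasing(p=p, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0)
    out = img.clone()
    for x1, y1, x2, y2 in bboxes.to(torch.int64):
        region = out[..., y1:y2, x1:x2]
        # RandomErasing picks the patch location and size at random within the crop,
        # which is the "randomness of cutout area" difference mentioned above.
        out[..., y1:y2, x1:x2] = eraser(region)
    return out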
@vfdev-5 I'm sorry for the delay. I have been occupied with other tasks and haven't been able to give any time to the implementation. I don't think I can pick this issue up anytime soon either, so it would be best to assign someone else to it.
Any help needed? I'm interested in implementing this.
No problems from my side.
@vfdev-5 @datumbox Could anyone give me some guidance to get started? Should I refer to the TensorFlow implementation? I see that @lezwon previously mentioned the plan is to implement them in […]. Shall I follow this idea?
@ain-soph Please check the AA classes for the classification task implemented with the prototype API:
@vfdev-5 See vision/references/detection/transforms.py, lines 30 to 45 at becaba0. I see TensorFlow does this in https://github.com/tensorflow/tpu/blob/c75705856290a4119d609110956442449d73e0a5/models/official/detection/utils/autoaugment_utils.py#L1030-L1062
@ain-soph That's exactly what we need to do. We should already have all the ops, excluding perhaps vision/torchvision/prototype/transforms/functional/_geometry.py, lines 259 to 268 at becaba0.
I've got a helper function to deal with this, so that all the *_only_bboxes ops can share it. Or is there any way to avoid the for loop? Will […]

from typing import Callable

import torch
from torchvision.prototype import features
from torchvision.prototype.transforms.functional import (
    convert_bounding_box_format,
    horizontal_flip_image_tensor,
)

def _transform_only_bboxes(
    img: torch.Tensor,
    bounding_box: torch.Tensor,
    format: features.BoundingBoxFormat,
    transform: Callable[..., torch.Tensor],
    **kwargs,
) -> torch.Tensor:
    # Convert to XYXY so the boxes can be used as pixel slice coordinates.
    bounding_box = convert_bounding_box_format(
        bounding_box, old_format=format, new_format=features.BoundingBoxFormat.XYXY
    ).view(-1, 4).to(torch.int64)
    new_img = img.clone()
    for bbox in bounding_box:
        # x spans the last dim and y the second-to-last, so slice as [..., y1:y2, x1:x2].
        bbox_crop_img = new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]]
        # copy_ writes the transformed patch back through the view into new_img;
        # fill_ would only accept a scalar.
        bbox_crop_img.copy_(transform(bbox_crop_img, **kwargs))
    return new_img

def horizontal_flip_only_bboxes(
    img: torch.Tensor,
    bounding_box: torch.Tensor,
    format: features.BoundingBoxFormat,
) -> torch.Tensor:
    return _transform_only_bboxes(img, bounding_box, format, transform=horizontal_flip_image_tensor)
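For reference, a quick usage sketch of the helper above (assuming a plain CHW tensor and XYXY boxes; the values are arbitrary):

img = torch.rand(3, 64, 76)
boxes = torch.tensor([[10, 15, 25, 35], [50, 5, 70, 22]])
flipped = horizontal_flip_only_bboxes(img, boxes, format=features.BoundingBoxFormat.XYXY)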
@ain-soph So far the kernels process each input independently, i.e. they don't receive the image and the bounding boxes together. @vfdev-5 What are your thoughts on the above? Any alternative ideas on how these kernels should be structured?
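To make the structural question concrete, here is a sketch of the two signature styles being contrasted (illustrative stubs with made-up names, not actual torchvision kernels):

import torch

def flip_image_kernel(image: torch.Tensor) -> torch.Tensor:
    ...  # current style: each kernel receives a single input

def flip_only_bboxes_kernel(image: torch.Tensor, bounding_box: torch.Tensor) -> torch.Tensor:
    ...  # would need the image and the boxes together in one call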
As it is about AutoAugment, we may not need to put such an op into the low-level ops and can just code a transform:

import torch
from torchvision.prototype.transforms import Transform
from torchvision.prototype.transforms.functional import horizontal_flip
from torchvision.prototype.transforms._utils import query_bounding_box
from torchvision.prototype.features import Image, BoundingBox

class AADet(Transform):
    def _get_params(self, sample):
        bbox = None
        if torch.rand(()) > 0.2:
            bbox = query_bounding_box(sample)
            bbox = bbox.to_format(format="XYXY")
        return dict(bbox=bbox, op="hflip")

    def _transform_image_in_bboxes(self, fn, fn_kwargs, image, bboxes):
        new_img = image.clone()
        for bbox in bboxes.to(torch.int64):
            bbox_img = new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]]
            out_bbox_img = fn(bbox_img, **fn_kwargs)
            new_img[..., bbox[1]:bbox[3], bbox[0]:bbox[2]] = out_bbox_img
        return new_img

    def _transform(self, inpt, params):
        if isinstance(inpt, Image):
            if params["op"] == "hflip" and params["bbox"] is not None:
                return self._transform_image_in_bboxes(horizontal_flip, {}, inpt, params["bbox"])
        return inpt

Usage:

image_size = (64, 76)
bboxes = [
    [10, 15, 25, 35],
    [50, 5, 70, 22],
    [45, 46, 56, 62],
    [4, 50, 10, 60],
]
labels = [1, 2, 3, 4]

img = torch.zeros(1, 3, *image_size)
for in_box, label in zip(bboxes, labels):
    # Fill each box region with a simple repeating pattern so the flip is visible.
    h, w = in_box[3] - in_box[1], in_box[2] - in_box[0]
    img[..., in_box[1]:in_box[3], in_box[0]:in_box[2]] = \
        (torch.arange(23, 23 + 3 * h * w) % 200).reshape(1, 3, h, w)

img = Image(img)
bboxes = BoundingBox(bboxes, format="XYXY", image_size=image_size)
sample = [img, bboxes]

t = AADet()
out = t(sample)

import matplotlib.pyplot as plt

plt.figure()
plt.subplot(121)
plt.imshow(img[0, ...].permute(1, 2, 0) / 255.0)
plt.subplot(122)
plt.imshow(out[0][0, ...].permute(1, 2, 0) / 255.0)
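As a possible next step (a sketch of ours, not code from the thread), _get_params could sample an (op, probability, magnitude) triple from a detection policy in the spirit of the paper; the policy entries below are placeholders, not the learned values:

import torch

_POLICY = [
    # (op_name, probability, magnitude) -- placeholder values
    ("hflip", 0.6, None),
    ("cutout", 0.8, 8),
]

def sample_op(policy=_POLICY):
    op_name, p, magnitude = policy[int(torch.randint(len(policy), ()))]
    apply = bool(torch.rand(()) < p)
    return dict(op=op_name if apply else None, magnitude=magnitude)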
@ain-soph I was hoping you could confirm that Victor's reply unblocked you and that you are able to continue with the feature. Please let me know if there are more outstanding questions. Thank you!
Yes, I'll open a draft PR this weekend using Victor's format. Just one question about the hyper-parameters: for example, for rotate, where shall we set the […]?
I think it can be the same as AA for classification: https://github.com/pytorch/vision/blob/main/torchvision/prototype/transforms/_auto_augment.py
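For concreteness, the classification file linked above defines per-op magnitude ranges along these lines (a sketch reusing the classification values; detection-specific ranges would need to be checked against the paper and the TF code):

import torch

num_bins = 10
augmentation_space = {
    # (magnitudes, signed): signed=True means the sign is flipped with probability 0.5
    "Rotate": (torch.linspace(0.0, 30.0, num_bins), True),
    "ShearX": (torch.linspace(0.0, 0.3, num_bins), True),
}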
🚀 The feature
Implement “Learning Data Augmentation Strategies for Object Detection”
Refers to: #3817
Motivation, pitch
It would be good to have this augmentation in TorchVision.
Alternatives
No response
Additional context
No response
cc @vfdev-5 @datumbox