-
Notifications
You must be signed in to change notification settings - Fork 462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add target to resize transform for aspect ratio training (detection task) #823
Changes from 5 commits
f863474
935386f
53130d9
a439f7c
e236956
8ee4d43
2ce8609
f9e8f5c
d4b9350
c07fff1
794fe0d
a029e47
de09f20
35c8b39
4c7dc53
a40ca1c
b8dfb34
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,8 +4,9 @@ | |
# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details. | ||
|
||
import math | ||
from typing import Any, Dict, Tuple, Union | ||
from typing import Any, Dict, Optional, Tuple, Union | ||
|
||
import numpy as np | ||
import torch | ||
from PIL.Image import Image | ||
from torch.nn.functional import pad | ||
|
@@ -27,7 +28,12 @@ def __init__( | |
self.preserve_aspect_ratio = preserve_aspect_ratio | ||
self.symmetric_pad = symmetric_pad | ||
|
||
def forward(self, img: torch.Tensor) -> torch.Tensor: | ||
def forward( | ||
self, | ||
img: torch.Tensor, | ||
target: Optional[np.ndarray] = None, | ||
) -> Union[torch.Tensor, Tuple[torch.Tensor, np.ndarray]]: | ||
|
||
target_ratio = self.size[0] / self.size[1] | ||
actual_ratio = img.shape[-2] / img.shape[-1] | ||
if not self.preserve_aspect_ratio or (target_ratio == actual_ratio): | ||
|
@@ -41,11 +47,27 @@ def forward(self, img: torch.Tensor) -> torch.Tensor: | |
|
||
# Scale image | ||
img = F.resize(img, tmp_size, self.interpolation) | ||
raw_shape = img.shape[-2:] | ||
# Pad (inverted in pytorch) | ||
_pad = (0, self.size[1] - img.shape[-1], 0, self.size[0] - img.shape[-2]) | ||
if self.symmetric_pad: | ||
half_pad = (math.ceil(_pad[1] / 2), math.ceil(_pad[3] / 2)) | ||
_pad = (half_pad[0], _pad[1] - half_pad[0], half_pad[1], _pad[3] - half_pad[1]) | ||
|
||
# In case boxes are provided, resize boxes if needed (for detection task if preserve aspect ratio) | ||
if target is not None: | ||
if self.preserve_aspect_ratio: | ||
# Get absolute coords | ||
if target.shape[1:] == (4,): | ||
target[:, [0, 2]] *= raw_shape[-1] / self.size[1] | ||
target[:, [1, 3]] *= raw_shape[-2] / self.size[0] | ||
elif target.shape[1:] == (4, 2): | ||
target[..., 0] *= raw_shape[-1] / self.size[1] | ||
target[..., 1] *= raw_shape[-2] / self.size[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the aspect ratio is preserved, there might be some shifting (padding) to be done on the target as well I'd argue I might be missing something but it looks like it's missing the padding step There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I checked with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. but if the preserve aspect ratio does effectively perform: scaling then padding Either way, it looks like the show samples isn't yielding desired outputs (perhaps because of another transfo): for the left and right sample at least, I can easily see how to preserve the aspect ratio but removing part of the padding. And also, why are the content no longer centered? 🤔 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I perfectly agree with you about getting the shape & target before padding. But in the next lines, the image is padded, but the target remains unchanged and I'm concerned about that part (again I might be missing something) 🙃 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the second point, I think I should indeed add the boxes to the transformation There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh yeah, I see, now it makes sense! Either way, I would argue the first and last samples have an issue: they are padded on 2 sides (we could have expanded the content more while preserving the aspect ratio). Is that because of the double resize? If so, we should find a better way, and I may have a suggestion, say we want to resize to (W, H):
What do you think? Also: should we try to make it centered though? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tested all the options in both tf & pt, by default I put conservation of the aspect ratio & symmetric padding in the scripts |
||
else: | ||
raise AssertionError | ||
return pad(img, _pad), target | ||
|
||
return pad(img, _pad) | ||
|
||
def __repr__(self) -> str: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, I think we should have:
And I'm not sure what would be the best, but probably having
Resize
(image only), andSampleResize
(image + target), would be relevant. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can split the function with 1 for the targets and 1 for the image, but I am not sure it is relevant to have a
Resize
and aSampleResize
, because if you want to keep the aspect ratio of the image while resizing, you will almost always pad (except if you give only 1 target size), and this will modify the targets. TheResize
function won't be able to preserve the aspect ratio of the images, and this would require to differentiate the function we are using in the training scripts and in the preprocessor regarding the option selected by the user (preserve_aspect_ratio
or not). This can even be dangerous if someone tries to modify the aspect ratio inResize
without changing the targets I think, do yo agree ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for a differentiation of the target and image function, I think it will complexify the code because we use many attributes of the
Resize
class to resize both the image and the target (self.symmetric_pad
,self.preserve_aspect_ratio
,self.output_size
) which would be passed as arguments and we have also many shared computations for the image and the target that would need to be done twice (or added in the signature of the functions but I think it will make the code less understandable):offset
,raw_shape,
andimg.shape
is also used in the target computation, what do you think @fg-mindee ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About splitting the two transforms, the reason behind this is that the target, depending on the task, will not necessarily be modified. So either we can add a lot of cases (for each target type) in the Resize transform, or we could have a Resize that only transforms the image, and
SampleResize
that inherits from it and transforms the target.I agree this would be dangerous for something to add this without changing the target, but transforms will only be played with by people willing to train models. So I would argue it's safe to assume they either use our default training script or have a good understanding of what is going on under the hood 🤷♂️
However, I fully agree about the duplication of symmetric pad / computation duplication for some aspects. I don't have an ideal suggestion for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the
SampleResize
transform, you want it to inherit fromResize
, but since they won't have the same signature so I don't really see why we do that ? @fg-mindeeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always refactor this later on anyway! Feel free to implement it the way you prefer (the optional target passing you suggested is probably the best one)