feat: add target to resize transform for aspect ratio training (detection task) #823
Conversation
Codecov Report
```diff
@@           Coverage Diff            @@
##            main     #823      +/-  ##
=========================================
- Coverage   95.99%   94.87%    -1.12%
=========================================
  Files         131      133        +2
  Lines        5042     5193      +151
=========================================
+ Hits         4840     4927       +87
- Misses        202      266       +64
```
Thanks for the PR!
Two concerns I have:
- I might be wrong, but it looks like the target isn't being padded after the scaling (when preserving the aspect ratio)? Have you checked what it looks like with `show-samples`? (mind adding a screenshot if so? 🙏)
- I added a comment about the transformation order
Let me know what you think!
doctr/transforms/modules/pytorch.py
```python
if self.preserve_aspect_ratio:
    # Get absolute coords
    if target.shape[1:] == (4,):  # straight boxes: (xmin, ymin, xmax, ymax)
        target[:, [0, 2]] *= raw_shape[-1] / self.size[1]
        target[:, [1, 3]] *= raw_shape[-2] / self.size[0]
    elif target.shape[1:] == (4, 2):  # polygons: 4 points of (x, y)
        target[..., 0] *= raw_shape[-1] / self.size[1]
        target[..., 1] *= raw_shape[-2] / self.size[0]
```
If the aspect ratio is preserved, there might be some shifting (padding) to be applied to the target as well, I'd argue.
I might be missing something, but it looks like the padding step is missing.
I checked with `show-samples`; it should be OK.
What is done here is just a change of reference frame: we need to multiply the coordinates of each point by the ratio old_shape / new_shape, the old shape being the shape before padding (for instance, if padding doubles the size of the picture, you want to divide all x and y coordinates by 2).
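For illustration, here is a minimal sketch of that change of reference frame for relative `(xmin, ymin, xmax, ymax)` boxes; the helper name and the shapes are made up for the example:

```python
import numpy as np

def rescale_boxes(boxes: np.ndarray, old_shape: tuple, new_shape: tuple) -> np.ndarray:
    """Map relative (xmin, ymin, xmax, ymax) boxes from the pre-padding
    reference frame to the padded one by scaling with old_shape / new_shape."""
    out = boxes.copy()
    out[:, [0, 2]] *= old_shape[1] / new_shape[1]  # x coords follow the width ratio
    out[:, [1, 3]] *= old_shape[0] / new_shape[0]  # y coords follow the height ratio
    return out

# Padding a 512x512 image to 512x1024 halves all relative x coordinates:
print(rescale_boxes(np.array([[0.2, 0.3, 0.8, 0.9]]), (512, 512), (512, 1024)))
# -> [[0.1 0.3 0.4 0.9]]
```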
But if preserving the aspect ratio effectively performs scaling then padding, doesn't the target need to be transformed with an affine function, not just a scaling one? Or am I missing something that is already being performed?
Either way, it looks like `show-samples` isn't yielding the desired outputs (perhaps because of another transfo): for the left and right samples at least, I can easily see how the aspect ratio could be preserved while removing part of the padding. And also, why is the content no longer centered? 🤔
I perfectly agree with you about getting the shape & target before padding. But in the next lines the image is padded while the target remains unchanged, and I'm concerned about that part (again, I might be missing something) 🙃
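To make the concern concrete: with symmetric padding, the target update is an affine map (a scale plus an offset), not a pure scaling. A hedged sketch, with hypothetical names, assuming relative coordinates in [0, 1]:

```python
import numpy as np

def pad_and_rescale_boxes(
    boxes: np.ndarray,     # relative (xmin, ymin, xmax, ymax) boxes in [0, 1]
    scaled_shape: tuple,   # (h, w) right after the aspect-preserving scaling
    output_shape: tuple,   # (h, w) after symmetric padding
) -> np.ndarray:
    out = boxes.copy()
    rh = scaled_shape[0] / output_shape[0]
    rw = scaled_shape[1] / output_shape[1]
    # Scaling part: shrink coordinates into the padded reference frame
    out[:, [0, 2]] *= rw
    out[:, [1, 3]] *= rh
    # Offset part: symmetric padding shifts the content by half the margin
    out[:, [0, 2]] += (1 - rw) / 2
    out[:, [1, 3]] += (1 - rh) / 2
    return out
```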
For the second point, I think I should indeed add the boxes to the transformation
Oh yeah, I see, now it makes sense!
So I assume that on the first sample, it was padded on the bottom or on the right, but the rotation made it look like it came from the top right?
Either way, I would argue the first and last samples have an issue: they are padded on 2 sides (we could have expanded the content more while preserving the aspect ratio). Is that because of the double resize? If so, we should find a better way, and I may have a suggestion. Say we want to resize to (W, H):
- the first resize should scale the minimum side of the image to max(W, H) (just scaling the image so that the shortest side equals max(W, H)), without padding anything
- we then do a rotation while adding expand=True
- then we do a resize, this time preserving the aspect ratio (thus padding); see the sketch after this comment
What do you think?
Also: should we try to make it centered though?
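A rough sketch of that suggested order, assuming torchvision's functional helpers and doctr's `Resize` as extended by this PR; target handling is left out for brevity:

```python
import torchvision.transforms.functional as F

from doctr.transforms import Resize

def rotate_then_resize(img, angle: float, size: tuple):
    """Suggested transformation order for a target shape `size` = (H, W)."""
    # 1. Scale so the shortest side equals max(H, W): torchvision's resize
    #    matches the smaller edge to the value when given a single int.
    img = F.resize(img, max(size))
    # 2. Rotate with expansion so no content gets cropped.
    img = F.rotate(img, angle, expand=True)
    # 3. Final resize, this time preserving the aspect ratio (hence padding).
    return Resize(size, preserve_aspect_ratio=True)(img)
```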
yes
I tested all the options in both TF & PT; by default I enabled conservation of the aspect ratio & symmetric padding in the scripts.
Almost there!
Since we already have a few flags related to this, I think it would be best to avoid adding a new flag to the transform:
- if preserve_aspect_ratio is True and the original & target shapes don't have the same aspect ratio, then there will have to be padding
- to scale the image, other libraries usually accept an integer instead of a tuple as the desired shape. I'd argue we could have a good use for a similar mechanism
What do you think?
I am not sure I understand your second point, can you develop please?
Sure!
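The elaboration itself isn't preserved in this thread, but the convention being referred to (e.g., torchvision's Resize) is that an integer size scales the shorter side to that value while keeping the aspect ratio. A sketch of that resolution logic, with a hypothetical helper name:

```python
from typing import Tuple, Union

def to_output_size(size: Union[int, Tuple[int, int]], h: int, w: int) -> Tuple[int, int]:
    """Resolve the desired shape: a tuple is taken as-is, while an int scales
    the shorter side to that value, keeping the aspect ratio (sketch)."""
    if isinstance(size, int):
        if h < w:
            return size, int(round(w * size / h))
        return int(round(h * size / w)), size
    return size

print(to_output_size(512, h=480, w=640))  # -> (512, 683)
```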
Thanks! I left a few other comments
```python
    self,
    img: torch.Tensor,
    target: Optional[np.ndarray] = None,
) -> Union[torch.Tensor, Tuple[torch.Tensor, np.ndarray]]:
```
Thinking about it, I think we should have:
- one function to resize the image
- one function to resize the target
- use them in the module implementation
And I'm not sure what would be best, but probably having `Resize` (image only) and `SampleResize` (image + target) would be relevant. What do you think?
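A bare-bones sketch of what that split could look like, only to illustrate the structure (aspect-ratio padding and the real target update are omitted; the names follow the suggestion above):

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn

class Resize(nn.Module):
    """Image-only resize (sketch)."""
    def __init__(self, size):
        super().__init__()
        self.size = size

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # Plain resize of a CHW tensor, no aspect-ratio handling
        return F.interpolate(img[None], size=self.size, mode="bilinear", align_corners=False)[0]

class SampleResize(Resize):
    """Image + target resize, reusing the parent for the image part (sketch)."""
    def forward(self, img: torch.Tensor, target: np.ndarray):
        img = super().forward(img)
        # With relative coordinates and no padding, the target is unchanged;
        # a real implementation would apply the scale/offset discussed above.
        return img, target
```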
Yes, we can split the function with one for the targets and one for the image, but I am not sure it is relevant to have a `Resize` and a `SampleResize`, because if you want to keep the aspect ratio of the image while resizing, you will almost always pad (except if you give only 1 target size), and this will modify the targets. The `Resize` function won't be able to preserve the aspect ratio of the images, and this would require differentiating the function we use in the training scripts and in the preprocessor depending on the option selected by the user (`preserve_aspect_ratio` or not). This can even be dangerous if someone tries to modify the aspect ratio in `Resize` without changing the targets, I think. Do you agree?
And about splitting the target and image functions, I think it would complexify the code, because we use many attributes of the `Resize` class to resize both the image and the target (`self.symmetric_pad`, `self.preserve_aspect_ratio`, `self.output_size`) which would have to be passed as arguments, and we also have many shared computations for the image and the target that would need to be done twice (or added to the signatures of the functions, but I think that would make the code less understandable): `offset`, `raw_shape`, and `img.shape` are also used in the target computation. What do you think @fg-mindee?
About splitting the two transforms, the reasoning behind this is that the target, depending on the task, will not necessarily be modified. So either we add a lot of cases (for each target type) in the Resize transform, or we have a `Resize` that only transforms the image, and a `SampleResize` that inherits from it and transforms the target.
I agree it would be dangerous if someone modified the aspect ratio without changing the target, but transforms will only be played with by people willing to train models. So I would argue it's safe to assume they either use our default training scripts or have a good understanding of what is going on under the hood 🤷♂️
However, I fully agree about the symmetric pad / computation duplication for some aspects. I don't have an ideal suggestion for this.
For the `SampleResize` transform, you want it to inherit from `Resize`, but since they won't have the same signature, I don't really see why we would do that? @fg-mindee
We can always refactor this later on anyway! Feel free to implement it the way you prefer (the optional target passing you suggested is probably the best one)
Minor adjustments to do!
Also, perhaps in a later PR, but our Resize transform is starting to get complex. We should add an illustration, similar to https://mindee.github.io/doctr/transforms.html#doctr.transforms.RandomRotate
Cheers 🙏
This PR adds the optional arg `target` in the Resize transform to train detection models while preserving the aspect ratio, which leads to much more consistent results.
The double resize in the script is mandatory: first we resize and pad to keep the aspect ratio (modifying the labels as well), then we rotate everything, and finally we resize the image so that each image has the same size and can be batched.
Any feedback is welcome!
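As a usage illustration of that double resize, here's a hedged sketch assuming doctr's `Resize` as extended by this PR; the output shape and the dummy sample are made up, and the rotation step is only indicated by a comment:

```python
import numpy as np
import torch

from doctr.transforms import Resize

# Dummy sample: a CHW image tensor and one relative (xmin, ymin, xmax, ymax) box
img = torch.rand(3, 480, 640)
boxes = np.array([[0.1, 0.1, 0.4, 0.3]], dtype=np.float32)

# 1. Resize + pad to a fixed shape, updating the boxes as well (what this PR adds)
img, boxes = Resize((1024, 1024), preserve_aspect_ratio=True, symmetric_pad=True)(img, boxes)

# 2. Rotate image + boxes here (rotation transform omitted in this sketch)

# 3. Plain resize so every rotated image ends up at the same, batchable shape
img = Resize((1024, 1024))(img)
```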