Rotate page #488

Merged
merged 66 commits into from
Dec 6, 2021
Commits
66 commits
1af0fc1
feat: integrate image rotation before using predictor
Rob192 Jul 19, 2021
811fd6a
Merge branch 'main' into rotate_page
Rob192 Jul 19, 2021
6f5c183
Merge branch 'main' into rotate_page
Rob192 Sep 21, 2021
c2b18d7
feat: add rotate_document functionality
Rob192 Sep 22, 2021
312179f
fix: remove min_angle from rotate_page
Rob192 Sep 22, 2021
b6bc74f
merge
Rob192 Sep 30, 2021
6fe1bc6
fix: correct models.predictor.tensorflow
Rob192 Sep 30, 2021
a6f2ff1
fix: minor corrections
Rob192 Oct 1, 2021
ccda4d0
feat: Rotate back images and boxes after straightening
Rob192 Oct 4, 2021
7d4ed75
fix: correct typo
Rob192 Oct 4, 2021
a303b23
fix: merge two functions rotate_image
Rob192 Oct 5, 2021
7a78263
fix: do not rotate back pages but only boxes
Rob192 Oct 5, 2021
eb341ac
fix: typos
Rob192 Oct 6, 2021
eeff2d6
fix: add more testing for remap_boxes in cases of boxes with an angle…
Rob192 Oct 6, 2021
16f3489
fix: remove the cropping after rotation of the image
Rob192 Oct 14, 2021
eb063f1
Merge branch 'main' of https://github.com/mindee/doctr
Rob192 Oct 25, 2021
b9ec27e
Merge branch 'main' into rotate_page
Rob192 Oct 25, 2021
f7fcf90
fix: correct model/_utils.py
Rob192 Oct 25, 2021
a04ab4f
Merge branch 'main' of https://github.com/mindee/doctr
Rob192 Oct 28, 2021
8ec9eab
Merge branch 'main' into rotate_page
Rob192 Oct 28, 2021
cf9ab0d
fix: do not use resolve_lines and resolve_boxes as it does not work w…
Rob192 Oct 28, 2021
8a31014
fix: remove expand in geometry.rotate_boxes
Rob192 Oct 29, 2021
e457cf0
fix: reformat code
Rob192 Oct 29, 2021
9975c82
fix: reformat expand from function signature
Rob192 Oct 29, 2021
32d53e4
fix: rename keep_original_size to preserve_aspect_ratio
Rob192 Oct 29, 2021
6821442
fix: vectorize box transformation
Rob192 Oct 29, 2021
988e2d0
Merge branch 'main' into rotate_page
Rob192 Nov 16, 2021
573f13f
fix: minor modifications + remove test_bbox_to_rbbox
Rob192 Nov 16, 2021
290e8ed
fix: add the straighten_pages to the latest codebase
Rob192 Nov 19, 2021
1775ebf
feat: add the straighten_pages to the pytorch predictor
Rob192 Nov 19, 2021
816168c
Merge branch 'main' into rotate_page
Rob192 Nov 19, 2021
f199044
feat: add testing for the straighten_pages parameter
Rob192 Nov 19, 2021
887ed25
fix: in case no angle is found in estimate_orientation return 0
Rob192 Nov 19, 2021
789a9c2
Merge branch 'main' into rotate_page
Rob192 Nov 24, 2021
239c508
fix: make sure boxes are outputted from _process_predictions
Rob192 Nov 24, 2021
52461dc
fix: update docstrings in OCRPredictor
Rob192 Nov 24, 2021
b6f8cca
fix: create a copy of boxes inside rotate_boxes
Rob192 Nov 24, 2021
a9f3d6e
fix: update docstring for rotate_image
Rob192 Nov 24, 2021
ea69de6
fix: add comments inside remap_boxes
Rob192 Nov 24, 2021
1a72a8c
fix: change testing in test_estimate_orientation
Rob192 Nov 24, 2021
d658be4
fix: change testing in test_estimate_orientation
Rob192 Nov 24, 2021
4711797
Merge branch 'main' into rotate_page
Rob192 Nov 25, 2021
e5ed562
Merge branch 'main' into rotate_page
Rob192 Nov 26, 2021
9a6c658
fix: delete imports not used
Rob192 Nov 29, 2021
a9cbe04
fix: styling
Rob192 Nov 29, 2021
ebdc320
fix: change assertion in test_utils_geometry.py
Rob192 Nov 30, 2021
d16fba3
fix: keep check with if expand in rotate_image
Rob192 Nov 30, 2021
e344e5c
fix: change rotate_boxes signature
Rob192 Nov 30, 2021
2e11dc8
fix: use loc_preds instead of boxes
Rob192 Nov 30, 2021
bb6bc79
fix: wrong test in remap boxes
Rob192 Nov 30, 2021
258b18b
Merge branch 'main' into rotate_page
Rob192 Nov 30, 2021
b22309d
add unit tests for pytorch
Rob192 Dec 1, 2021
495ad8c
add unit tests for remap_boxes and estimate_orientation
Rob192 Dec 1, 2021
98b44f6
fix: styling
Rob192 Dec 1, 2021
28f1fd8
fix: isort
Rob192 Dec 1, 2021
1d61de7
fix: remove unnecessary fixture
Rob192 Dec 1, 2021
faac0bd
fix: add testing for pytorch predictor
Rob192 Dec 2, 2021
938c9f2
fix: styling
Rob192 Dec 2, 2021
30a70f2
fix: correct testing for ocrpredictor with pytorch
Rob192 Dec 2, 2021
a7c0d55
fix: correct imports for testing
Rob192 Dec 2, 2021
ce23100
fix: isort
Rob192 Dec 2, 2021
8525b14
fix: make sure that expand in rotate_image is keeping the same image …
Rob192 Dec 3, 2021
7bcf639
fix: styling
Rob192 Dec 3, 2021
2205737
fix: use absolute centers for rotate_boxes
Rob192 Dec 5, 2021
484451b
fix: calculation of image_center and documentation
Rob192 Dec 5, 2021
66fad67
fix: remove default value for orig_shape in rotate_boxes
Rob192 Dec 5, 2021
52 changes: 49 additions & 3 deletions doctr/models/_utils.py
@@ -5,11 +5,13 @@

import numpy as np
import cv2
from math import floor
from typing import List
from math import floor, ceil
from typing import List, Optional, Tuple
from statistics import median_low

__all__ = ['estimate_orientation', 'extract_crops', 'extract_rcrops', 'get_bitmap_angle']
from doctr.utils import compute_expanded_shape

__all__ = ['estimate_orientation', 'extract_crops', 'extract_rcrops', 'get_bitmap_angle', 'rotate_image']


def extract_crops(img: np.ndarray, boxes: np.ndarray, channels_last: bool = True) -> List[np.ndarray]:
@@ -188,3 +190,47 @@ def get_bitmap_angle(bitmap: np.ndarray, n_ct: int = 20, std_max: float = 3.) ->
angle = 90 + angle

return angle


def rotate_image(
image: np.ndarray,
angle: float,
expand: bool = False,
mask_shape: Optional[Tuple[int, int]] = None
) -> np.ndarray:
"""Rotate an image counterclockwise by a given angle.

Args:
image: numpy tensor to rotate
angle: rotation angle in degrees, between -90 and +90
expand: whether the image should be padded before the rotation
mask_shape: applies a mask on the image of the specified shape given in absolute pixels

Returns:
Rotated array, padded by 0 by default.
"""

# Compute the expanded padding
if expand:
exp_shape = compute_expanded_shape(image.shape[:-1], angle)
h_pad, w_pad = int(max(0, ceil(exp_shape[0] - image.shape[0]))), int(max(0, ceil(exp_shape[1] - image.shape[1])))
exp_img = np.pad(image, ((h_pad // 2, h_pad - h_pad // 2), (w_pad // 2, w_pad - w_pad // 2), (0, 0)))
else:
exp_img = image

height, width = exp_img.shape[:2]
rot_mat = cv2.getRotationMatrix2D((width / 2, height / 2), angle, 1.0)
rot_img = cv2.warpAffine(exp_img, rot_mat, (width, height))

if mask_shape is not None:
if len(mask_shape) != 2:
raise ValueError(f"Mask length should be 2, was found at: {len(mask_shape)}")
h_crop, w_crop = int(height - ceil(mask_shape[0])), int(ceil(width - mask_shape[1]))
if h_crop > 0 and w_crop > 0:
rot_img = rot_img[h_crop // 2: - h_crop // 2, w_crop // 2: - w_crop // 2]
elif w_crop <= 0:
rot_img = rot_img[h_crop // 2: - h_crop // 2, ]
elif h_crop <= 0:
rot_img = rot_img[:, w_crop // 2: - w_crop // 2]

return rot_img
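The padding step in `rotate_image` relies on `compute_expanded_shape`, whose body is not part of this diff. Below is a minimal sketch of the geometry it presumably implements, assuming the expanded shape is the axis-aligned bounding box of the rotated (height, width) rectangle. `expanded_shape_sketch` and `pad_to_expanded` are hypothetical names used for illustration only, not doctr functions.

```python
import math
from typing import Tuple

import numpy as np


def expanded_shape_sketch(img_shape: Tuple[int, int], angle: float) -> Tuple[float, float]:
    """Hypothetical helper: bounding (height, width) of an (h, w) rectangle rotated by `angle` degrees."""
    h, w = img_shape
    theta = math.radians(angle)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    return h * cos_t + w * sin_t, h * sin_t + w * cos_t


def pad_to_expanded(image: np.ndarray, angle: float) -> np.ndarray:
    """Zero-pad an (H, W, C) image symmetrically so the rotated content still fits (assumed behaviour)."""
    exp_h, exp_w = expanded_shape_sketch(image.shape[:2], angle)
    # Never pad by a negative amount: a dimension that shrinks is left unchanged
    h_pad = int(max(0, math.ceil(exp_h - image.shape[0])))
    w_pad = int(max(0, math.ceil(exp_w - image.shape[1])))
    return np.pad(image, ((h_pad // 2, h_pad - h_pad // 2), (w_pad // 2, w_pad - w_pad // 2), (0, 0)))


padded = pad_to_expanded(np.ones((32, 64, 3), dtype=np.float32), 30.)
```

Under that assumption a 32x64 image rotated by 30° is padded to 60x72, which agrees with the shape asserted in `test_rotate_image`.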
22 changes: 21 additions & 1 deletion doctr/models/predictor/tensorflow.py
@@ -8,10 +8,12 @@
from typing import List, Any, Union

from doctr.io.elements import Document
from doctr.utils.geometry import rotate_boxes
from doctr.utils.repr import NestedObject
from doctr.models.builder import DocumentBuilder
from doctr.models.detection.predictor import DetectionPredictor
from doctr.models.recognition.predictor import RecognitionPredictor
from doctr.models._utils import estimate_orientation, rotate_image
from .base import _OCRPredictor


@@ -30,12 +32,14 @@ def __init__(
self,
det_predictor: DetectionPredictor,
reco_predictor: RecognitionPredictor,
rotated_bbox: bool = False
rotated_bbox: bool = False,
straighten_pages: bool = False,
) -> None:

super().__init__()
self.det_predictor = det_predictor
self.reco_predictor = reco_predictor
self.straighten_pages = straighten_pages
self.doc_builder = DocumentBuilder(rotated_bbox=rotated_bbox)

def __call__(
@@ -48,6 +52,12 @@ def __call__(
if any(page.ndim != 3 for page in pages):
raise ValueError("incorrect input shape: all pages are expected to be multi-channel 2D images.")

# Detect document rotation and rotate pages
if self.straighten_pages:
page_orientations = [estimate_orientation(page) for page in pages]
page_shapes = [page.shape[:-1] for page in pages]
pages = [rotate_image(page, -angle, expand=True) for page, angle in zip(pages, page_orientations)]

# Localize text elements
loc_preds = self.det_predictor(pages, **kwargs)
# Crop images, rotate page if necessary
@@ -56,5 +66,15 @@ def __call__(
word_preds = self.reco_predictor([crop for page_crops in crops for crop in page_crops], **kwargs)

boxes, text_preds = self._process_predictions(loc_preds, word_preds, self.doc_builder.rotated_bbox)

# Rotate back pages and boxes while keeping original image size
if self.straighten_pages:
pages = [rotate_image(page, angle, expand=True, mask_shape=mask) for page, angle, mask in
zip(pages, page_orientations, page_shapes)]
rboxes = [rotate_boxes(page_boxes, angle, expand=True, orig_shape=page.shape[:2], mask_shape=mask) for
page_boxes, page, angle, mask in zip(boxes, pages, page_orientations, page_shapes)]
boxes = rboxes
self.doc_builder = DocumentBuilder(rotated_bbox=True) # override the current doc_builder

out = self.doc_builder(boxes, text_preds, [page.shape[:2] for page in pages]) # type: ignore[misc]
return out
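The "rotate back while keeping original image size" step amounts to a center crop of the padded page down to the original shape. The sketch below shows that crop in isolation, assuming shapes are given as (height, width). `center_crop` is a hypothetical helper and uses a simpler slicing scheme than the `mask_shape` branch of `rotate_image`.

```python
from typing import Tuple

import numpy as np


def center_crop(img: np.ndarray, mask_shape: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: crop an (H, W, C) image around its center to at most `mask_shape`."""
    h, w = img.shape[:2]
    # Offsets are clamped at 0 so a mask larger than the image leaves that dimension untouched
    top = max(0, (h - mask_shape[0]) // 2)
    left = max(0, (w - mask_shape[1]) // 2)
    return img[top: top + mask_shape[0], left: left + mask_shape[1]]


expanded_page = np.zeros((60, 72, 3), dtype=np.float32)
restored = center_crop(expanded_page, (32, 64))
```

Applied to a 60x72 padded page with a (32, 64) mask, the crop returns the original 32x64 extent.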
2 changes: 1 addition & 1 deletion doctr/utils/common_types.py
@@ -6,7 +6,7 @@
from pathlib import Path
from typing import Tuple, List, Union

__all__ = ['Point2D', 'BoundingBox', 'RotatedBbox', 'Polygon4P', 'Polygon']
__all__ = ['Point2D', 'BoundingBox', 'RotatedBbox', 'Polygon4P', 'Polygon', 'Bbox']


Point2D = Tuple[float, float]
62 changes: 53 additions & 9 deletions doctr/utils/geometry.py
@@ -4,10 +4,10 @@
# See LICENSE or go to <https://www.apache.org/licenses/LICENSE-2.0.txt> for full license details.

import math
from typing import List, Union, Tuple
from typing import List, Union, Tuple, Optional
import numpy as np
import cv2
from .common_types import BoundingBox, Polygon4P, RotatedBbox
from .common_types import BoundingBox, Polygon4P, RotatedBbox, Bbox

__all__ = ['rbbox_to_polygon', 'bbox_to_polygon', 'polygon_to_bbox', 'polygon_to_rbbox',
'resolve_enclosing_bbox', 'resolve_enclosing_bbox', 'fit_rbbox', 'rotate_boxes', 'rotate_abs_boxes',
@@ -38,6 +38,10 @@ def polygon_to_rbbox(polygon: Polygon4P) -> RotatedBbox:
return fit_rbbox(cnt)


def bbox_to_rbbox(bbox: Bbox) -> RotatedBbox:
return (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2, bbox[2] - bbox[0], bbox[3] - bbox[1], 0


def resolve_enclosing_bbox(bboxes: Union[List[BoundingBox], np.ndarray]) -> Union[BoundingBox, np.ndarray]:
"""Compute enclosing bbox either from:

@@ -129,10 +133,41 @@ def rotate_abs_boxes(boxes: np.ndarray, angle: float, img_shape: Tuple[int, int]
return rotated_boxes


def remap_boxes(boxes: np.ndarray, orig_shape: Tuple[int, int], dest_shape: Tuple[int, int]) -> np.ndarray:
"""Remap a batch of RotatedBbox (x, y, w, h, alpha) expressed for an orig_shape to a dest_shape.
This does not impact the absolute shape of the boxes.

Args:
boxes: (N, 5) array of RELATIVE RotatedBbox (x, y, w, h, alpha)
orig_shape: shape of the origin image
dest_shape: shape of the destination image

Returns:
A batch of rotated boxes (N, 5): (x, y, w, h, alpha) expressed in the destination reference frame

"""

if len(dest_shape) != 2:
raise ValueError(f"Mask length should be 2, was found at: {len(dest_shape)}")
if len(orig_shape) != 2:
raise ValueError(f"Image_shape length should be 2, was found at: {len(orig_shape)}")
orig_width, orig_height = orig_shape
dest_width, dest_height = dest_shape
mboxes = boxes.copy()
mboxes[:, 0] = ((boxes[:, 0] * orig_height) + (dest_height - orig_height) / 2) / dest_height
mboxes[:, 1] = ((boxes[:, 1] * orig_width) + (dest_width - orig_width) / 2) / dest_width
mboxes[:, 2] = boxes[:, 2] * orig_height / dest_height
mboxes[:, 3] = boxes[:, 3] * orig_width / dest_width
return mboxes
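To make the remapping arithmetic easier to check by hand, here is a standalone sketch of the same idea: relative boxes on a canvas are re-expressed on a larger (or smaller) canvas that shares the same center, so absolute sizes stay fixed. It assumes shapes are (height, width) tuples, as passed from `page.shape[:2]`. `remap_centered` is a hypothetical name, not the function added in this diff.

```python
from typing import Tuple

import numpy as np


def remap_centered(boxes: np.ndarray, orig_shape: Tuple[int, int], dest_shape: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: remap relative (x, y, w, h, alpha) boxes between two centered canvases.

    Assumes both shapes are (height, width); x and w scale with width, y and h with height.
    """
    orig_h, orig_w = orig_shape
    dest_h, dest_w = dest_shape
    out = boxes.astype(float)  # astype returns a copy, so the input is not modified
    # Convert to absolute pixels, shift by half the size difference, convert back to relative
    out[:, 0] = (boxes[:, 0] * orig_w + (dest_w - orig_w) / 2) / dest_w
    out[:, 1] = (boxes[:, 1] * orig_h + (dest_h - orig_h) / 2) / dest_h
    # Absolute width and height are preserved, so relative sizes scale by orig / dest
    out[:, 2] = boxes[:, 2] * orig_w / dest_w
    out[:, 3] = boxes[:, 3] * orig_h / dest_h
    return out


remapped = remap_centered(np.array([[0.25, 0.5, 0.5, 0.2, 0.]]), (30, 80), (30, 160))
```

In this example the canvas doubles in width only, so the box center moves from x = 0.25 to x = 0.375 and the relative width halves, while y and h are unchanged.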


def rotate_boxes(
boxes: np.ndarray,
angle: float = 0.,
min_angle: float = 1.
min_angle: float = 1.,
expand: bool = False,
orig_shape: Optional[Tuple[int, int]] = None,
mask_shape: Optional[Tuple[int, int]] = None,
) -> np.ndarray:
"""Rotate a batch of straight bounding boxes (xmin, ymin, xmax, ymax) by an angle,
if angle > min_angle, around the center of the page.
@@ -141,28 +176,37 @@ def rotate_boxes(
boxes: (N, 4) array of RELATIVE boxes
angle: angle between -90 and +90 degrees
min_angle: minimum angle to rotate boxes
expand: whether the image should be padded before the rotation
orig_shape: shape of the origin image
mask_shape: shape of the mask if the image is cropped after the rotation

Returns:
A batch of rotated boxes (N, 5): (x, y, w, h, alpha) or a batch of straight bounding boxes
"""
# Change format of the boxes to rotated boxes
boxes = np.apply_along_axis(bbox_to_rbbox, 1, boxes)
# If small angle, return boxes (no rotation)
if abs(angle) < min_angle or abs(angle) > 90 - min_angle:
return boxes
if expand:
exp_shape = compute_expanded_shape(orig_shape, angle)
boxes = remap_boxes(boxes, orig_shape=orig_shape, dest_shape=exp_shape)
orig_shape = exp_shape # in case a mask is used afterwards
# Compute rotation matrix
angle_rad = angle * np.pi / 180. # compute radian angle for np functions
rotation_mat = np.array([
[np.cos(angle_rad), -np.sin(angle_rad)],
[np.sin(angle_rad), np.cos(angle_rad)]
], dtype=boxes.dtype)
# Compute unrotated boxes
x_unrotated, y_unrotated = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
width, height = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
# Rotate centers
centers = np.stack((x_unrotated, y_unrotated), axis=-1)
rotated_centers = .5 + np.matmul(centers - .5, np.transpose(rotation_mat))
centers = np.stack((boxes[:, 0], boxes[:, 1]), axis=-1)
rotated_centers = .5 + np.matmul(centers - .5, rotation_mat)
x_center, y_center = rotated_centers[:, 0], rotated_centers[:, 1]
# Compute rotated boxes
rotated_boxes = np.stack((x_center, y_center, width, height, angle * np.ones_like(boxes[:, 0])), axis=1)
rotated_boxes = np.stack((x_center, y_center, boxes[:, 2], boxes[:, 3], angle * np.ones_like(boxes[:, 0])), axis=1)
# Apply a mask if requested
if mask_shape is not None:
rotated_boxes = remap_boxes(rotated_boxes, orig_shape=orig_shape, dest_shape=mask_shape)
return rotated_boxes
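The center-rotation step can be exercised on its own. The sketch below reproduces the new matrix product from this hunk (row vectors multiplied by the rotation matrix, around the page center at (0.5, 0.5)) and checks that rotating by `angle` and then by `-angle` is the identity, which is the property the straighten and rotate-back pipeline depends on. `rotate_centers` is a hypothetical name. One caveat: in relative coordinates this is only a rigid rotation on a square page, which may be what the later "use absolute centers" commit in this PR addresses.

```python
import numpy as np


def rotate_centers(centers: np.ndarray, angle: float) -> np.ndarray:
    """Hypothetical helper: rotate (N, 2) relative centers by `angle` degrees around (0.5, 0.5)."""
    angle_rad = np.deg2rad(angle)
    rotation_mat = np.array([
        [np.cos(angle_rad), -np.sin(angle_rad)],
        [np.sin(angle_rad), np.cos(angle_rad)],
    ], dtype=centers.dtype)
    # Same convention as the diff: row vectors times the (untransposed) rotation matrix
    return .5 + np.matmul(centers - .5, rotation_mat)


centers = np.array([[0.3, 0.1], [0.5, 0.5]])
round_trip = rotate_centers(rotate_centers(centers, 30.), -30.)
```

The page center is a fixed point of the transform, and `round_trip` matches `centers` up to floating-point error.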


2 changes: 1 addition & 1 deletion doctr/utils/visualization.py
@@ -99,7 +99,7 @@ def polygon_patch(
# Switch to absolute coords
x, w = x * width, w * width
y, h = y * height, h * height
points = cv2.boxPoints(((x, y), (w, h), a))
points = cv2.boxPoints(((x, y), (w, h), -a))

return patches.Polygon(
points,
30 changes: 29 additions & 1 deletion test/common/test_models.py
@@ -5,7 +5,7 @@
import cv2

from doctr.io import reader, DocumentFile
from doctr.models._utils import extract_crops, extract_rcrops, get_bitmap_angle, estimate_orientation
from doctr.models._utils import extract_crops, extract_rcrops, get_bitmap_angle, estimate_orientation, rotate_image


def test_extract_crops(mock_pdf): # noqa: F811
@@ -89,6 +89,34 @@ def test_get_bitmap_angle(mock_bitmap):
assert abs(angle - 30.) < 1.


def test_rotate_image():
img = np.ones((32, 64, 3), dtype=np.float32)
rotated = rotate_image(img, 30.)
assert rotated.shape[:-1] == (32, 64)
assert rotated[0, 0, 0] == 0
assert rotated[0, :, 0].sum() > 1

# Expand
rotated = rotate_image(img, 30., expand=True)
assert rotated.shape[:-1] == (60, 72)
assert rotated[0, :, 0].sum() <= 1

# Expand with 90° rotation
rotated = rotate_image(img, 90., expand=True)
assert rotated.shape[:-1] == (64, 64)
assert rotated[0, :, 0].sum() <= 1

# Expand with mask
rotated = rotate_image(img, 30., expand=True, mask_shape=(40, 72))
assert rotated.shape[:-1] == (40, 72)
assert rotated[0, :, 0].sum() > 1


def test_estimate_orientation(mock_image):
angle = estimate_orientation(mock_image)
assert abs(angle - 30.) < 1.

angle = estimate_orientation(mock_image)
rotated = rotate_image(mock_image, -angle)
angle_rotated = estimate_orientation(rotated)
assert abs(angle_rotated - 0.) < 1.
33 changes: 31 additions & 2 deletions test/common/test_utils_geometry.py
@@ -28,25 +28,54 @@ def test_polygon_to_rbbox():
assert all(abs(i - j) <= 1e-7 for (i, j) in zip(pred, target))


def test_bbox_to_rbbox():
pred = geometry.bbox_to_rbbox((0, 0, 0.6, 0.4))
target = (0.3, 0.2, 0.6, 0.4, 0)
assert all(abs(i - j) <= 1e-7 for (i, j) in zip(pred, target))


def test_resolve_enclosing_rbbox():
pred = geometry.resolve_enclosing_rbbox([(.2, .2, .05, .05, 0), (.2, .2, .2, .2, 0)])[:4]
target = (.2, .2, .2, .2)
assert all(abs(i - j) <= 1e-7 for (i, j) in zip(pred, target))


def test_remap_boxes():
pred = geometry.remap_boxes(np.array([[0.5, 0.5, 0.1, 0.1, 0.]]), (10, 10), (20, 20))
target = np.array([[0.5, 0.5, 0.05, 0.05, 0.]])
assert pred.all() == target.all()

pred = geometry.remap_boxes(np.array([[0.5, 0.5, 0.1, 0.1, 0.]]), (10, 10), (20, 10))
target = np.array([[0.5, 0.5, 0.1, 0.05, 0.]])
assert pred.all() == target.all()

pred = geometry.remap_boxes(np.array([[0.25, 0.5, 0.5, 0.33, 0.]]), (80, 30), (160, 30))
target = np.array([[0.375, 0.5, 0.25, 0.1, 0.]])
assert pred.all() == target.all()


def test_rotate_boxes():
boxes = np.array([[0.1, 0.1, 0.8, 0.3]])
rboxes = np.apply_along_axis(geometry.bbox_to_rbbox, 1, boxes)
# Angle = 0
rotated = geometry.rotate_boxes(boxes, angle=0.)
assert rotated.all() == boxes.all()
assert rotated.all() == rboxes.all()
# Angle < 1:
rotated = geometry.rotate_boxes(boxes, angle=0.5)
assert rotated.all() == boxes.all()
assert rotated.all() == rboxes.all()
# Angle = 30
rotated = geometry.rotate_boxes(boxes, angle=30)
assert rotated.shape == (1, 5)
assert rotated[0, 4] == 30.

boxes = np.array([[0., 0., 0.6, 0.2]])
# Angle = -90:
rotated = geometry.rotate_boxes(boxes, angle=-90, min_angle=0)
assert rotated.all() == np.array([[0.1, 0.7, 0.6, 0.2, -90.]]).all()
# Angle = 90
rotated = geometry.rotate_boxes(boxes, angle=+90, min_angle=0)
assert rotated.all() == np.array([[0.9, 0.3, 0.6, 0.2, 90.]]).all()
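A note on the assertion style used in these tests: `a.all() == b.all()` reduces each array to a single boolean before comparing, so it can pass for arrays holding different values. The sketch below contrasts it with an element-wise comparison using `np.allclose`. The two arrays are illustrative values only, and whether the targets in the tests above would hold under the stricter check is not verified here.

```python
import numpy as np

# Two arrays that clearly differ element-wise (illustrative values)
pred = np.array([[0.1, 0.7, 0.6, 0.2, -90.]])
target = np.array([[0.9, 0.3, 0.6, 0.2, 90.]])

# Both reductions return True (every entry is non-zero), so this comparison passes
weak_check = pred.all() == target.all()

# Element-wise comparison with a tolerance detects the mismatch
strict_check = np.allclose(pred, target, atol=1e-7)
```

Here `weak_check` is true while `strict_check` is false, so an element-wise comparison is the safer choice for box coordinates.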


def test_rotate_image():
img = np.ones((32, 64, 3), dtype=np.float32)