Highlights of the release:

Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

Full integration with the Hugging Face Hub (docTR meets Hugging Face)

Loading from hub:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from the Hugging Face Hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from the Hugging Face Hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)

Pushing to the hub:

from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')

Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

Predefined datasets can now also be used for the recognition task

from doctr.datasets import CORD
# Crop boxes as is (crops can be irregular)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]

Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

New models (both frameworks)

classification: VisionTransformer (ViT)
recognition: Vision Transformer for Scene Text Recognition (ViTSTR)

Bug fixes for recognition models

MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)

ONNX support (experimental)

All models can now be exported to ONNX format, except the TensorFlow MobileNet models, whose export is deferred to 0.7.0

NOTE: a full production pipeline with ONNX / build is planned for 0.7.0. For now, models can only be exported up to the logits, without any post-processing included.
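
As a rough illustration of the export described above, here is a minimal PyTorch sketch. It assumes that `doctr.models.utils` exposes an `export_model_to_onnx(model, model_name=..., dummy_input=...)` helper and that recognition models accept an `exportable=True` flag that makes the forward pass return only the logits. Neither is stated in these notes, so check the docTR documentation for your version. The function name `export_crnn_to_onnx`, the choice of `crnn_vgg16_bn`, and the input shape are illustrative only.

```python
from typing import Any


def export_crnn_to_onnx(model_name: str = "crnn_vgg16_bn") -> Any:
    """Export a recognition model up to its logits (no post-processing)."""
    # Imports live inside the function so the sketch can be defined
    # without docTR / PyTorch installed.
    import torch
    from doctr.models import recognition
    # Assumption: this helper exists with this signature in your docTR version.
    from doctr.models.utils import export_model_to_onnx

    # Assumption: `exportable=True` restricts the output to raw logits.
    model = recognition.crnn_vgg16_bn(pretrained=False, exportable=True).eval()
    # Dummy batch: 1 image, 3 channels, 32 x 128 pixels (assumed input size).
    dummy_input = torch.rand((1, 3, 32, 128), dtype=torch.float32)
    return export_model_to_onnx(model, model_name=model_name, dummy_input=dummy_input)
```

Because only the logits are exported, decoding them into text (for example with a CTC decoder) is left to the caller until the full pipeline lands.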

Further features

our demo is now also PyTorch compatible, thanks to @odulcy-mindee
it is now possible to detect the language of the extracted text, thanks to @aminemindee

What's Changed

Breaking Changes 🛠

feat: ✨ allow beam width > 1 in the CRNN postprocessor by @khalidMindee in #630
[Fix] TensorFlow SAR_Resnet31 implementation by @felixdittrich92 in #925
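
To make the beam width change (#630) more concrete, here is a self-contained sketch of CTC prefix beam search in plain Python. It is not docTR's implementation: the function name `ctc_beam_search`, the convention that the blank symbol is the last class, and the toy probabilities are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ctc_beam_search(
    probs: Sequence[Sequence[float]], vocab: str, beam_width: int = 1
) -> List[Tuple[str, float]]:
    """Decode per-timestep class probabilities; blank is index len(vocab)."""
    blank = len(vocab)
    # prefix -> (probability of ending in blank, probability of ending in a symbol)
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (1.0, 0.0)}
    for step in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_symbol) in beams.items():
            total = p_blank + p_symbol
            for idx, p in enumerate(step):
                if idx == blank:
                    # A blank keeps the prefix unchanged
                    candidates[prefix][0] += total * p
                elif prefix and prefix[-1] == idx:
                    # Repeated symbol: collapses into the prefix unless a blank came first
                    candidates[prefix][1] += p_symbol * p
                    candidates[prefix + (idx,)][1] += p_blank * p
                else:
                    candidates[prefix + (idx,)][1] += total * p
        # Keep only the `beam_width` most probable prefixes
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: (pb, ps) for prefix, (pb, ps) in ranked[:beam_width]}
    return [("".join(vocab[i] for i in prefix), pb + ps) for prefix, (pb, ps) in beams.items()]


# Two timesteps, one character "a" plus blank: P(a) = 0.4, P(blank) = 0.6 at each step
toy_probs = [[0.4, 0.6], [0.4, 0.6]]
narrow = ctc_beam_search(toy_probs, "a", beam_width=1)  # best hypothesis is the empty string
wide = ctc_beam_search(toy_probs, "a", beam_width=2)    # best hypothesis is "a"
```

In this toy case a single hypothesis settles on the empty string, while a width of 2 recovers "a" because several alignments ("a" then blank, blank then "a", "a" twice) add up to a higher total probability.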

New Features

[onnx] classification models export by @felixdittrich92 in #830
feat: Added Vietnamese entry in VOCAB by @calibretaliation in #878
feat: Added Czech to the set of vocabularies in datasets/vocabs.py by @Xargonus in #885
feat: Add ability to upload PT/TF models to Huggingface Hub by @felixdittrich92 in #881
[feature][tf/pt] integrate from_hub for all tasks by @felixdittrich92 in #892
[feature] Part 2 from use datasets for recognition by @felixdittrich92 in #891
[datasets] Add MJSynth (Synth90K) by @felixdittrich92 in #827
[docu]: add documentation for datasets by @felixdittrich92 in #905
add a Slack Community badge by @fharper in #936
Feat/add language detection by @aminemindee in #1023
add ViT as classification model TF and PT by @felixdittrich92 in #1050
[models] add ViTSTR TF and PT and update ViT to work as backbone by @felixdittrich92 in #1055

Bug Fixes

[PyTorch][references] fix pretrained with different vocabs by @felixdittrich92 in #874
[classification] Fix cfgs by @felixdittrich92 in #883
docs: Fixed typo in installation instructions by @frgfm in #901
[Fix] imgur5k test by @felixdittrich92 in #903
fix: Fixed load_pretrained_params in PyTorch when ignoring keys by @frgfm in #902
[Fix]: Documentation add missing in vocabs and correct tab in sharing models by @felixdittrich92 in #904
Fix links in readme by @jsn5 in #937
[Fix] PyTorch MASTER implementation by @felixdittrich92 in #941
[Fix] MJSynth dataset: filter corrupted or missing images by @felixdittrich92 in #956
[Fix] SVT dataset: clip box values and add shape and label check by @felixdittrich92 in #955
[Fix] Tensorflow MASTER implementation by @felixdittrich92 in #949
[FIX] MASTER AMP and onnxruntime issue with master PT by @felixdittrich92 in #986
pytest-api test: fix ping server step by @odulcy-mindee in #997
docs/index: fix two minor typos by @mara004 in #1002
Fix orientation details export by @aminemindee in #1022
Changed return type of multithread_exec to iterator by @mtvch in #1019
[datasets] Fix recognition parts of SynthText and IMGUR5K by @felixdittrich92 in #1038
[Fix] rotation classifier input move to model device by @felixdittrich92 in #1039
[models] Vit: fix intermediate size scale and unify TF to PT by @felixdittrich92 in #1063
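
The `multithread_exec` change listed above (#1019) can affect callers that index or re-iterate the result. The following sketch uses only the standard library and is not docTR's code: `multithread_exec_sketch` and its default thread count are hypothetical stand-ins that mimic a helper returning an iterator.

```python
from multiprocessing.pool import ThreadPool
from typing import Any, Callable, Iterable, Iterator, Optional


def multithread_exec_sketch(
    func: Callable[[Any], Any], seq: Iterable[Any], threads: Optional[int] = None
) -> Iterator[Any]:
    """Apply `func` to every item of `seq` and return the results as an iterator."""
    threads = threads if isinstance(threads, int) else 4  # illustrative default
    if threads < 2:
        # Single-threaded path: a lazy, single-pass map object
        return map(func, seq)
    with ThreadPool(threads) as pool:
        # pool.map preserves input order; wrap the list to expose an iterator
        return iter(pool.map(func, seq))


results = multithread_exec_sketch(lambda x: x * x, [1, 2, 3])
first_pass = list(results)   # consumes the iterator
second_pass = list(results)  # nothing left to read
```

Code that needs random access or several passes over the results should materialize the iterator once with `list(...)`.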

Improvements

chore: Applied post release modifications v0.5.1 by @felixdittrich92 in #870
[refactor][fix]: Part1 from use datasets for recognition task by @felixdittrich92 in #889
ci: Add swagger ping in API CI job by @frgfm in #906
[docs] Add naming conventions for upload models to hf hub by @felixdittrich92 in #921
docs: Improved error message of encode_string by @frgfm in #929
[Refactor] PyTorch SAR_Resnet31 make it ONNX exportable (again) by @felixdittrich92 in #930
Add support page in README by @jonathanMindee in #946
[references] Add eval recognition and update eval detection scripts by @felixdittrich92 in #933
update pypdfium2 dep and improve code quality by @felixdittrich92 in #953
docs: Moved need help section after code snippet by @frgfm in #959
chore: Updated TF requirements to fix grouped convolutions on CPU by @frgfm in #963
style: Fixed mypy and moved tool configs to pyproject.toml by @frgfm in #966
Updating the readme by @Atomme1 in #938
Update docs in using_doctr by @odulcy-mindee in #993
feat: add a basic example of text detection by @ianardee in #999
Add pytorch demo by @odulcy-mindee in #1008
[build] move requirements to pyproject.toml by @felixdittrich92 in #1031
Migrate static data from github to monitoring middleware. by @marvinmindee in #1033
Changes needed to be able to use doctr on AWS Lambda by @mtvch in #1017
[Fix] unify recognition dataset parts return signature by @felixdittrich92 in #1041
Updated README.md for custom fonts by @carl-krikorian in #1051
[refactor] detection script by @felixdittrich92 in #1060
[models] ViT add checkpoints and some rework to use pretrained ViT backbone in ViTSTR by @felixdittrich92 in #1072
upgrade pypdfium2 by @felixdittrich92 in #1075
ViTSTR disable pretrained backbone by default by @felixdittrich92 in #1080

Miscellaneous

[Refactor] commit tags by @felixdittrich92 in #871
Update io/pdf.py to new pypdfium2 API by @mara004 in #944
docs: Documentation the reason for keras version specifier by @frgfm in #958
[datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in #983
[datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in #987
fix: update tensorflow-addons to match tensorflow version by @ianardee in #998
move transformers implementation to modules by @felixdittrich92 in #1013
[FIX] revert dev deps mistake by @felixdittrich92 in #1047
[models] update vit and transformer layer norm by @felixdittrich92 in #1059
make pretrained backbone flexible in predictor by @felixdittrich92 in #1061
handle LocalizationConfusion memory consuption and upgrade min weasyprint version by @felixdittrich92 in #1062
Fixed small typo in references recognition by @carl-krikorian in #1070
[docs] install extras for MacBooks with M1 chip by @felixdittrich92 in #1076
update version for minor release by @felixdittrich92 in #1073

New Contributors

@calibretaliation made their first contribution in #878
@Xargonus made their first contribution in #885
@khalidMindee made their first contribution in #630
@frgfm made their first contribution in #901
@jsn5 made their first contribution in #937
@fharper made their first contribution in #936
@jonathanMindee made their first contribution in #946
@Atomme1 made their first contribution in #938
@odulcy-mindee made their first contribution in #993
@ianardee made their first contribution in #998
@aminemindee made their first contribution in #1022
@mtvch made their first contribution in #1019
@marvinmindee made their first contribution in #1033
@carl-krikorian made their first contribution in #1051

Full Changelog: v0.5.1...v0.6.0
