v0.6.0
Highlights of the release:
Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.
Full integration with Huggingface Hub (docTR meets Huggingface)
- Loading from hub:
```python
from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub

image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from the Hugging Face Hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from the Hugging Face Hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# These models can be plugged directly into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)
```
- Pushing to the hub:
```python
from doctr.models import recognition, login_to_hub, push_to_hf_hub

# Log in to the Hugging Face Hub before pushing
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(
    my_awesome_model,
    model_name='doctr-crnn-mobilenet-v3-large-french-v1',
    task='recognition',
    arch='crnn_mobilenet_v3_large',
)
```
Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html
Predefined datasets can also be used for the recognition task
```python
from doctr.datasets import CORD

# Crop boxes as is (crops may contain irregular text)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (crops are always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
# Each sample is a word crop and its label
img, target = train_set[0]
```
Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html
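To illustrate the difference between the two cropping modes: a straight (axis-aligned) box that encloses a rotated word also captures the background around it. The minimal sketch below computes such a box for one 4-point polygon in absolute pixel coordinates and clips it to the image borders. `polygon_to_straight_box` and this coordinate layout are hypothetical illustrations, not doctr's internal code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _clip(value: float, low: float, high: float) -> float:
    # Keep a coordinate inside [low, high]
    return min(max(value, low), high)


def polygon_to_straight_box(
    polygon: List[Point], img_width: int, img_height: int
) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing a 4-point polygon.

    Hypothetical helper: coordinates are absolute pixels, and the box is
    clipped so it never extends past the image borders.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (
        _clip(min(xs), 0.0, img_width),
        _clip(min(ys), 0.0, img_height),
        _clip(max(xs), 0.0, img_width),
        _clip(max(ys), 0.0, img_height),
    )


# A word rotated by 45 degrees: its straight box is much larger than the word itself
rotated_word = [(50.0, 10.0), (90.0, 50.0), (50.0, 90.0), (10.0, 50.0)]
print(polygon_to_straight_box(rotated_word, img_width=100, img_height=100))
# (10.0, 10.0, 90.0, 90.0)
```

In this example, the rotated word covers 3200 px² while its straight box covers 6400 px², so half of that crop would be background. Cropping along the rotated polygon instead avoids this.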
New models (both frameworks)
- classification: VisionTransformer (ViT)
- recognition: Vision Transformer for Scene Text Recognition (ViTSTR)
Bug fixes for recognition models
- MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)
ONNX support (experimental)
- All models can now be exported to the ONNX format (only the TF MobileNet models are left for 0.7.0)
NOTE: a full production pipeline with ONNX / build is planned for 0.7.0 (models can only be exported up to the logits, without any post-processing included)
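Because the exported graph ends at the logits, decoding has to be reproduced outside ONNX. For a CTC-based recognizer such as CRNN, a greedy decoder could look like the sketch below. It assumes logits shaped (sequence_length, len(vocab) + 1) with the CTC blank at the last index. Check both assumptions against the exported model. Other recognition architectures, such as MASTER, SAR or ViTSTR, decode differently.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Greedy CTC decoding of one sequence of logits.

    Assumptions (not taken from doctr): logits is (T, len(vocab) + 1) and the
    CTC blank is the last class index, len(vocab).
    """
    blank = len(vocab)
    chars: List[str] = []
    previous: Optional[int] = None
    for frame in logits:
        # Argmax over classes for this time step (softmax is monotonic, so it can be skipped)
        best = max(range(len(frame)), key=frame.__getitem__)
        # Collapse consecutive repeats and drop blanks
        if best != previous and best != blank:
            chars.append(vocab[best])
        previous = best
    return "".join(chars)


vocab = "abc"
# Hypothetical logits whose per-frame argmax is: a a <blank> b b <blank> b
frames = [0, 0, 3, 1, 1, 3, 1]
logits = [[5.0 if c == k else 0.0 for c in range(len(vocab) + 1)] for k in frames]
print(ctc_greedy_decode(logits, vocab))  # abb
```

The blank between the last two `b` frames is what allows a genuine double letter to survive the repeat-collapsing step.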
Further features
- our demo is now also PyTorch compatible, thanks to @odulcy-mindee
- it is now possible to detect the language of the extracted text, thanks to @aminemindee
What's Changed
Breaking Changes 🛠
- feat: ✨ allow beam width > 1 in the CRNN postprocessor by @khalidMindee in #630
- [Fix] TensorFlow SAR_Resnet31 implementation by @felixdittrich92 in #925
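A greedy CTC decoder keeps only the most likely class per frame. A beam search instead keeps several candidate prefixes and sums the probability of every alignment that collapses to the same text, which can change the prediction. The pure-Python sketch below illustrates CTC prefix beam search on softmax probabilities, assuming a blank at index len(vocab). It is a teaching example, not the postprocessor added in #630.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Prefix = Tuple[int, ...]


def ctc_prefix_beam_search(
    probs: Sequence[Sequence[float]], vocab: str, beam_width: int = 3
) -> str:
    """CTC prefix beam search over per-frame class probabilities.

    Assumes probs is (T, len(vocab) + 1), already softmaxed, with the blank at
    index len(vocab). Each prefix tracks two probabilities: ending in a blank
    (p_b) and ending in a non-blank character (p_nb).
    """
    blank = len(vocab)
    beams: Dict[Prefix, Tuple[float, float]] = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams: Dict[Prefix, List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # A blank leaves the prefix unchanged
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # A repeated char only starts a new char after a blank...
                    next_beams[prefix + (c,)][1] += p_b * p
                    # ...otherwise it merges into the same prefix
                    next_beams[prefix][1] += p_nb * p
                else:
                    next_beams[prefix + (c,)][1] += (p_b + p_nb) * p
        # Keep only the beam_width most probable prefixes
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_width]}
    best = max(beams, key=lambda prefix: sum(beams[prefix]))
    return "".join(vocab[c] for c in best)


# Two frames, vocab "a": P(a) = 0.4 and P(blank) = 0.6 in each frame.
probs = [[0.4, 0.6], [0.4, 0.6]]
# Greedy picks blank twice and returns "", but the alignments "aa", "a-" and "-a"
# all collapse to "a" (0.16 + 0.24 + 0.24 = 0.64 > 0.36).
print(ctc_prefix_beam_search(probs, "a", beam_width=2))  # a
```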
New Features
- [onnx] classification models export by @felixdittrich92 in #830
- feat: Added Vietnamese entry in VOCAB by @calibretaliation in #878
- feat: Added Czech to the set of vocabularies in datasets/vocabs.py by @Xargonus in #885
- feat: Add ability to upload PT/TF models to Huggingface Hub by @felixdittrich92 in #881
- [feature][tf/pt] integrate from_hub for all tasks by @felixdittrich92 in #892
- [feature] Part 2 from use datasets for recognition by @felixdittrich92 in #891
- [datasets] Add MJSynth (Synth90K) by @felixdittrich92 in #827
- [docu]: add documentation for datasets by @felixdittrich92 in #905
- add a Slack Community badge by @fharper in #936
- Feat/add language detection by @aminemindee in #1023
- add ViT as classification model TF and PT by @felixdittrich92 in #1050
- [models] add ViTSTR TF and PT and update ViT to work as backbone by @felixdittrich92 in #1055
Bug Fixes
- [PyTorch][references] fix pretrained with different vocabs by @felixdittrich92 in #874
- [classification] Fix cfgs by @felixdittrich92 in #883
- docs: Fixed typo in installation instructions by @frgfm in #901
- [Fix] imgur5k test by @felixdittrich92 in #903
- fix: Fixed load_pretrained_params in PyTorch when ignoring keys by @frgfm in #902
- [Fix]: Documentation add missing in vocabs and correct tab in sharing models by @felixdittrich92 in #904
- Fix links in readme by @jsn5 in #937
- [Fix] PyTorch MASTER implementation by @felixdittrich92 in #941
- [Fix] MJSynth dataset: filter corrupted or missing images by @felixdittrich92 in #956
- [Fix] SVT dataset: clip box values and add shape and label check by @felixdittrich92 in #955
- [Fix] Tensorflow MASTER implementation by @felixdittrich92 in #949
- [FIX] MASTER AMP and onnxruntime issue with master PT by @felixdittrich92 in #986
- pytest-api test: fix ping server step by @odulcy-mindee in #997
- docs/index: fix two minor typos by @mara004 in #1002
- Fix orientation details export by @aminemindee in #1022
- Changed return type of multithread_exec to iterator by @mtvch in #1019
- [datasets] Fix recognition parts of SynthText and IMGUR5K by @felixdittrich92 in #1038
- [Fix] rotation classifier input move to model device by @felixdittrich92 in #1039
- [models] Vit: fix intermediate size scale and unify TF to PT by @felixdittrich92 in #1063
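#1019 changes `multithread_exec` to return an iterator. The sketch below is a hypothetical re-implementation with `concurrent.futures`; doctr's own version may use a different thread pool and defaults. Results are produced lazily and in input order. With fewer than two threads, the sketch falls back to the built-in `map`, which is also lazy.

```python
import collections.abc
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def _threaded_map(func: Callable[[T], R], seq: Iterable[T], threads: int) -> Iterator[R]:
    # The pool stays open while the caller consumes results, then shuts down
    with ThreadPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(func, seq)


def multithread_exec(
    func: Callable[[T], R], seq: Iterable[T], threads: Optional[int] = None
) -> Iterator[R]:
    """Hypothetical sketch: apply func to every item of seq and return an iterator."""
    if threads is None:
        threads = min(16, os.cpu_count() or 1)
    if threads < 2:
        # The built-in map is already a lazy iterator
        return map(func, seq)
    return _threaded_map(func, seq, threads)


squares = multithread_exec(lambda x: x * x, range(5), threads=4)
print(isinstance(squares, collections.abc.Iterator), list(squares))  # True [0, 1, 4, 9, 16]
```

Callers that need random access or need to reuse the results should wrap the result in `list(...)`, because an iterator can only be consumed once.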
Improvements
- chore: Applied post release modifications v0.5.1 by @felixdittrich92 in #870
- [refactor][fix]: Part1 from use datasets for recognition task by @felixdittrich92 in #889
- ci: Add swagger ping in API CI job by @frgfm in #906
- [docs] Add naming conventions for upload models to hf hub by @felixdittrich92 in #921
- docs: Improved error message of encode_string by @frgfm in #929
- [Refactor] PyTorch SAR_Resnet31 make it ONNX exportable (again) by @felixdittrich92 in #930
- Add support page in README by @jonathanMindee in #946
- [references] Add eval recognition and update eval detection scripts by @felixdittrich92 in #933
- update pypdfium2 dep and improve code quality by @felixdittrich92 in #953
- docs: Moved need help section after code snippet by @frgfm in #959
- chore: Updated TF requirements to fix grouped convolutions on CPU by @frgfm in #963
- style: Fixed mypy and moved tool configs to pyproject.toml by @frgfm in #966
- Updating the readme by @Atomme1 in #938
- Update docs in using_doctr by @odulcy-mindee in #993
- feat: add a basic example of text detection by @ianardee in #999
- Add pytorch demo by @odulcy-mindee in #1008
- [build] move requirements to pyproject.toml by @felixdittrich92 in #1031
- Migrate static data from github to monitoring middleware. by @marvinmindee in #1033
- Changes needed to be able to use doctr on AWS Lambda by @mtvch in #1017
- [Fix] unify recognition dataset parts return signature by @felixdittrich92 in #1041
- Updated README.md for custom fonts by @carl-krikorian in #1051
- [refactor] detection script by @felixdittrich92 in #1060
- [models] ViT add checkpoints and some rework to use pretrained ViT backbone in ViTSTR by @felixdittrich92 in #1072
- upgrade pypdfium2 by @felixdittrich92 in #1075
- ViTSTR disable pretrained backbone by default by @felixdittrich92 in #1080
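#929 improves the error message of `encode_string`, which turns a text label into vocabulary indices. The minimal sketch below illustrates the same idea, assuming that each character's position in the vocabulary string is its class index. `encode_label` and its message wording are hypothetical, not doctr's code.

```python
from typing import List


def encode_label(input_string: str, vocab: str) -> List[int]:
    """Map each character to its index in vocab (hypothetical sketch)."""
    missing = sorted(set(input_string) - set(vocab))
    if missing:
        # Name every offending character instead of failing on the first one
        raise ValueError(
            f"characters {missing} from {input_string!r} are not in the vocabulary"
        )
    return [vocab.index(char) for char in input_string]


print(encode_label("cab", "abc"))  # [2, 0, 1]
```

A message like this makes it obvious when a dataset needs a larger vocabulary, for example one of the Vietnamese or Czech entries added in this release.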
Miscellaneous
- [Refactor] commit tags by @felixdittrich92 in #871
- Update io/pdf.py to new pypdfium2 API by @mara004 in #944
- docs: Documented the reason for keras version specifier by @frgfm in #958
- [datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in #983
- [datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in #987
- fix: update tensorflow-addons to match tensorflow version by @ianardee in #998
- move transformers implementation to modules by @felixdittrich92 in #1013
- [FIX] revert dev deps mistake by @felixdittrich92 in #1047
- [models] update vit and transformer layer norm by @felixdittrich92 in #1059
- make pretrained backbone flexible in predictor by @felixdittrich92 in #1061
- handle LocalizationConfusion memory consumption and upgrade min weasyprint version by @felixdittrich92 in #1062
- Fixed small typo in references recognition by @carl-krikorian in #1070
- [docs] install extras for MacBooks with M1 chip by @felixdittrich92 in #1076
- update version for minor release by @felixdittrich92 in #1073
New Contributors
- @calibretaliation made their first contribution in #878
- @Xargonus made their first contribution in #885
- @khalidMindee made their first contribution in #630
- @frgfm made their first contribution in #901
- @jsn5 made their first contribution in #937
- @fharper made their first contribution in #936
- @jonathanMindee made their first contribution in #946
- @Atomme1 made their first contribution in #938
- @odulcy-mindee made their first contribution in #993
- @ianardee made their first contribution in #998
- @aminemindee made their first contribution in #1022
- @mtvch made their first contribution in #1019
- @marvinmindee made their first contribution in #1033
- @carl-krikorian made their first contribution in #1051
Full Changelog: v0.5.1...v0.6.0