[datasets/benchmark] Improve research use #855

Closed
3 tasks done
felixdittrich92 opened this issue Mar 15, 2022 · 1 comment
Labels
module: datasets Related to doctr.datasets type: enhancement Improvement

Comments

@felixdittrich92
Contributor

felixdittrich92 commented Mar 15, 2022

🚀 The feature

This request can be split into three parts:

  • ensure that every already-integrated dataset that provides both boxes and text labels can also be used for the recognition task (by cropping the boxes and pairing each crop with its label)
  • add a datasets section to the documentation (like the one for models), also split into detection / recognition
  • integrate a benchmarking script into references/, or directly into the detection and recognition training scripts, that follows current papers / commonly used benchmark splits
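
To make the first point more concrete, here is a rough sketch of what "crop boxes with corresponding labels" could look like. It is not docTR code: the function name `crop_word_boxes` is hypothetical, and it assumes straight boxes given as absolute pixel coordinates `(xmin, ymin, xmax, ymax)` on an `(H, W, C)` NumPy image. Datasets with relative coordinates or rotated boxes would need a conversion step first.

```python
import numpy as np


def crop_word_boxes(image, boxes, labels):
    """Return (crop, text) pairs for a recognition task.

    Hypothetical helper, not part of any library. Assumes:
    - image is an (H, W, C) array
    - boxes are (xmin, ymin, xmax, ymax) in absolute pixels
    - labels[i] is the transcription of boxes[i]
    """
    height, width = image.shape[:2]
    crops = []
    for (xmin, ymin, xmax, ymax), text in zip(boxes, labels):
        # Round to integer pixels and clip to the image bounds
        x0, y0 = max(0, int(round(xmin))), max(0, int(round(ymin)))
        x1, y1 = min(width, int(round(xmax))), min(height, int(round(ymax)))
        # Skip empty boxes and boxes without a transcription
        if x1 > x0 and y1 > y0 and text:
            crops.append((image[y0:y1, x0:x1], text))
    return crops


if __name__ == "__main__":
    page = np.zeros((10, 20, 3), dtype=np.uint8)
    pairs = crop_word_boxes(page, [(2, 3, 8, 7), (15, 5, 25, 9)], ["foo", "bar"])
    for crop, text in pairs:
        print(text, crop.shape)
```

Skipping degenerate boxes and empty labels is a design choice of this sketch; a real integration might prefer to raise or log instead.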

detection:
TODO
train: COCO-Text/... ?
val: IC03/IC13/... ?

recognition:
train: MJSynth/SynthText
val: SVHN/SVT/IIIT5K/IC03/IC13 (+ FUNSD/CORD)
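
For the recognition side, a benchmarking script could boil down to something like the sketch below. Everything in it is an assumption for illustration: `predict` stands for any callable that maps a crop to a string, `val_sets` maps a benchmark name to `(crop, target)` pairs, and exact-match accuracy is used as the metric because it is a common choice in recognition papers, not because the issue prescribes it.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

Sample = Tuple[Any, str]  # (crop, target transcription)


def exact_match(
    predict: Callable[[Any], str],
    samples: Iterable[Sample],
    ignore_case: bool = False,
) -> float:
    """Fraction of samples whose prediction equals the target string."""
    correct = total = 0
    for crop, target in samples:
        pred = predict(crop)
        if ignore_case:
            pred, target = pred.lower(), target.lower()
        correct += int(pred == target)
        total += 1
    # An empty split yields 0.0 instead of a division error
    return correct / total if total else 0.0


def benchmark(
    predict: Callable[[Any], str],
    val_sets: Dict[str, Iterable[Sample]],
) -> Dict[str, float]:
    """Evaluate one model on several validation sets (hypothetical helper)."""
    return {name: exact_match(predict, samples) for name, samples in val_sets.items()}


if __name__ == "__main__":
    # Toy stand-ins: the "crop" is already a string and the model echoes it
    toy_sets = {"SVT": [("abc", "abc"), ("x", "y")], "IIIT5K": [("hello", "hello")]}
    print(benchmark(lambda crop: crop, toy_sets))
```

Reporting one score per validation set, rather than a single pooled number, keeps the results comparable with tables in papers that list each benchmark separately.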

Motivation, pitch

It would be great to be able to compare against other implementations or other OCR applications for research purposes.
This would make the library and its implemented models more transparent and easier to compare with others.
As a final point, I have to add that it's just great to see when an implementation reaches better benchmark results than others 😅

Additional context

Any feedback or suggestion is very welcome 💯

@felixdittrich92
Contributor Author

closed with #933
