Support for Handwritten text #1049
Comments
Hi @harindercnvrg 👋 That's indeed a long-term goal for us! I would suggest dissecting the problem as follows:
@frgfm thank you for detailing the steps. Also, I was wondering: if a single model cannot handle both digital and handwritten text, would it be possible to add an extra classification step and use separate models for handwritten and digital text based on its output?
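A minimal sketch of the routing idea asked about above, assuming a per-crop classifier is available. Every name here (`route_crops`, `is_handwritten`, and the two recognizer callables) is hypothetical and not part of docTR; the recognizers are assumed to take a batch of crops and return one string per crop.

```python
from typing import Callable, List, Sequence, TypeVar

Crop = TypeVar("Crop")  # whatever type the recognizers accept (hypothetical)


def route_crops(
    crops: Sequence[Crop],
    is_handwritten: Callable[[Crop], bool],
    recognize_handwritten: Callable[[List[Crop]], List[str]],
    recognize_printed: Callable[[List[Crop]], List[str]],
) -> List[str]:
    """Send each crop to one of two recognizers; return texts in input order."""
    # Split the indices by the classifier's decision.
    hw_idx = [i for i, c in enumerate(crops) if is_handwritten(c)]
    hw_set = set(hw_idx)
    pr_idx = [i for i in range(len(crops)) if i not in hw_set]

    results: List[str] = [""] * len(crops)
    # Run each recognizer once on its own batch, then scatter results back.
    for idxs, model in ((hw_idx, recognize_handwritten), (pr_idx, recognize_printed)):
        if not idxs:
            continue
        texts = model([crops[i] for i in idxs])
        if len(texts) != len(idxs):
            raise ValueError("recognizer must return one string per crop")
        for i, text in zip(idxs, texts):
            results[i] = text
    return results
```

The classifier's mistakes propagate directly to the recognition output, so the routing is only as good as the handwritten/printed classification itself.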
@frgfm also, for reference: CRAFT + TrOCR
Yes of course, but I suggested handwritten only first because if that first step doesn't work, it's extremely unlikely that handling both will work 😅
Thanks a lot for the heads up 🙏 I added those to the wishlist of new model implementations for docTR (#1007)
@frgfm I have to disagree with both model additions: CRAFT is really not a performance beast (with a VGG backbone), and let's not talk about TrOCR ... this is a beast from Microsoft to show how big a model can be to perform OCR 😆 Now, back to the facts: TrOCR uses RoBERTa as its decoder. We really don't want to integrate a big LM, and I think we would also not be able to train it from scratch (that would only be possible if we took hf transformers as a dependency). PARSeq would be a good fit for handwritten text as well (there, this is solved in the decoding strategy without using any big LM).
Yeah you're right! We'll have to filter the models once we have gathered all requests (& compare them)
What is the status on this? If I understand correctly, some handwriting datasets are already added (#587), so is this issue still relevant?
Hi @tobiascornille 👋, yes, it is still not solved: we have some architectures which should be able to perform well on handwritten text (sar, master, vitstr), but we lack training data.
@felixdittrich92 Good to know. I will be trying to collect some handwriting data in the coming months, so I might be able to contribute to this then. One more question: are the current models already trained on Imgur5k? This might actually be problematic for some use cases, since the dataset is licensed under CC BY-NC 4.0 (see https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset/blob/main/LICENSE).
@felixdittrich92 Have you considered adding the IIIT-HWS dataset? It's a synthetic dataset, but considering it has 9M words and ~750 fonts, it seems promising. The first author is also the same guy behind IMGUR5k and TextStyleBrush, @kris314
@tobiascornille the current pretrained models are trained on a custom dataset (internal Mindee data) :) About the dataset request (it looks like MJSynth, but only with fonts that look handwritten!?):
@felixdittrich92 I've opened an issue (kris314/hwnet#7). Let's see if the author responds. And yes, this would be very relevant for a project I'm working on. It's a side project, so I cannot commit to any timeline, but I'd like to collect some data and fine-tune a handwriting model this summer.
@tobiascornille any progress on the fine-tuned handwriting model? ✌️
Hey @felixdittrich92, I'm afraid not. For the side project I ended up using a cloud provider because it was faster :/
Hi, I was going to start training on handwriting now and wanted to ask whether anyone has made any progress on this. Thank you.
@odulcy-mindee Do you have internal datasets we could use to train one detection and one recognition model for each backend? :) (pinned to 2.0.0 so no stress 😅)
No support for handwritten text? Can someone tell me which project is currently relevant in this direction?
Hello. I am working on a solution for parsing documents with both handwritten and printed text. I have a working TrOCR model that properly recognizes handwriting in my use case. Are there any plans to support TrOCR as a recognition model in docTR? docTR already detects handwritten text well enough for me, so just adding support for TrOCR would work for me. I may share my TrOCR weights if this functionality gets implemented.
Hi @majudev 👋, currently it's not planned to integrate TrOCR, because it's really large compared to other architectures which are also able to recognize handwritten text (if fine-tuned), and as a result its inference latency is really high.
If you need only the detection part, you could use the standalone
Or, if you also want to benefit from other docTR features, you can build your own pipeline by wrapping the
Hope this helps :)
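The "build your own pipeline" suggestion can be sketched roughly as follows. This is not docTR's API: `run_pipeline`, `detect` and `recognize` are hypothetical names, the image is a toy nested list, and boxes are assumed to be `(xmin, ymin, xmax, ymax)` in coordinates relative to the image size. In practice `detect` would wrap a text detector and `recognize` an external recognizer such as a fine-tuned TrOCR model.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                  # toy grayscale image: rows of pixel values
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), relative (assumption)


def crop(image: Image, box: Box) -> Image:
    """Cut a relative-coordinate box out of the image."""
    height, width = len(image), len(image[0])
    xmin, ymin, xmax, ymax = box
    x0, x1 = int(xmin * width), int(xmax * width)
    y0, y1 = int(ymin * height), int(ymax * height)
    return [row[x0:x1] for row in image[y0:y1]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[Image]], List[str]],
) -> List[Tuple[Box, str]]:
    """Detect boxes, crop them, and pass the crops to a pluggable recognizer."""
    boxes = detect(image)
    crops = [crop(image, box) for box in boxes]
    texts = recognize(crops) if crops else []
    if len(texts) != len(boxes):
        raise ValueError("recognizer must return one string per detected box")
    return list(zip(boxes, texts))
```

Keeping detection and recognition behind plain callables means the recognizer can be swapped without touching the detection step.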
Handwritten support is still planned, but providing multilingual models has a slightly higher priority atm
🚀 The feature
Addition of new / Fine Tuning of existing models to support OCR for Handwritten Text. As a first step we can start with detection/prediction models that work specifically for Handwritten Text and down the line we can launch a model that works well for both Handwritten and Typed text.
Motivation, pitch
Thousands of forms, documents and notes are scanned and stored in archives but are not accessible by search. This feature would make it possible to digitise such documents that contain handwritten text and to search them.
Alternatives
No response
Additional context
No response