Adding ViTSTR #513

Closed
felixdittrich92 opened this issue Sep 29, 2021 · 6 comments · Fixed by #1055
Labels: module: models, topic: text recognition, type: enhancement

Comments

@felixdittrich92
Contributor

Adding a Vision Transformer for scene text recognition: I am currently working on this (with a Hugging Face ViT backbone). Once I am done and have solid results, it would be a pleasure to add this model, if you are interested! :)
The same goes for the new unilm/TrOCR model.

@charlesmindee added the type: enhancement, module: models, and topic: text recognition labels on Sep 30, 2021
@charlesmindee self-assigned this on Sep 30, 2021
@charlesmindee
Collaborator

charlesmindee commented Sep 30, 2021

Hi @felixdittrich92,

Thanks for your message, it would be a pleasure to have you contribute to the lib!

We already have a recognition model that includes a transformer decoder (MASTER), but we do not yet have full transformer architectures such as ViT or TrOCR. It is on the mid-term roadmap, and if you would like to propose your implementation, you are more than welcome to open a PR! 🙏

Please read the CONTRIBUTING section and feel free to look at the models already implemented in doctr 😃

Thank you and have a nice day 👍

@felixdittrich92
Contributor Author

I will do, thanks :) 👍

@charlesmindee
Collaborator

Hi @felixdittrich92, do you still plan to implement this? If not, we may close this issue to avoid a huge stack of unaddressed ones!

@felixdittrich92
Contributor Author

Huhu @charlesmindee 👋,
Yes, of course (maybe a slightly lighter version with MobileViT), but I think for the moment other things, like a fix for MASTER and SAR, are more important, so I would say let's hold this for 1.0.0. What do you think? 👍

@charlesmindee
Collaborator

ok

@chpatrick

@felixdittrich92 Hi, are there any model weights available for ViTSTR that are compatible with doctr? :)

I saw these, but they seem to be named differently: https://github.com/roatienza/deep-text-recognition-benchmark/releases

@felixdittrich92 removed this from the 1.0.0 milestone on Oct 10, 2024
4 participants