[datasets][PoC] Enable dataset usage for recognition task #867
Codecov Report
@@ Coverage Diff @@
## main #867 +/- ##
==========================================
+ Coverage 94.82% 94.94% +0.11%
==========================================
Files 133 133
Lines 5200 5358 +158
==========================================
+ Hits 4931 5087 +156
- Misses 269 271 +2
@fg-mindee |
@frgfm |
Hey there 🙂 So I had thought about this a few months back. To make sure we are all on the same page, the goal is to:
Correct? If so, two major options arise:
If this is for training, I'd argue the second option is the one to go with, for a few reasons:
So perhaps this could be done in a temporary or cache folder 🤷‍♂️ What do you think? |
@frgfm Off-topic: I'm really a bit hyped to train the first models on SynthText and MJSynth when we are done with it 😅 |
I think it's mostly done. I will split it into 2 PRs for easier review 👍 |
Can you fix the issues found by Codacy & CodeFactor please?
This PR is a proof of concept, meant to support further discussion on enabling the existing datasets to also be used for the recognition task (main goal: benchmarks).
It's easier to show the idea directly in code instead of opening a discussion.
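As a rough illustration of the concept (not the code from this PR), a detection sample made of a page image, word boxes and word labels can be turned into recognition samples by cropping each box. The helper name `crop_words` and the absolute-pixel `(xmin, ymin, xmax, ymax)` box layout are assumptions made for this sketch only:

```python
import numpy as np


def crop_words(img, boxes, labels):
    """Hypothetical helper: cut each axis-aligned box out of a page image
    and pair the crop with its word label.

    Assumes `img` is an (H, W, C) array and `boxes` holds absolute pixel
    coordinates as (xmin, ymin, xmax, ymax).
    """
    height, width = img.shape[:2]
    samples = []
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        # Clamp the box to the image so partially out-of-bounds boxes still crop
        x0, y0 = max(int(xmin), 0), max(int(ymin), 0)
        x1, y1 = min(int(xmax), width), min(int(ymax), height)
        # Skip boxes that end up empty after clamping
        if x1 > x0 and y1 > y0:
            samples.append((img[y0:y1, x0:x1], label))
    return samples


# Dummy page with two word boxes; the second one overflows the page borders
page = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = np.array([[10, 20, 60, 40], [150, 80, 260, 120]])
labels = ["Hello", "world"]
samples = crop_words(page, boxes, labels)
```

Each entry of `samples` is a `(crop, label)` pair, which is the shape of data a recognition benchmark would consume. Rotated or polygon boxes would need a different cropping routine.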
Things to investigate if the concept is fine:
use_polygons=True
) is too slow (multiprocessing? maybe in another PR; reminder: maybe a good reference)
@fg-mindee
No worries, we can split this into parts later (maybe geometry, torch, tf) for review if you want 😅
Issue:
#855 (this is the first task of that issue)
Good documentation would be part two
(ATTENTION: reminder for docs: SROIE & SVT provide only uppercase labels, which do not match the case-sensitive text in the images)
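To make the uppercase-label caveat concrete, here is a minimal sketch of how it could affect a benchmark score. The `exact_match` function and its `ignore_case` flag are hypothetical names chosen for this example, not an existing API of the project:

```python
def exact_match(preds, targets, ignore_case=False):
    """Hypothetical metric: fraction of predictions equal to their target,
    optionally ignoring letter case."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        return 0.0
    if ignore_case:
        preds = [p.lower() for p in preds]
        targets = [t.lower() for t in targets]
    hits = sum(p == t for p, t in zip(preds, targets))
    return hits / len(targets)


# The model reads the casing printed in the image, the labels are uppercase only
preds = ["Total", "CASH"]
targets = ["TOTAL", "CASH"]
strict = exact_match(preds, targets)
relaxed = exact_match(preds, targets, ignore_case=True)
```

With uppercase-only labels, a case-sensitive comparison penalises predictions that correctly follow the casing in the image, so a case-insensitive score is probably the fairer number to report for these two datasets.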
Any feedback is very welcome 👍
@charlesmindee @SiddhantBahuguna @fg-mindee