1 Text2Image.exe binary please? #396
|
I have tried this before. Even if you get a binary of text2image for ... As Quan has suggested, you can use jTessBoxEditor for generating the box ... Or get access to a Linux machine.
On 28-Aug-2016 11:32 AM, "z0tghvunik" notifications@github.com wrote:
|
@Shreeshrii, there is a new installer on https://github.com/UB-Mannheim/tesseract/wiki. It includes fixes for text2image.exe. If that binary still crashes, I need all information to reproduce the crash. |
|
text2image --fonts_dir= --text ./langdata/san.training_text --outputbase san.exp-1 --ptsize=32 --strip_unrenderable_words --fontconfig_refresh_config_file=false --leading=32 --char_spacing=0.0 --exposure=-1 --find_fonts --min_coverage=.9 --degrade_image=1 --underline_start_prob=.05 --underline_continuation_prob=.01 |
@stweil Thank you for providing the updated binary for text2image - many problems have indeed been fixed since I last looked at it. Thanks to the developers. However, it crashed under two situations today.
I will test further and post more feedback later. |
Here is a copy of the terminal log with all commands I tried and their output. |
The following command is creating the box-tiff pairs with degradation as well as different exposure levels as indicated ..
I have done this testing on Windows 10. |
I tried the latest version of the program uploaded today on Windows 10 and found that it now works but is unstable. It fails for the Arial font and cannot find Times New Roman (the two most commonly used fonts). The boxes in the generated box file were not as tight as they could be.
text2image --text=vie-data.txt --outputbase=vie.arial.exp1 --font="Tahoma" --fonts_dir=C:\Windows\Fonts
text2image --text=vie-data.txt --outputbase=vie.arial.exp1 --font=Arial --fonts_dir=C:\Windows\Fonts
text2image --text=vie-data.txt --outputbase=vie.arial.exp1 --font="Courier New" --fonts_dir=C:\Windows\Fonts
text2image --text=vie-data.txt --outputbase=vie.arial.exp1 --font="Times New Roman" --fonts_dir=C:\Windows\Fonts
text2image --text=vie-data.txt --outputbase=vie.arial.exp1 --font="Arial Unicode MS Regular" --fonts_dir=C:\Windows\Fonts
|
The previous command does not work because CMD on Windows does not handle
It looks like this error message can be improved by a line break after the first sentence. I'll send a PR which fixes this small detail.
That command also crashes with SIGSEGV on Linux. This is a bug which needs a fix. |
Training on Windows is not officially supported (but we accept patches):
@Shreeshrii @stweil: please create a separate issue for the command that crashes on Linux, so we can track it. |
If you want to compile text2image for Windows using VS2015, you can have a look at a fully automated process at It might take you a while to get the grasp of it (hopefully hours, not months), but you will get your own text2image build that you can debug (and send PRs to tesseract). More details:
BUT Finally, do not expect a bulletproof text2image even after patching - more needs to be done to address several corner cases but you have everything needed for this mission. |
@stweil The problem with the "font not found" message was not just a misplaced period. These fonts are installed on Windows, but text2image is NOT finding them.
OK, the above shows that Times New Roman also has a "," at the end of the font name. So I tried with that, and the results differ based on the order in which the parameters are given, e.g. --fonts_dir= should be given first.
|
try
As @Shreeshrii said, try |
@Shreeshrii: "--fonts_dir=" is the wrong argument |
@zdenop OK Please see my previous comment, in that I have used |
On Windows10, I get the
|
@Shreeshrii: These errors are coming from an external library (Pango/FontConfig?), which IMO is not common on Windows. IMO testing and issue reporting should be done there. |
On further investigation, I see that https://github.com/tesseract-ocr/tesseract/blob/master/training/pango_font_info.cpp overrides system and fontconfig defaults ..
The When used the first time, it creates
|
As of now, the two errors still unexplained with text2image under Windows are
@zdenop I can test and report errors to Pango/FontConfig, but tesseract does not provide any error info that I can refer to. |
I'm currently working on the problem with Arial. That font is found (otherwise there would be an error message), but results in SIGSEGV - maybe from an assertion. It looks like Windows buffers console messages and fails to print them before raising the SIGSEGV. |
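The buffering effect described above can be worked around while debugging by flushing after every diagnostic line, so that the message reaches the console before a later crash. This is only a minimal sketch: `log_and_flush` is a hypothetical helper name and is not part of the tesseract sources.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical debugging helper (not from tesseract): write one line
 * to the given stream and flush it immediately, so the text is not
 * left in the stdio buffer if the process crashes afterwards.
 * Returns 0 on success, -1 if writing or flushing failed. */
int log_and_flush(FILE *stream, const char *message) {
    assert(stream != NULL && message != NULL);
    if (fputs(message, stream) == EOF) return -1;
    if (fputc('\n', stream) == EOF) return -1;
    return fflush(stream) == 0 ? 0 : -1;
}
```

Calling `log_and_flush(stderr, "looking up font")` just before the suspected code path would narrow down where the SIGSEGV is raised.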
Thanks, Stefan.
|
The crash with Arial is caused by a bug in function strcasestr (locally implemented only for Windows, Linux uses the correct GLIBC implementation). Any short font name (5 characters or less) will result in a similar crash. I'll send a pull request which fixes this. |
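For readers who need a case-insensitive substring search on a platform without `strcasestr`, the following is a minimal sketch of a defensive implementation. It is not the code from the tesseract sources or from the pull request; `find_ignore_case` is a hypothetical name, and the length guard reflects one common pitfall (a needle longer than the haystack), which is an assumption about the kind of bug involved, not a description of the actual one.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strcasestr: returns a pointer to the first
 * case-insensitive occurrence of needle in haystack, or NULL if none. */
const char *find_ignore_case(const char *haystack, const char *needle) {
    assert(haystack != NULL && needle != NULL);
    size_t hay_len = strlen(haystack);
    size_t needle_len = strlen(needle);

    /* An empty needle matches at the start, as with strstr. */
    if (needle_len == 0) return haystack;
    /* Check lengths first: a longer needle can never match, and
     * hay_len - needle_len would wrap around for unsigned size_t. */
    if (needle_len > hay_len) return NULL;

    for (size_t start = 0; start + needle_len <= hay_len; ++start) {
        size_t i = 0;
        /* Cast to unsigned char before tolower to avoid undefined
         * behaviour on negative char values. */
        while (i < needle_len &&
               tolower((unsigned char)haystack[start + i]) ==
               tolower((unsigned char)needle[i])) {
            ++i;
        }
        if (i == needle_len) return haystack + start;
    }
    return NULL;
}
```

With this sketch, a short name such as "Arial" searched for a longer pattern such as "Arial Unicode MS" simply returns NULL instead of reading past the end of the string.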
PR #406 fixes the problem with Arial (and other fonts with short names) for text2image on Windows. |
Problem 2 (use of --find_fonts) is also caused by the buggy strcasestr function and fixed by PR #406:
|
What do you mean? Is this also happening in Linux? |
The latest version has fixed the issue with the Arial font. Thank you. Clearly, the tool produces inconsistencies in font names. Why is "Times New Roman," a valid name, especially since it's a plain style?
298: Times New Roman,
@amitdo Almost all the generated boxes (created in Windows 10) are consistently a bit low and a bit wide. It was reported that having tightly fitted boxes would improve the quality of the generated traineddata file. |
@Shreeshrii: I need to correct my statement:
|
Thank you for the changes to get text2image working on Windows and for making the latest version available via the installer at https://github.com/UB-Mannheim/tesseract/wiki. I have added a link to it from https://github.com/tesseract-ocr/tesseract/wiki so that it is easily accessible. |
Hi, I have downloaded jTessBoxEditor and extracted the files. I downloaded the Java runtime environment too. When I open the jtessboxeditor.jar file, the window pops up but is not accessible. I used the same application yesterday, but today I am facing this issue. Can anyone help me sort this out? |
@shobamohan123 Please post your issue or question related to jTessBoxEditor in the appropriate box in either https://sourceforge.net/p/vietocr/discussion or https://github.com/nguyenq. Thanks. |
This works for me:
# List available fonts
text2image --fontconfig_tmpdir=. -text my.txt --outputbase test.exp0 --fonts_dir="C:\xxx\myDir" --list_available_fonts
# Start
text2image --fontconfig_tmpdir=. -text my.txt --outputbase test.exp0 --fonts_dir="C:\xxx\myDir" --font myFont --ptsize 36 |
See guys.. I badly need Text2image.exe but i cannot find it anywhere.
Is there any great soul in this world who will take the time to compile that 1 thing and upload it to mediafire or something?
This one thing has consumed 5 months of my life :'-(
I tried to compile it on windows 32 bit but it gave 100+ errors :'-(
Dear c/cpp experts.. Instead of telling everybody how to compile, isn't it a good idea to directly provide a compiled version?
I am not any intelligent software eng. I am just a normal human being.
Why didn't the developers take some time to upload the compiled binaries? :'-(
Somebody please help!
I now feel pain in my heart for wasting 5 months of my life for 1 program.
Somebody please compile 'Text2Image.cpp' for the needy who don't know how to do it.
P.S I have downloaded Tesseract 3.05 but there does not exist any 'text2image.EXE' :'-(