peculiarities when running text2image on windows #380

Open
vidiecan opened this issue Aug 5, 2016 · 9 comments

vidiecan commented Aug 5, 2016

(This is more of a comment than an issue, but more issues may follow and the discussion might be useful. It can be closed after the PR for 1.)

  1. At the moment, text2image expects the fc backend, e.g.:
    PangoGlyph glyph_index = pango_fc_font_get_glyph(

    but if Pango is compiled with Win32 support, the Win32 font map is returned first:
#if defined(HAVE_CAIRO_WIN32)
  if (!backend || 0 == strcmp (backend, "win32"))
    return g_object_new (PANGO_TYPE_CAIRO_WIN32_FONT_MAP, NULL);
#endif
#if defined(HAVE_CAIRO_FREETYPE)
  if (!backend || 0 == strcmp (backend, "fc")
           || 0 == strcmp (backend, "fontconfig"))
    return g_object_new (PANGO_TYPE_CAIRO_FC_FONT_MAP, NULL);
#endif 

Nasty crashes follow because the resulting font is not an fc font, so the reinterpret cast to the fc type is wrong.

Fast Solution: specify fc backend
Solution: a simple patch will follow that fixes the behaviour for, at least, the most important functionality.
2. If fontconfig is linked as a DLL, a putenv call made by text2image does not get propagated to fontconfig:

std::string env("FONTCONFIG_PATH=");

Solution: set FONTCONFIG_PATH as an environment variable before starting text2image.
3. You cannot use a drive root (e.g., c:\) in FONTCONFIG_PATH because fontconfig strips trailing slashes from the path (FcStrCanonAbsoluteFilename) and then calls

GetFullPathNameW (dirname, 0, NULL, NULL)

without the slash, and that function, interestingly, behaves like this:

"If a file name begins with only a disk designator but not the backslash after the colon, it is interpreted as a relative path to the current directory on the drive with the specified letter."

Solution: point FONTCONFIG_PATH at a regular directory rather than a drive root.
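
To illustrate point 3, here is a minimal sketch of how stripping trailing separators turns a drive root into a bare drive designator. Both helper names are hypothetical and exist only for this example. `StripTrailingSeparators` merely imitates the effect described above and is not fontconfig's actual code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: removes trailing '/' and '\\' characters, imitating
// the effect attributed to FcStrCanonAbsoluteFilename in the report above.
std::string StripTrailingSeparators(std::string path) {
  while (!path.empty() && (path.back() == '/' || path.back() == '\\')) {
    path.pop_back();
  }
  return path;
}

// Hypothetical helper: true when `path` is only a drive designator such as
// "c:", which Windows resolves relative to the current directory on that
// drive instead of the drive root.
bool IsBareDriveDesignator(const std::string& path) {
  return path.size() == 2 &&
         std::isalpha(static_cast<unsigned char>(path[0])) != 0 &&
         path[1] == ':';
}
```

A check like `IsBareDriveDesignator(StripTrailingSeparators(dir))` could be used to warn before such a value is placed in FONTCONFIG_PATH. A path such as c:\fonts is not affected, because stripping the trailing separator still leaves a full directory name.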

vidiecan pushed a commit to vidiecan/tesseract that referenced this issue Aug 5, 2016
zdenop added a commit that referenced this issue Aug 5, 2016
fixes some of the windows issue with text2image, see #380
zdenop closed this as completed Aug 5, 2016
Collaborator

amitdo commented Sep 7, 2016

None of these issues has been solved.

At least the first one probably also affects Tesseract running under MinGW and on Mac.

Fast Solution: specify fc backend

https://github.com/GNOME/pango/blob/master/pango/pangocairo-fontmap.c#L48

Something like this should be put in text2image.cpp:

#ifdef _WIN32
  putenv("PANGOCAIRO_BACKEND=fc");
#else
  setenv("PANGOCAIRO_BACKEND", "fc", 1);
#endif // _WIN32

Should be tested on Mac and MinGW before committing this code.
This issue does not affect Linux.
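
The suggestion above could also be wrapped in a small helper, sketched here under some assumptions. `SetProcessEnv` is a hypothetical name, not an existing Tesseract function. The sketch assumes `_putenv_s` is available in the Windows C runtime being used, and the call would have to run before Pango creates its default font map. As point 2 of the original report notes, a DLL linked against a different C runtime may still not see the change.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: sets `name=value` in the environment of the current
// process and returns true on success.
bool SetProcessEnv(const std::string& name, const std::string& value) {
#ifdef _WIN32
  // Assumption: the C runtime provides _putenv_s (MSVC, recent mingw-w64).
  return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
  // POSIX: the third argument (1) overwrites any existing value.
  return setenv(name.c_str(), value.c_str(), 1) == 0;
#endif
}
```

For example, `SetProcessEnv("PANGOCAIRO_BACKEND", "fc")` would be called early in main, before any Pango call.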

Collaborator

amitdo commented Nov 7, 2016

I think this issue should be reopened.

Contributor

zdenop commented Nov 7, 2016

AFAIK vidiecan is using VS. Is there any report from mingw users?

zdenop reopened this Nov 7, 2016
Collaborator

amitdo commented Nov 7, 2016

He fixed number (1) in his list in one place in the code. That piece of code did cause a crash on Windows+VS, MinGW(64) and Mac.
There is another similar piece of code that will probably cause a crash in some situation on all these platforms.
I suggested a solution above, but it is useless to test it on Linux.

Collaborator

amitdo commented Nov 7, 2016

Here is the problematic line:
https://github.com/tesseract-ocr/tesseract/blob/182ca5bc1e/training/pango_font_info.cpp#L367

You need to use text2image with the flag only_extract_font_properties to trigger the function in which this code lives.

Collaborator

amitdo commented Nov 23, 2016

The dotted_circle changes in #381 caused problems (on Linux at least).
See: https://github.com/tesseract-ocr/tesseract/blob/5bb97f966885/training/pango_font_info.cpp#L438

Contributor

zdenop commented Oct 20, 2019

@vidiecan: Are points 2. and 3. still valid? If yes, do you have a PR for them?

zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this issue Mar 28, 2021
fixes some of the windows issue with text2image, see tesseract-ocr#380
Collaborator

amitdo commented Sep 9, 2021

The relevant code was rewritten in Tesseract 5.0.

@stweil,

Do you know if all the issues that were mentioned by the OP were solved?

Member

stweil commented Sep 9, 2021

No, I don't know that and would have to run tests first. @vidiecan, did you test with the latest installer for Windows?
