Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text2image segmentation fault on macOS ( regression #195?) #736

Closed
atuyosi opened this issue Feb 25, 2017 · 41 comments
Closed

text2image segmentation fault on macOS ( regression #195?) #736

atuyosi opened this issue Feb 25, 2017 · 41 comments

Comments

@atuyosi
Copy link
Contributor

atuyosi commented Feb 25, 2017

There seems to be a regression bug with text2image of 3.05 release.
The text2image utility caused segfault, so I couldn' use tesstrain.sh on macOS (10.12.3).

$ text2image --text eng.training_text --outputbase=eng.TimesNewRomanBold.exp0 --font='Times New Roman, Bold' --fonts_dir=/Library/Fonts --tlog_level=3
query weight = 700 	 selected weight =700
query_desc: 'Times New Roman, Bold' Selected: 'Times New Roman, Bold'
Render string of size 6801
Starting page 0
max_width = 3400, max_height = 4600
len = 6801  buf_len = 6801
Found offset = 6421
Segmentation fault: 11

ref. #195

for detail:

$ lldb /usr/local/bin/text2image
(lldb) target create "/usr/local/bin/text2image"
Current executable set to '/usr/local/bin/text2image' (x86_64).
(lldb) run --text eng.training_text --outputbase=eng.TimesNewRomanBold.exp0 --font='Times New Roman, Bold' --fonts_dir=/Library/Fonts
Process 40801 launched: '/usr/local/bin/text2image' (x86_64)
Process 40801 stopped
* thread #1: tid = 0xffe117, 0x0000000100bdd5a7 libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph + 25, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100bdd5a7 libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph + 25
libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph:
->  0x100bdd5a7 <+25>: movq   (%rcx), %rdi
    0x100bdd5aa <+28>: testq  %rdi, %rdi
    0x100bdd5ad <+31>: je     0x100bdd5b8               ; <+42>
    0x100bdd5af <+33>: movq   %rax, %rsi
(lldb)
$ tesseract -v
tesseract 3.05.00
 leptonica-1.74.1
  libjpeg 8d : libpng 1.6.28 : libtiff 4.0.7 : zlib 1.2.8
@amitdo
Copy link
Collaborator

amitdo commented Feb 26, 2017

Here is the source for the regression:
709935851061#diff-b37dca9f063c3727f62c496e514177a9L440

@amitdo
Copy link
Collaborator

amitdo commented Feb 26, 2017

It also affects 4.00.

@amitdo
Copy link
Collaborator

amitdo commented Feb 26, 2017

@atuyosi
Copy link
Contributor Author

atuyosi commented Feb 27, 2017

Hi @amitdo

Thanks for your quick response.

I tried your patch, but same result.

$ wget https://github.com/amitdo/tesseract/commit/da7a7fa2f23f.patch
$ patch -p1 < da7a7fa2f23f.patch
patching file training/pango_font_info.cpp

try patched binary:

$ text2image --text eng.training.txt --outputbase=eng.TimesNewRomanBold.exp0 --font='Times New Roman, Bold' --fonts_dir=/Library/Fonts --tlog_level=3
query weight = 700 	 selected weight =700
query_desc: 'Times New Roman, Bold' Selected: 'Times New Roman, Bold'
Render string of size 84
Starting page 0
max_width = 3400, max_height = 4600
len = 84  buf_len = 84
Segmentation fault: 11
$ lldb /usr/local/bin/text2image
(lldb) target create "/usr/local/bin/text2image"
Current executable set to '/usr/local/bin/text2image' (x86_64).
(lldb) run --text eng.training.txt --outputbase=eng.TimesNewRomanBold.exp0 --font='Times New Roman, Bold' --fonts_dir=/Library/Fonts --tlog_level=3
Process 88534 launched: '/usr/local/bin/text2image' (x86_64)
query weight = 700 	 selected weight =700
query_desc: 'Times New Roman, Bold' Selected: 'Times New Roman, Bold'
Render string of size 84
Starting page 0
max_width = 3400, max_height = 4600
len = 84  buf_len = 84
Process 88534 stopped
* thread #1: tid = 0x122eeb9, 0x0000000100cfb5a7 libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph + 25, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100cfb5a7 libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph + 25
libpangoft2-1.0.0.dylib`pango_fc_font_get_glyph:
->  0x100cfb5a7 <+25>: movq   (%rcx), %rdi
    0x100cfb5aa <+28>: testq  %rdi, %rdi
    0x100cfb5ad <+31>: je     0x100cfb5b8               ; <+42>
    0x100cfb5af <+33>: movq   %rax, %rsi
(lldb)

@amitdo
Copy link
Collaborator

amitdo commented Feb 27, 2017

Try to manually set the environment variable PANGOCAIRO_BACKEND to fc and then call text2image.

@atuyosi
Copy link
Contributor Author

atuyosi commented Feb 27, 2017

No error occurred. It looks work. It seems to be working.

$ PANGOCAIRO_BACKEND=fc text2image --text eng.training.txt --outputbase=eng.TimesNewRomanBold.exp0 --font='Times New Roman, Bold' --fonts_dir=/Library/Fonts --tlog_level=3  2&> detail_log_736_1.txt

detail_log_736_1.txt

But macOS's Preview.app can't open this tif file (although Gimp can open this file ).
I'll try to run tesstrain.sh later.

@atuyosi
Copy link
Contributor Author

atuyosi commented Feb 27, 2017

I tried to run tesstrain.sh with above patch and workaround . It works fine. Successfully generated jpn.traineddata.

$ TESSDATA_PREFIX=/usr/local/share/  PANGOCAIRO_BACKEND=fc ./training/tesstrain.sh --lang jpn --langdata_dir ../github_tesseract/langdata/  --training_text ../github_tesseract/langdata/jpn/jpn.training_text --fonts_dir /Library/Fonts/

Note: I replaced mktemp with gmktemp (coreutils ) on macOS

@amitdo
Copy link
Collaborator

amitdo commented Feb 27, 2017

It should also work without my patch.

The patch was supposed to programmatically set the said environment variable. Since you reported it did not work, I asked you to do it manually. The call to the setenv function in my patch needs to be moved to another place in the code.

@heyidi
Copy link

heyidi commented Mar 9, 2017

thank you amitdo. it fixed my problem too~~

@FernandoGOT
Copy link

Got the same problem on MAC OSX 10.13.3 using tessercat 4.0.0-beta.1, fixed it using the PANGOCAIRO_BACKEND=fc before executing the tesstrain.sh file

@amitdo
Copy link
Collaborator

amitdo commented Apr 15, 2018

@FernandoGOT,

Can you please check my new patch?
amitdo@ee2d6739cf0782

Don't set the environment variable PANGOCAIRO_BACKEND when you test this patch.

@stweil
Copy link
Member

stweil commented Apr 15, 2018

Instead of setting PANGOCAIRO_BACKEND=fc, it should also be possible to replace pango_cairo_font_map_get_default() by pango_cairo_font_map_new_for_font_type(CAIRO_FONT_TYPE_FT).

@amitdo
Copy link
Collaborator

amitdo commented Apr 15, 2018

@stweil, your suggested method looks simpler. Please make a PR.

@zdenop
Copy link
Contributor

zdenop commented Sep 27, 2018

Is this issue still valid? If yes can we get this modification to 4.0 finale release?

@amitdo
Copy link
Collaborator

amitdo commented Sep 27, 2018

I don't have a Mac, but we didn't done anything to fix this issue.

@stweil, #736 (comment)

@atuyosi
Copy link
Contributor Author

atuyosi commented Sep 27, 2018

@zdenop It has not fixed (4.0.0-beta.4-164-g5dfce).

It is still necessary to set environment variables PANGOCAIRO_BACKEND=fc .

@zdenop
Copy link
Contributor

zdenop commented Sep 28, 2018

Can you please check if 345e5ee solve the problem?

@atuyosi
Copy link
Contributor Author

atuyosi commented Sep 28, 2018

@zdenop ,

I tested 345e5ee. No error occurred.

It's OK.

$ text2image --fonts_dir /System/Library/Fonts/  --list_available_fonts
<skip>
$ echo $?
0
$ text2image --text jpn_test.txt  --outputbase=jpn.HiraginoMinchoProLicht.exp0 --font='Hiragino Mincho Pro Light' --fonts_dir=/System/Library/Fonts --tlog_level=3 --ptsize 32
<skip>
$ echo $?
0

Software Version:

$ text2image --version
4.0.0-beta.4-167-g345e
$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.14
BuildVersion:	18A391

@zdenop
Copy link
Contributor

zdenop commented Oct 20, 2018

I have to revert this patch because of side effects. Other solution should be found or implemented...

@zdenop zdenop added this to the 4.0.0 milestone Oct 20, 2018
@zdenop zdenop added the bug label Oct 20, 2018
@zdenop
Copy link
Contributor

zdenop commented Oct 20, 2018

@atuyosi: can you please try get the latest code and post output of text2image -v

@zdenop
Copy link
Contributor

zdenop commented Oct 22, 2018

I have ready something like this:

diff --git a/src/training/text2image.cpp b/src/training/text2image.cpp
index 1142bb68..3e2b8a23 100644
--- a/src/training/text2image.cpp
+++ b/src/training/text2image.cpp
@@ -50,6 +50,9 @@
 #include "tlog.h"
 #include "unicharset.h"
 #include "util.h"
+#ifdef __APPLE__
+#  include "TargetConditionals.h"
+#endif

 // A number with which to initialize the random number generator.
 const int kRandomSeed = 0x18273645;
@@ -689,5 +692,19 @@ int main(int argc, char** argv) {
     }
   }
   tesseract::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+#ifdef TARGET_OS_MAC
+  // Respect enviroment variable. could be:
+  // fc (fontconfig), win32, and coretext
+  // If not set force fontconfig for Mac OS.
+  // See https://github.com/tesseract-ocr/tesseract/issues/736
+  char* backend;
+  backend = getenv("PANGOCAIRO_BACKEND");
+  if (backend == NULL) {
+    putenv("PANGOCAIRO_BACKEND=fc");
+  } else {
+    printf("Using '%s' as pango cairo backend based on enviroment "
+           "variable.\n", backend);
+  }
+#endif
   return Main();
 }

Maybe we can implement it as option --pangocairo_backend for text2image, so user will be aware about this possibility to change backend.

@amitdo
Copy link
Collaborator

amitdo commented Oct 22, 2018

    } else {
      printf("Using '%s' as pango cairo backend based on enviroment "
             "variable.\n", backend);
   }

text2image's code needs the 'fc' backend on linux and mac.
Without the 'fc' backend there will be a crash.

@zdenop
Copy link
Contributor

zdenop commented Oct 22, 2018

@amitdo : code you quoted means that program will accept what user will set (intentionally). This will give user option to choose backend (in case of new backend or fix for actual)

I have possibility to test it (just sort text - not whole training) on linux and windows:

  • On linux (opensuse 15) pango choose CAIRO_FONT_TYPE_FT automatically and I did not recognized any crash. I see no reason to modify code for linux.
  • On windows (10) pango choose CAIRO_FONT_TYPE_WIN32 automatically and I did not recognized any crash. Setting it to CAIRO_FONT_TYPE_FT (with PANGOCAIRO_BACKEND=fc). Anyway I prefer to use native solution if it works.

@zdenop
Copy link
Contributor

zdenop commented Oct 22, 2018

@tfmorris : Can you please test this ('text2image -v' and patch above?)?

@amitdo
Copy link
Collaborator

amitdo commented Oct 22, 2018

Linux always uses 'fc'.

Originally Windows had the same issue:
#380

This patch fixed the crash on windows & mac:
8d2d94e4ed32c

but Ray partly reverted it, so it is only compiled for win32 and not on linux or mac.
709935851061#diff-b37dca9f063c3727f62c496e514177a9L440

Although the workaround is still there for win32, it has this remark:

// Fixme! Leaks memory and breaks unittests.

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

@amitdo: thanks for analysis. Thank I will not limit this change to Mac.

@zdenop zdenop closed this as completed in e60318f Oct 23, 2018
@amitdo
Copy link
Collaborator

amitdo commented Oct 23, 2018

The new commit will still crash on mac...

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

Why?

@amitdo
Copy link
Collaborator

amitdo commented Oct 23, 2018

Because the default bacend on mac is 'coretext', but text2image is not designed to work with it

You should force 'fc' as backend, at least for mac.

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

... and the patch DOES NOT care about default backend... It cares about environment variable definition. So it should work.

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

@amitdo: do you have Mac or did you test it on Mac? Because you are claiming that Mac has set PANGOCAIRO_BACKEND variable by default to some value...

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

@atuyosi: can you please test current code?

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

...and patch, that reflect above logic, will do:

  1. if env var is set = do nothing (because this PANGOCAIRO_BACKEND must be set by user and he should know what he is doing)
  2. if env var is not set = set it fc (everything should works).

So crash should happened only in case 1. when user set PANGOCAIRO_BACKEND to wrong backend.

@zdenop
Copy link
Contributor

zdenop commented Oct 23, 2018

@amitdo : can you please read my comments? You miss the point. Or try the code ;-)

@amitdo
Copy link
Collaborator

amitdo commented Oct 23, 2018

I will reread the code, maybe I missed something.

@amitdo
Copy link
Collaborator

amitdo commented Oct 23, 2018

It looks okay :-)

Please pardon me.

You changed the code a few times in the last week, and I was reading the diffs, so I didn't see the whole picture. That was the reason for my confusion.

@stweil
Copy link
Member

stweil commented Dec 22, 2020

I have to revert this patch because of side effects. Other solution should be found or implemented...

@zdenop, do you remember which side effects your initial patch had? It fixes pango_font_info_test for MacOS (see discussion for PR #3189).

@amitdo
Copy link
Collaborator

amitdo commented Dec 22, 2020

See #1999 and #2009

@zdenop
Copy link
Contributor

zdenop commented Dec 22, 2020

@stweil : I do not remember, but pango was always strange for me. ;-) After pango changed build system, I lost interest to to play with as I am mainly on windows at this time...

@amitdo
Copy link
Collaborator

amitdo commented Dec 22, 2020

After pango changed build system, I lost interest to play with as I am mainly on windows at this time...

Meson is great and it does work on windows.

@stweil
Copy link
Member

stweil commented Dec 22, 2020

Thank you for your answers. The main problem was obviously unreleased memory. I now have a different solution for #3189.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants