4.0 bugs on MAC OS X and a step by step for reference #1453

Closed
FernandoGOT opened this issue Apr 8, 2018 · 57 comments

Comments

@FernandoGOT

This is the step-by-step process I used to install Tesseract 4.0 on Mac OS X, plus the fixes/workarounds I needed to make it work.
I'm sharing this "guide" with the intention of helping other people who may have the same problems I had.

Special thanks to Shree, who helped me in the Google Group.

Project and more details: https://github.com/tesseract-ocr/tesseract

Where to get help?

Google Group: https://groups.google.com/forum/#!forum/tesseract-ocr
GitHub issues: https://github.com/tesseract-ocr/tesseract/issues

Platform: MAC OS X 10.13.3
Tesseract: 4.0.0-beta.1-69-g10f4
leptonica-1.75.3
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11

Found AVX2
Found AVX
Found SSE

Compiling Tesseract - tesseract 4.0

Reference: https://github.com/tesseract-ocr/tesseract/wiki/Compiling#macos

Warning: Don't install Tesseract using brew, since you can't generate ScrollView.jar from that installation (at least I wasn't able to).

Steps

1 - Install these libs

brew install automake autoconf autoconf-archive libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
brew install gcc

2 - Point /usr/local/opt/icu4c at icu4c 60.2

ln -hfs /usr/local/Cellar/icu4c/60.2 /usr/local/opt/icu4c

Obs.: text2image expects icu4c 60.2, but the version brew installed is icu4c 61.1.

3 - Clone tesseract repo

git clone https://github.com/tesseract-ocr/tesseract/

4 - Enter the folder

cd tesseract

5 - Run the script

./autogen.sh

6 - Run this command and copy the CPPFLAGS and LDFLAGS it prints

brew info icu4c

7 - Adjust the CPPFLAGS and LDFLAGS below if yours differ, then run configure

./configure \
  CPPFLAGS=-I/usr/local/opt/icu4c/include \
  LDFLAGS=-L/usr/local/opt/icu4c/lib

8 - Build

make -j

9 - Install

sudo make install

10 - Update the dynamic linker shared cache

sudo update_dyld_shared_cache

Obs.: this is the Mac OS X counterpart of sudo ldconfig on Linux

11 - Build the training tools

make training
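Steps 6 and 7 copy the icu4c include and library paths from brew info icu4c by hand. A minimal sketch that derives them from brew --prefix icu4c instead, falling back to the /usr/local/opt/icu4c path used above when brew is unavailable. It assumes a keg-only icu4c from Homebrew and should be run from the tesseract folder after ./autogen.sh; it is a convenience wrapper, not part of the official build instructions.

```shell
#!/bin/sh
# Sketch: derive the icu4c flags for ./configure instead of copying them
# from `brew info icu4c` by hand. Assumes Homebrew's keg-only icu4c; the
# fallback path is the one used in step 7 of this guide.
ICU_PREFIX=$(brew --prefix icu4c 2>/dev/null) || ICU_PREFIX=/usr/local/opt/icu4c

CPPFLAGS="-I$ICU_PREFIX/include"
LDFLAGS="-L$ICU_PREFIX/lib"
echo "Using CPPFLAGS=$CPPFLAGS LDFLAGS=$LDFLAGS"

if [ -x ./configure ]; then
  # Same invocation as step 7, with the derived paths.
  ./configure CPPFLAGS="$CPPFLAGS" LDFLAGS="$LDFLAGS"
else
  # ./configure is generated by ./autogen.sh (step 5).
  echo "No ./configure here: run ./autogen.sh in the tesseract folder first." >&2
fi
```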

Creating ScrollView.jar - tesseract 4.0

Reference:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line
https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging

Important: Use JDK 8 to build; other JDK versions return an error.

Steps

1 - Download the files piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar

http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-core/3.0/piccolo2d-core-3.0.jar
http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-extras/3.0/piccolo2d-extras-3.0.jar

2 - Move the files piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar to tesseract/java

3 - Enter the tesseract/java folder

cd java

4 - Set SCROLLVIEW_PATH to your tesseract/java folder and build the jar

SCROLLVIEW_PATH=~/projects/tesseract/java make ScrollView.jar
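Because the jar only builds with JDK 8 and needs both piccolo2d jars in tesseract/java, a quick preflight check can save a failed make. A minimal sketch, run from tesseract/java; is_jdk8 and missing_piccolo_jars are hypothetical helpers, and the JDK 8 test assumes javac -version prints a line like "javac 1.8.0_181".

```shell
#!/bin/sh
# Sketch of a preflight check before `make ScrollView.jar` (assumption:
# run from tesseract/java; the helper names are illustrative only).

# Succeeds if a `javac -version` line reports a 1.8.x (JDK 8) compiler.
is_jdk8() {
  case "$1" in
    "javac 1.8."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the names of the piccolo2d jars missing from directory $1.
missing_piccolo_jars() {
  for jar in piccolo2d-core-3.0.jar piccolo2d-extras-3.0.jar; do
    [ -f "$1/$jar" ] || echo "$jar"
  done
}

javac_version=$(javac -version 2>&1 | head -n 1)
if is_jdk8 "$javac_version"; then
  echo "JDK 8 found: $javac_version"
else
  echo "Not JDK 8 (got: ${javac_version:-nothing}); on macOS try:" >&2
  echo '  export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)' >&2
fi

missing=$(missing_piccolo_jars .)
[ -z "$missing" ] || printf 'Missing jars:\n%s\n' "$missing" >&2
```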

Training Font - tesseract 4.0

Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#user-content-using-tesstrain

Steps

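1 - Clone the langdata repo next to the tesseract repo (the commands below expect it at ~/projects/langdata)

cd ~/projects
git clone https://github.com/tesseract-ocr/langdata

2 - Enter the tesseract folder

cd ~/projects/tesseract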

3 - List the available fonts and pick one (I recommend "Verdana")

text2image --list_available_fonts --fonts_dir=/Library/Fonts

Font directories on Mac OS X can be:
~/Library/Fonts/
/Library/Fonts/
/Network/Library/Fonts/
/System/Library/Fonts/
/System Folder/Fonts/

More details here: https://support.apple.com/en-us/HT201722

4 - In tesseract/training/tesstrain_utils.sh, replace line 195 as follows (the Mac OS X mktemp does not accept --tmpdir)

- export FONT_CONFIG_CACHE=$(mktemp -d --tmpdir font_tmp.XXXXXXXXXX)
+ export FONT_CONFIG_CACHE=$(mktemp -d -t font_tmp.XXXXXXXXXX)

Obs.: this is a fix for the error:

mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
/Users/username/projects/tesseract/training/tesstrain_utils.sh: line 197: /sample_text.txt: Permission denied

5 - Clone a tessdata repo into ~/projects (step 13 below expects ~/projects/tessdata_best). I recommend "tessdata_best" since it is more accurate; "tessdata_fast" is only faster.

git clone https://github.com/tesseract-ocr/tessdata_best

or

git clone https://github.com/tesseract-ocr/tessdata_fast

6 - Copy eng.traineddata (for English training) from the tessdata_best repo you just cloned into tesseract/tessdata/

7 - Create the training data

PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0"    \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Verdana" \
  --output_dir ~/tesstutorial/engtrain

On Mac OS X, keep the PANGOCAIRO_BACKEND=fc prefix.

8 - Create evaluation data with a different font for comparison

PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0"    \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Times New Roman," \
  --output_dir ~/tesstutorial/engeval

On Mac OS X, keep the PANGOCAIRO_BACKEND=fc prefix.

9 - Create the output folder

mkdir -p ~/tesstutorial/engoutput

10 - Start the training

SCROLLVIEW_PATH=~/projects/tesseract/java \
~/projects/tesseract/training/lstmtraining \
--debug_interval 100 \
--traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
--net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
--model_output ~/tesstutorial/engoutput/base \
--learning_rate 20e-4 \
--train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
--eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
--max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

If you failed to build ScrollView.jar, use --debug_interval -1 instead.

11 - Monitor the log in another terminal

tail -f ~/tesstutorial/engoutput/basetrain.log

12 - Test accuracy on the other font (Times New Roman eval data)

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/engoutput/base_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt

13 - Test accuracy of the tessdata_best model

~/projects/tesseract/training/lstmeval \
  --model ~/projects/tessdata_best/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt

14 - Test accuracy of the installed traineddata (in this case the same model as in step 13)

~/projects/tesseract/training/lstmeval \
  --model ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engtrain/eng.training_files.txt
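For step 4 above: on Mac OS X, mktemp -t treats font_tmp.XXXXXXXXXX as a literal prefix, which is why paths such as font_tmp.XXXXXXXXXX.I4GMoIqG show up in logs later in this thread. Passing the template as a full path instead works with both the GNU and the BSD mktemp. A minimal sketch of an alternative line 195, assuming that falling back to /tmp when TMPDIR is unset is acceptable:

```shell
#!/bin/sh
# Sketch of an alternative line 195 for tesstrain_utils.sh: a path template
# works with both GNU mktemp (Linux) and BSD mktemp (Mac OS X), and the X's
# are replaced instead of being kept as a literal prefix.
export FONT_CONFIG_CACHE=$(mktemp -d "${TMPDIR:-/tmp}/font_tmp.XXXXXXXXXX")
echo "Font config cache: $FONT_CONFIG_CACHE"
```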

Fine tuning - tesseract 4.0

Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact

Steps

1 - Create the necessary folder

mkdir -p ~/tesstutorial/verdana_from_small

2 - Start fine-tuning

~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/verdana_from_small/verdana \
  --continue_from ~/tesstutorial/engoutput/base_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --train_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 1200

3 - Validate the progress

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_small/verdana_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt

4 - Create the necessary folder

mkdir -p ~/tesstutorial/verdana_from_full

5 - Extract the LSTM model from the installed eng.traineddata

~/projects/tesseract/training/combine_tessdata \
  -e ~/projects/tesseract/tessdata/eng.traineddata \
  ~/tesstutorial/verdana_from_full/eng.lstm

6 - Fine-tune starting from the extracted model

~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/verdana_from_full/verdana \
  --continue_from ~/tesstutorial/verdana_from_full/eng.lstm \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --train_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 400

7 - Validate the results on the fine-tuning font (Times New Roman, engeval)

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_full/verdana_checkpoint \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt

8 - Validate the results on the original training font (Verdana, engtrain)

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_full/verdana_checkpoint \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engtrain/eng.training_files.txt
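Steps 7 and 8 run lstmeval twice with the same checkpoint and different eval lists. A small sketch that loops over the lists; eval_checkpoint is a hypothetical helper, and the LSTMEVAL default assumes the ~/projects/tesseract layout used throughout this guide.

```shell
#!/bin/sh
# Sketch: evaluate one checkpoint against several eval lists (as in steps 7
# and 8). LSTMEVAL defaults to the build-tree path used in this guide.
LSTMEVAL=${LSTMEVAL:-$HOME/projects/tesseract/training/lstmeval}

# Usage: eval_checkpoint MODEL TRAINEDDATA LISTFILE...
eval_checkpoint() {
  model=$1
  traineddata=$2
  shift 2
  for list in "$@"; do
    echo "=== $list"
    "$LSTMEVAL" --model "$model" --traineddata "$traineddata" \
      --eval_listfile "$list" || return 1
  done
}

# Example (steps 7 and 8):
# eval_checkpoint ~/tesstutorial/verdana_from_full/verdana_checkpoint \
#   ~/projects/tesseract/tessdata/eng.traineddata \
#   ~/tesstutorial/engeval/eng.training_files.txt \
#   ~/tesstutorial/engtrain/eng.training_files.txt
```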

Fine tuning to add the ± character - tesseract 4.0

Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters

Steps

1 - Modify langdata/eng/eng.training_text and include these lines:

alkoxy of LEAVES ±1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL
TRAVELED ±85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership
Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's ±1.31 POPSET Os—C(11)
VOLVO abdomen, ±65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri
PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, ±14¢ Flujo Gilbert
Dresdner Yesterday's Dilated SYSTEMS Your FOUR ±90° Gogol PARTIALLY BOARDS firm
Email ACTUAL QUEENSLAND Carl's Unruly ±8.4 DESTRUCTION customers DataVac® DAY
Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY ±2.96% Ask! WELL
Lambert own Company View mg \ (±7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv
United by #DEFINE Rebel PERFORMED ±500Gb Oliver Forums Many | ©2003-2008 Used OF
Avoidance Moosejaw pm* ±18 note: PROBE Jailbroken RAISE Fountains Write Goods (±6)
Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § ±44.01189673355 €
netting Bookmark of WE MORE) STRENGTH IDENTICAL ±2? activity PROPERTY MAINTAINED

2 - Generate the training data

PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Times New Roman," \
              "Times New Roman, Bold" \
              "Times New Roman, Bold Italic" \
              "Times New Roman, Italic" \
              "Courier New" \
              "Courier New Bold" \
              "Courier New Bold Italic" \
              "Courier New Italic" \
  --output_dir ~/tesstutorial/trainplusminus

3 - Generate the eval data

PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Verdana" \
  --output_dir ~/tesstutorial/evalplusminus

4 - Extract the LSTM model from the installed eng.traineddata

~/projects/tesseract/training/combine_tessdata \
  -e ~/projects/tesseract/tessdata/eng.traineddata \
  ~/tesstutorial/trainplusminus/eng.lstm

5 - Fine tuning

~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/trainplusminus/plusminus \
  --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --old_traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
  --max_iterations 3600

6 - Test the result on the fonts used for fine-tuning (Times New Roman and Courier New)

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt

7 - Test the result on the main font (Verdana)

~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/evalplusminus/eng.training_files.txt
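Before generating data in step 2, it is worth confirming that the ± lines from step 1 actually landed in the training text; each of the 13 lines listed above contains ±, so the count should be at least 13. A minimal sketch using a fixed-string grep; count_lines_with_char is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count the lines of a file that contain a given character.
# -F makes grep treat the character as a fixed string, not a regex.
count_lines_with_char() {
  grep -c -F -- "$1" "$2"
}

# Example (after step 1; expect at least 13):
# count_lines_with_char '±' ~/projects/langdata/eng/eng.training_text
```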

@Shreeshrii
Collaborator

Shreeshrii commented Apr 8, 2018 via email

@godofcheerup

@FernandoGOT Thank you. As you know, @Shreeshrii mentioned the fine-tuning/training problem, so I hope this page will be reflected in the documentation soon. Thank you.

@tfmorris
Contributor

tfmorris commented Apr 9, 2018

This is a great resource! It would be even more amazing if it were in the form of a pull request of changes to the existing documentation so that it could be improved to avoid these problems for other OS X users.

@kas84

kas84 commented May 13, 2018

I followed @FernandoGOT's steps, but I am getting read_params_file: parameter not found: enable_new_segsearch when running tesseract --list-langs. This is the first time I've tried to build Tesseract, so I have no idea what's going on. Any ideas on where to look?

@Shreeshrii
Collaborator

@kas84 please post results of

tesseract -v

Version info.

Are you using the latest source from GitHub?

@kas84

kas84 commented May 14, 2018

@Shreeshrii I cloned the repo with git clone https://github.com/tesseract-ocr/tesseract/, so if the latest version is in master, yes, I am.

@Shreeshrii
Collaborator

tesseract -v

@kas84

kas84 commented May 14, 2018

Yeah, I forgot, sorry!

 leptonica-1.76.0
  libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found SSE

@Shreeshrii
Collaborator

Usually tesseract -v should also show the tesseract version.

Is the error only with --list-langs?

Are you able to recognize any test images?

@kas84

kas84 commented May 14, 2018

My bad:

tesseract 4.0.0-beta.1-232-g45a6
 leptonica-1.76.0
  libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found SSE

It also happens when trying to recognize an image, yes.

@Shreeshrii
Collaborator

What commands are you using?

What tessdata-dir are you using? E.g., where is eng.traineddata installed?

@Shreeshrii
Collaborator

Shreeshrii commented May 14, 2018

What output do you get with the following? Use ./tessdata if you have copied eng.traineddata there.

cd tesseract
tesseract ./testing/phototest.tif - --tessdata-dir ../tessdata  -c page_separator=''

Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

@kas84

kas84 commented May 14, 2018

[Screenshot 2018-05-14 at 14.02.24]

@amitdo
Collaborator

amitdo commented May 14, 2018

page _seperator

The space here confuses the command line options parser.
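The effect amitdo describes can be reproduced without tesseract. Assuming the screenshot showed -c page _seperator='', the shell hands tesseract page and _seperator= as two separate arguments, so -c only receives page. (The option Shreeshrii suggested is spelled page_separator.) print_args is a hypothetical helper that prints one argument per line.

```shell
#!/bin/sh
# Sketch: show how the shell splits the two spellings into arguments.
# print_args prints each argument it receives between brackets.
print_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

echo "With a space:"
print_args -c page _seperator=''
echo "Without a space:"
print_args -c page_separator=''
```

With the space, three arguments arrive ([-c], [page], [_seperator=]); without it, only two ([-c], [page_separator=]).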

@karthik-ir

Has anyone built a Dockerfile from this?

@kas84

kas84 commented May 14, 2018

[Screenshot 2018-05-14 at 15.26.39]

It works now! I am guessing it had something to do with my TESSDATA env

@Shreeshrii
Collaborator

Shreeshrii commented May 14, 2018

@amitdo
Collaborator

amitdo commented May 14, 2018

> I am guessing it had something to do with my TESSDATA env

No.

It was due to wrong command line usage.

@kas84

kas84 commented May 14, 2018

I am a newbie with tesseract and this has nothing to do with my bug, but... is it supposed to recognize images like this?
[Image: image-numbers]
Or do I need to preprocess the image first to remove everything but white so that Tesseract can handle it?

@amitdo
Collaborator

amitdo commented May 14, 2018

Please use the forum for asking questions.

@kas84

kas84 commented May 14, 2018

Okay, sorry!

@ysnnzlcn

ysnnzlcn commented Sep 1, 2018

@FernandoGOT Thank you very much for such a detailed explanation, but I can't make it work. When I run "make training" it gives me the error "Need to reconfigure project, so there are no errors". Also, I couldn't create ScrollView.jar. Could you update this post? Thank you.

@FernandoGOT
Author

@ysnnzlcn I'm short on time these days (working too much), but when I get some free time I'm going to write a better step-by-step on how to use Tesseract and send a pull request to the docs.

@ysnnzlcn

@FernandoGOT That would be great, looking forward to it. Thanks

@hadils

hadils commented Sep 23, 2018

Under Training Font -- Tesseract 4.0, Step 7, I get a failure:


=== Starting training for language 'eng'
[Sat Sep 22 16:56:06 MST 2018] /usr/local/bin/text2image --fonts_dir=/Library/Fonts --font=Verdana --outputbase=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG/sample_text.txt --text=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG/sample_text.txt --fontconfig_tmpdir=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG

=== Phase I: Generating training images ===
Rendering using Verdana
[Sat Sep 22 16:56:09 MST 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG --fonts_dir=/Library/Fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0 --max_pages=0 --font=Verdana --text=/Users/hadilsabbagh/tesseract/java/langdata/eng/eng.training_text
ERROR: /var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0.box does not exist or is not readable
ERROR: /var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0.box does not exist or is not readable

I have:

Hadil-Sabbaghs-MacBook-Pro:tesseract hadilsabbagh$ tesseract -v
tesseract 4.0.0-beta.4-158-g02f9d
 leptonica-1.76.0
  libjpeg 9c : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11
 Found AVX2
 Found AVX
 Found SSE

My user is allowed to create files in that directory, and the directory itself is present.

Please advise.
Hadil G. Sabbagh, Ph. D.

@markedphillips

Hi, when I try installing this it breaks here:

[Wed Sep 26-19:00:26][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>sudo update_dyld_shared_cache
Password:
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: x86_64h rejected from cached dylibs: /System/Library/PrivateFrameworks/CreateML.framework/Versions/A/CreateML (("Could not find dependency '/System/Library/PrivateFrameworks/TuriCore.framework/Versions/A/TuriCore'"))
[Wed Sep 26-19:00:48][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>

I would really like to get this working; I've spent a lot of time trying to get something running. Any help or pointers to instructions would be greatly appreciated.

@zdenop
Contributor

zdenop commented Oct 6, 2018

@FernandoGOT @Shreeshrii: can you put the instructions on the wiki? I would like to close this issue (related to the build process). It is too long, and other people have mixed other topics (training) in here.
@FernandoGOT: can you test the recent code?

@jamesoneill54

Replying to @markedphillips's sudo update_dyld_shared_cache warnings above: I am having this issue too. Has this been resolved here or somewhere else?

@janceChun

Replying to @ysnnzlcn's "Need to reconfigure project, so there are no errors" error from make training and the ScrollView.jar problem:

Please check your output after running this code:
./configure \
CPPFLAGS=-I/usr/local/opt/icu4c/include \
LDFLAGS=-L/usr/local/opt/icu4c/lib

I came across the same error, and the log showed an issue with icu4c and also asked me to install pango.

Once done, run the above code again and hopefully your error will be solved.

@jamesoneill54 https://stackoverflow.com/questions/33259191/installing-libicu-dev-on-mac/33352241 this worked for me.

@stweil
Member

stweil commented Aug 12, 2019

I suggest closing this issue. Part of the information given here is no longer up to date.

@tfmorris
Contributor

I made a minor edit to the homebrew instructions on the wiki page,

Please share your minor edits.

@amitdo You can find my edits in the history for the wiki page.

> With OpenMP you can get a major speedup, so I suggest investigating how to make it work on macOS with Clang + LLVM's OpenMP runtime.

That's not something I have time to tackle.

> I suggest closing this issue. Part of the information given here is no longer up to date.

@stweil I suggested exactly that back in Oct 2018, so obviously agree. :) If people run into new problems, they can open new issues (or just update the wiki with the necessary corrections).

@stweil stweil closed this as completed Aug 13, 2019
@jtlz2

jtlz2 commented Sep 12, 2019

Did anyone manage to overcome the following error:

make training
Need to reconfigure project, so there are no errors

And if so how?

@stweil
Member

stweil commented Sep 12, 2019

make training is disabled because some requirements are missing.

@jtlz2

jtlz2 commented Sep 16, 2019

@stweil How do I diagnose which requirements are missing and why make training is disabled?

@jtlz2

jtlz2 commented Sep 16, 2019

Never mind, found it:

configure: WARNING: pango 1.22.0 or higher is required, but was not found.
configure: WARNING: Training tools WILL NOT be built.
configure: WARNING: Try to install libpango1.0-dev package.
checking for cairo... no
configure: WARNING: Training tools WILL NOT be built because of missing cairo library.
configure: WARNING: Try to install libcairo-dev?? package.
checking that generated files are newer than configure... done

@stweil
Member

stweil commented Sep 16, 2019

> @stweil How do I diagnose which requirements are missing and why make training is disabled?

Obviously you found the answer yourself: configure says that pango 1.22.0 or higher is required, but was not found.
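Following stweil's answer, a minimal sketch for finding these warnings without scrolling through the whole configure run: save the configure output (or use config.log, where autoconf also records its warnings) and filter it. training_blockers is a hypothetical helper, and the pattern only targets the kinds of warnings quoted above.

```shell
#!/bin/sh
# Sketch: print the configure warnings that explain why `make training`
# says "Need to reconfigure project, so there are no errors".
# Argument: saved configure output, or config.log.
training_blockers() {
  grep -E 'WARNING: .*(required|WILL NOT be built|Try to install)' "$1"
}

# Example:
# ./configure CPPFLAGS=... LDFLAGS=... 2>&1 | tee configure.out
# training_blockers configure.out
```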

@khalajink

khalajink commented Sep 19, 2019

I am getting an error when running 'text2image --list_available_fonts --fonts_dir=/Library/Fonts'.

Error : 'text2image: not found'.

Could you please point me in a direction for tackling this issue?

MacOS : 10.14.6

@stweil
Member

stweil commented Sep 19, 2019

@khalajink, I suggest asking for help at the user forum.

@jtlz2

jtlz2 commented Sep 19, 2019

@khalajink Did you install the training tools (including text2image)?

If so, where are they? Make sure you've included them on your $PATH.
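Following jtlz2's suggestion, a minimal sketch that reports where each training tool used in this guide resolves on $PATH. Note that in the Tesseract build, make training only builds the tools; installing them next to tesseract (for example into /usr/local/bin) is done by the separate training-install target (sudo make training-install). check_tools is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report where each tool resolves on $PATH, or that it is missing.
# tool_path is used instead of `path`, which zsh ties to $PATH.
check_tools() {
  for tool in "$@"; do
    if tool_path=$(command -v "$tool"); then
      echo "$tool: $tool_path"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools text2image lstmtraining lstmeval combine_tessdata
```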

@khalajink

khalajink commented Sep 19, 2019

@jtlz2 I followed @FernandoGOT's comment and I don't see an installation step for text2image there; I supposed it comes along with icu4c. How do I include it in $PATH?

When I try to run 'text2image --list_available_fonts --fonts_dir=/Library/Fonts',
Error is '-bash: /usr/local/bin/text2image: No such file or directory'.

Also, I see that you had an issue related to the pango version 3 days ago. I am facing it too, although I already have pango 1.44.6 installed. How did you solve it?

@khalajink

Solved the pango issue by following https://stackoverflow.com/questions/55361379/osx-compiling-training-tools-for-tesseract-4-0-pango-libraries-not-found

@jtlz2

jtlz2 commented Sep 19, 2019

@khalajink Yes, see my answer in that SO thread https://stackoverflow.com/a/57968945/1021819

@khalajink

khalajink commented Sep 19, 2019

@jtlz2 Yes, I followed your answer and got the pango issue fixed, but the text2image issue still exists. Any ideas?

When I try to run 'text2image --list_available_fonts --fonts_dir=/Library/Fonts',
Error is '-bash: /usr/local/bin/text2image: No such file or directory'.

@wanzulfikri

wanzulfikri commented Apr 7, 2020

> @khalajink Yes, see my answer in that SO thread https://stackoverflow.com/a/57968945/1021819

Thanks for the answer. The commands you shared didn't work for me, but the instructions on how to diagnose the issue helped a lot. It turned out that I did not have zlib installed, so I installed it and now I can finally build the training tools.

@nnnikolay

I still have a different but somewhat similar problem in 2020.

I've successfully installed the latest Tesseract (master branch) on the latest macOS (11.1 Big Sur).

tesseract 5.0.0-alpha-855-g6d86
 leptonica-1.80.0
  libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.6 liblz4/1.9.2 libzstd/1.4.5
 Found libcurl/7.64.1 SecureTransport (LibreSSL/2.8.3) zlib/1.2.11 nghttp2/1.41.0

However, my training tools (even though they have been installed) cannot find their actual executables.

For example, if I call text2image, I see the following error message:

This script is just a wrapper for text2image.
See the libtool documentation for more information.
ERROR: Program text2image failed. Abort.

If I enable debugging (set -x) for the wrapper script, I see the following problem:

❯ text2image --list_available_fonts --fonts_dir=~/Library/Fonts
+ sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
+ test -n ''
+ case `(set -o) 2>/dev/null` in
+ set -o posix
+ BIN_SH=xpg4
+ export BIN_SH
+ DUALCASE=1
+ export DUALCASE
+ unset CDPATH
+ relink_command=
+ test '' = '%%%MAGIC variable%%%'
+ test '' '!=' '%%%MAGIC variable%%%'
+ file=/usr/local/bin/text2image
+ ECHO='printf %s\n'
+ lt_option_debug=
+ func_parse_lt_options /usr/local/bin/text2image --list_available_fonts '--fonts_dir=~/Library/Fonts'
+ lt_script_arg0=/usr/local/bin/text2image
+ shift
+ for lt_opt in '"$@"'
+ case "$lt_opt" in
+ for lt_opt in '"$@"'
+ case "$lt_opt" in
+ test -n ''
++ printf '%s\n' /usr/local/bin/text2image
++ /usr/bin/sed 's%/[^/]*$%%'
+ thisdir=/usr/local/bin
+ test x/usr/local/bin = x/usr/local/bin/text2image
++ ls -ld /usr/local/bin/text2image
++ /usr/bin/sed -n 's/.*-> //p'
+ file=
+ test -n ''
+ WRAPPER_SCRIPT_BELONGS_IN_OBJDIR=no
+ test no = yes
++ cd /usr/local/bin
++ pwd
+ absdir=/usr/local/bin
+ test -n /usr/local/bin
+ thisdir=/usr/local/bin
+ program=text2image
+ progdir=/usr/local/bin/.libs
+ test -f /usr/local/bin/.libs/text2image
+ printf '%s\n' '/usr/local/bin/text2image: error: '\''/usr/local/bin/.libs/text2image'\'' does not exist'
/usr/local/bin/text2image: error: '/usr/local/bin/.libs/text2image' does not exist
+ printf '%s\n' 'This script is just a wrapper for text2image.'
This script is just a wrapper for text2image.
+ printf '%s\n' 'See the libtool documentation for more information.'
See the libtool documentation for more information.
+ exit 1

Basically, none of the training tools can find their actual executables, which are located under tesseract/.libs/.

Did I miss something during the configuration?
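Given the trace above, one way to confirm the symptom is to check whether an installed tool is the libtool wrapper script rather than the compiled program. A minimal sketch that looks for the message the wrapper printed ("This script is just a wrapper for ..."); is_libtool_wrapper is a hypothetical helper, and matching on that text is an assumption about the wrapper's contents.

```shell
#!/bin/sh
# Sketch: succeed if file $1 looks like a libtool wrapper script, based on
# the message the wrapper printed in the trace above.
is_libtool_wrapper() {
  [ -f "$1" ] && grep -q 'This script is just a wrapper for' "$1"
}

for tool in text2image lstmtraining lstmeval; do
  tool_path=$(command -v "$tool") || continue
  if is_libtool_wrapper "$tool_path"; then
    echo "$tool_path is a libtool wrapper script, not the real executable"
  fi
done
```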

@stweil
Member

stweil commented Dec 21, 2020

@nnnikolay, I am sorry, that was my fault. It is now fixed with commit 421ebf0.

stweil referenced this issue Dec 21, 2020
Builds which were configured with --enable-shared did install the wrong files.
Using libtool fixes that.

Add also other flags which are used by the automake default install.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

@nnnikolay

Wow, @stweil, thank you for your swift reaction. It seems that this step works now!

@ching2018

You can see the error details in tesseract/build/config.log: pango 1.22.0 or higher is required, but was not found.
