Possible memory leak when building training files #1999

H-Bluhm · 2018-10-18T09:25:14Z

Environment

Tesseract Version: 4.0-rc3, 4.0-rc2, 4.0-rc1
Platform: Linux 9000119697 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I cut the training text into smaller chunks (~20,000 lines each, ~1.3 MB) so that the training files can be built in parallel on smaller computers.
I then tried building the frk training files using:

src/training/tesstrain.sh \
--lang frk \
--linedata_only \
--noextract_font_properties \
--fonts_dir ~/frk_fonts/ \
--langdata_dir ~/langdata_lstm/ \
--tessdata_dir ~/tessdata/ \
--output_dir ~/tessOutput/
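
For reference, the chunking described above can be done with the standard `split` utility. This is only a sketch: `split_training_text` is a hypothetical helper, and the file name and prefix passed to it are placeholders, not the names used in this report.

```shell
# Split a training text into chunks of at most 20000 lines each.
# split_training_text and both of its arguments are illustrative names.
split_training_text() {
    input="$1"     # path to the full training text
    prefix="$2"    # output prefix; split appends aa, ab, ac, ...
    split -l 20000 "$input" "$prefix"
}
```

Calling `split_training_text frk.training_text frk_chunk_` would produce `frk_chunk_aa`, `frk_chunk_ab`, and so on, each of which can then be fed to a separate tesstrain.sh run.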

Current Behavior:

Memory usage increases rapidly (total >15 GB when the process was killed by the kernel).

Expected Behavior:

With the same training text, memory usage increased at a much lower rate in versions up to 4.0-beta.4 (maximum usage <4 GB).

eighttails · 2018-10-18T14:13:25Z

I was just about to report the same issue and have finished bisecting.
It may be related to this commit:
345e5ee

eighttails · 2018-10-18T14:21:19Z

Well, what I mentioned above is a text2image issue.

The following command now consumes more than 1 GB of memory:
text2image.exe --fonts_dir /c/Windows/Fonts --font Meiryo --text langdata/jpn/jpn.training_text --max_pages 0 --outputbase test
It used to consume about 100 MB.
tesstrain.sh runs 8 tasks simultaneously, so if each task consumes 1 GB of RAM, it crashes a computer with 8 GB of RAM.

If you say lstmtraining leaks memory, it is a different problem.

Shreeshrii · 2018-10-18T14:26:29Z

Sorry about mixing up issues.

OP's issue with tesstrain.sh is related to text2image since lstmtraining is run separately after that. I am deleting my earlier comment since it is a different issue.

eighttails · 2018-10-18T14:34:00Z

tesstrain.sh continues even if text2image has crashed.
After that, lstmtraining generates an error message because text2image died without generating the box files.
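
One possible way for a wrapper script to stop in this situation is to record the PID of each background job and check its exit status with `wait`. The sketch below is not the actual tesstrain.sh code; `run_parallel` is a hypothetical helper, and the commands passed to it stand in for the per-font text2image calls.

```shell
# Run each argument as a background command and report whether all succeeded.
# run_parallel is a hypothetical helper, not part of tesstrain.sh.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        # wait returns the job's exit status (above 128 if it was killed by a signal)
        wait "$pid" || status=1
    done
    return "$status"
}
```

A caller could then write `run_parallel "cmd1" "cmd2" || exit 1`, so that a crashed or killed job aborts the script before the lstmtraining step is reached.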

stweil · 2018-10-18T15:12:08Z

This looks like a bug, and a regression if beta.4 was still fine. So we have to decide whether fixing it is required for 4.0.0.

amitdo · 2018-10-18T15:20:52Z

It depends on how quickly you can find the faulty commit...

Shreeshrii · 2018-10-18T15:21:47Z

A number of people use training, especially for fine-tuning non-Latin languages that were trained at Google with fewer fonts.

While there are many issues with training that are on hold right now, I think this regression should be fixed before 4.0.0.

Shreeshrii · 2018-10-18T15:22:31Z

@eighttails has identified a commit in #1999 (comment)

amitdo · 2018-10-18T15:34:05Z

Another training issue is #1052.

Also see #1700 (comment)

zdenop · 2018-10-18T20:40:28Z

@eighttails: you are right. The problem is that if the PangoFontMap is obtained with pango_cairo_font_map_get_default(), it must not be freed, but if it is created with pango_cairo_font_map_new_for_font_type(), it should be freed...

zdenop · 2018-10-18T22:49:16Z

Please check

mgeerdsen · 2018-10-19T14:03:18Z

Memory consumption is looking good again (text2image 4.0.0-rc3-41-g0a42c0).

However I do not get any text rendered into the images (just blank pages), whereas this worked in the same setup with 4.0.0-beta.4. I will look into this and open up a separate issue for it.

stweil added the bug label Oct 18, 2018

stweil added this to the 4.0.0 milestone Oct 18, 2018

zdenop mentioned this issue Oct 18, 2018

tesstrain.sh continues even if text2image crashed. #2005

Closed

zdenop closed this as completed in d1d73b9 Oct 18, 2018

mgeerdsen mentioned this issue Oct 19, 2018

text2image segfault when using --list_available_fonts #2009

Closed

zdenop added a commit that referenced this issue Oct 20, 2018

Revert "free PangoFontMap; fixes #1999"

276c684

This reverts commit d1d73b9.

amitdo mentioned this issue Dec 22, 2020

text2image segmentation fault on macOS ( regression #195?) #736

Closed

eighttails mentioned this issue May 3, 2021

text2image eating memory again #3413

Open

amitdo added the text2image label Oct 4, 2022
