Possible memory leak when building training files #1999
Comments
I'm also just about to report the same issue and have just finished bisecting.
Well, what I mentioned above is a text2image issue. Now the following command consumes more than 1 GB of memory. If you say lstmtraining leaks memory, that is a different problem.
Sorry about mixing up issues. The OP's issue with tesstrain.sh is related to text2image, since lstmtraining is run separately after that. I am deleting my earlier comment since it is a different issue.
tesstrain.sh continues even if text2image crashes.
This looks like a bug and a regression if beta.4 was still fine. So we have to decide whether fixing it is required for 4.0.0.
It depends on how quickly you can find the faulty commit...
A number of people use training, especially for fine-tuning of non-Latin languages which have been trained at Google with fewer fonts. While there are many issues with training that are on hold right now, I think this regression should be fixed before 4.0.0.
@eighttails has identified a commit in #1999 (comment)
Another training issue is #1052. Also see #1700 (comment)
@eighttails: you are right. The problem is that if the PangoFontMap is created with pango_cairo_font_map_get_default() it must not be freed, but if it is created with pango_cairo_font_map_new_for_font_type() it should be freed...
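For reference, a minimal sketch of the GObject ownership rules being described here (this is illustrative, not the actual text2image fix):

```c
#include <pango/pangocairo.h>

int main(void) {
  /* Font map from pango_cairo_font_map_get_default(): owned by Pango.
     The caller must NOT g_object_unref() it. */
  PangoFontMap *default_map = pango_cairo_font_map_get_default();
  (void)default_map; /* use it, but never free it */

  /* Font map from pango_cairo_font_map_new_for_font_type(): the caller
     owns the returned reference and must release it, otherwise the map
     (and every font it loaded) is leaked. */
  PangoFontMap *private_map =
      pango_cairo_font_map_new_for_font_type(CAIRO_FONT_TYPE_FT);
  if (private_map != NULL) {
    /* ... render with private_map ... */
    g_object_unref(private_map); /* required to avoid the leak */
  }
  return 0;
}
```

So code that can receive either kind of font map must only unref the one it created itself.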
Please check |
Memory consumption is looking good again (text2image 4.0.0-rc3-41-g0a42c0). However, I do not get any text rendered into the images (just blank pages), whereas this worked in the same setup with 4.0.0-beta.4. I will look into this and open a separate issue for it.
Environment
Cut training text into smaller chunks (~20,000 lines each, ~1.3 MB) to build training files on smaller computers in parallel.
Tried building frk training files using:
Current Behavior:
Memory usage increases rapidly (total >15 GB when the process was killed by the kernel)
Expected Behavior:
With the same training text, memory usage increased at a much lower rate up to 4.0-beta.4 (max usage <4 GB)