LSTM: Training - Image not trainable #590

Open
Shreeshrii opened this issue Dec 19, 2016 · 68 comments
@Shreeshrii
Collaborator

mkdir -p ~/tesstutorial/sanvedic
lstmtraining -U ~/tesstutorial/vedic/san.unicharset \
  --script_dir ../langdata --debug_interval 0 \
  --learning_rate 10e-5 \
  --net_spec '[1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx384 O1c5000]' \
  --net_mode 192 \
  --perfect_sample_delay 19 \
  --model_output ~/tesstutorial/sanvedic/base \
  --train_listfile ~/tesstutorial/vedic/san.training_files.txt \
  --eval_listfile ~/tesstutorial/vedic/san.training_files.txt \
  --max_iterations 50000 \
  &> ~/tesstutorial/sanvedic/basetrain.log

Setting unichar properties
Setting properties for script Common
Setting properties for script Latin
Setting properties for script Devanagari
Unichar 2306=र्त्स्न्ये->र्त्स्न्ये is too long to encode!!
Warning: given outputs 5000 not equal to unicharset of 5018.
Num outputs,weights in serial:
1,0,0,1:1, 0
Num outputs,weights in serial:
C5,5:25, 0
Ft16:16, 416
Total weights = 416
[C5,5Ft16]:16, 416
Mp3,3:16, 0
Lfys64:64, 20736
Lfx128:128, 98816
Lrx128:128, 131584
Lfx384:384, 787968
Fc5018:5018, 1931930
Total weights = 2971450
Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx384Fc5018] from request [1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx384 O1c5000]
Training parameters:
Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 828/828 pages (0-828) of document /home/shree/tesstutorial/vedic/san.AA_NAGARI_SHREE_L1.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Aksharyogini2.exp0.lstmf
Loaded 1023/1023 pages (0-1023) of document /home/shree/tesstutorial/vedic/san.Sanskrit_2003.exp0.lstmf
Loaded 957/957 pages (0-957) of document /home/shree/tesstutorial/vedic/san.e-Nagari_OT.exp0.lstmf
Loaded 1060/1060 pages (0-1060) of document /home/shree/tesstutorial/vedic/san.FreeSans.exp0.lstmf
Loaded 691/691 pages (0-691) of document /home/shree/tesstutorial/saneval/san.Amiko.exp0.lstmf
Loaded 1213/1213 pages (0-1213) of document /home/shree/tesstutorial/vedic/san.Siddhanta-cakravat.exp0.lstmf
Loaded 1191/1191 pages (0-1191) of document /home/shree/tesstutorial/vedic/san.Sahadeva.exp0.lstmf
Loaded 1291/1291 pages (0-1291) of document /home/shree/tesstutorial/vedic/san.Santipur_OT_Medium.exp0.lstmf
Loaded 1115/1115 pages (0-1115) of document /home/shree/tesstutorial/vedic/san.Lohit_Devanagari.exp0.lstmf
Loaded 1210/1210 pages (0-1210) of document /home/shree/tesstutorial/vedic/san.Nakula.exp0.lstmf
Found AVX
Found SSE
Loaded 1188/1188 pages (0-1188) of document /home/shree/tesstutorial/vedic/san.Siddhanta-Calcutta.exp0.lstmf
Loaded 1211/1211 pages (0-1211) of document /home/shree/tesstutorial/vedic/san.Siddhanta.exp0.lstmf
Loaded 1214/1214 pages (0-1214) of document /home/shree/tesstutorial/vedic/san.Siddhanta-Nepali.exp0.lstmf
Loaded 1157/1157 pages (0-1157) of document /home/shree/tesstutorial/vedic/san.Uttara.exp0.lstmf
Image too large to learn!! Size = 2594x48
Image not trainable
Image too large to learn!! Size = 2758x48
Image not trainable
Image too large to learn!! Size = 2621x48
Image not trainable
At iteration 100/100/103, Mean rms=0.95%, delta=57.759%, char train=100.161%, word train=100%, skip ratio=3%, New worst char error = 100.161 wrote checkpoint

@Shreeshrii
Collaborator Author

The images were created by text2image from training text with word wrap, so each line ran the full width of the page.

Is there a limit to the size of images for training?

Should the training text be only 70-120 characters wide?

Shreeshrii changed the title from "LSTM: Training - Image too large to learn" to "LSTM: Training - Image not trainable" on Jan 9, 2017
@Shreeshrii
Collaborator Author

This is the opposite case: the image is too small.

Built network:[1,0,0,1[C5,5Ft16]Mp3,3Lfys64Lfx128Lrx128Lfx256Fc104] from request [1,0,0,1 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c5000]
Training parameters:
  Debug interval = 0, weights = 0.1, learning rate = 0.0001, momentum=0.9
Loaded 151/151 pages (1-151) of document /home/shree/tesstutorial/trado/ara.Traditional_Arabic.exp0.lstmf
Compute CTC targets failed!
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
At iteration 100/100/104, Mean rms=6.004%, delta=48.481%, char train=138.814%, word train=100%, skip ratio=4%,  New worst char error = 138.814 wrote checkpoint.

Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!
Compute CTC targets failed!
At iteration 200/200/207, Mean rms=5.654%, delta=40.983%, char train=119.407%, word train=100%, skip ratio=3.5%,  New worst char error = 119.407 wrote checkpoint.

Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!

@amitdo
Collaborator

amitdo commented Jan 9, 2017

Is there a limit to size of images for training?

https://github.com/tesseract-ocr/tesseract/blob/ce76d1c569/lstm/lstmrecognizer.cpp#L266

// Maximum width of image to train on.
const int kMaxImageWidth = 2560;
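
As a rough illustration (not Tesseract code), the sizes reported in the training log can be compared against this constant. The sketch below assumes only the log format shown earlier in the thread and the value 2560 quoted above; `oversized_lines` is a hypothetical helper name.

```python
import re

# Value quoted above from lstmrecognizer.cpp.
K_MAX_IMAGE_WIDTH = 2560

# Matches the "too large" messages as they appear in the training log.
SIZE_RE = re.compile(r"Image too large to learn!! Size = (\d+)x(\d+)")


def oversized_lines(log_text):
    """Return (width, height, excess) for each 'too large' message in a log."""
    results = []
    for match in SIZE_RE.finditer(log_text):
        width, height = int(match.group(1)), int(match.group(2))
        results.append((width, height, width - K_MAX_IMAGE_WIDTH))
    return results


# Excerpt copied from the log in the first comment.
sample_log = """\
Image too large to learn!! Size = 2594x48
Image not trainable
Image too large to learn!! Size = 2758x48
Image not trainable
Image too large to learn!! Size = 2621x48
Image not trainable
"""

for width, height, excess in oversized_lines(sample_log):
    print(f"{width}x{height}: {excess} px over the {K_MAX_IMAGE_WIDTH} px limit")
```

All three reported widths are only slightly above 2560, which is consistent with line images generated at full page width.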

@Shreeshrii
Collaborator Author

Shreeshrii commented Jan 9, 2017 via email

@amitdo
Collaborator

amitdo commented Jan 9, 2017

Yes :-)

@Shreeshrii
Collaborator Author

Shreeshrii commented Jan 9, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Jan 17, 2017

The default width of the images output by text2image can be reduced when running tesstrain.sh by modifying this line in tesstrain_utils.sh:

    common_args+=" --leading=${LEADING} --xsize 2550"

@Shreeshrii
Collaborator Author

@theraysmith

Ray,

// Maximum width of image to train on.
const int kMaxImageWidth = 2560;

I have some old tif/box pairs whose image width is 4000.

Will training quality be degraded if I change the above constant to 4000 in order to use them?

@Shreeshrii
Collaborator Author

Also, can this be changed at runtime with a variable, or do I need to recompile tesseract with the higher value?

@Shreeshrii
Collaborator Author

Changing tesstrain_utils.sh to use

common_args+=" --leading=${LEADING} --xsize 2550"

fixes this.

@hanikh

hanikh commented Aug 8, 2017

@Shreeshrii how can the problem of the image being too small be fixed?

@Shreeshrii
Collaborator Author

Usually this happens for just a few lines of an image. Tesseract splits the input image into a separate image per line.

It can happen when layout analysis has wrongly segmented the page, or when a line has been detected as having hundreds of diacritics.

If it is just a few messages, you can ignore them.

@theraysmith Any update regarding the new line detection algorithm?

Shreeshrii reopened this on Aug 9, 2017
@hanikh

hanikh commented Aug 9, 2017 via email

@amitdo
Collaborator

amitdo commented Aug 9, 2017

Image too large to learn!! Size = 2758x48
Image not trainable

@hanikh, please paste a short example for the errors you get.

@theraysmith
Contributor

theraysmith commented Aug 10, 2017 via email

@hanikh

hanikh commented Aug 12, 2017 via email

@roozgar

roozgar commented Aug 12, 2017

@hanikh
Did you use v4?
I saw this problem with Cube for Persian.

@hanikh

hanikh commented Aug 12, 2017

@theraysmith would you please help me: how many text lines are appropriate?
Thanks.

@Shreeshrii
Collaborator Author

Shreeshrii commented Aug 12, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Aug 12, 2017 via email

@hanikh

hanikh commented Aug 12, 2017

Image too small to scale!! (3x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable
Compute CTC targets failed!
[...]
2 Percent improvement time=0, best error was 2.167 @ 14
At iteration 14/1100/20884, Mean rms=0.049%, delta=0%, char train=0%, word train=0%, skip ratio=1798.6%, New best char error = 0 wrote best model:/home/fanasa/tesstutorial/fastuned_from_fas/fastuned-plates0_14.lstm wrote checkpoint.

Finished! Error rate = 0
This is the error I got during training for licence plates.

@theraysmith
Contributor

theraysmith commented Aug 13, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Aug 13, 2017 via email

@hanikh

hanikh commented Aug 14, 2017 via email

@hanikh

hanikh commented Aug 16, 2017 via email

@Shreeshrii
Collaborator Author

Shreeshrii commented Aug 16, 2017

Where can the lang.lstm-unicharset file be found?

combine_tessdata -u lang.traineddata lang.

It will create lang.* files, including the unicharset.

You can use dawg2wordlist to see the wordlist used.

How can combine_lang_model be used?

 combine_lang_model    \
 --input_unicharset  ../tesstutorial/sanskrit2003/san/san.unicharset  \
 --script_dir "../langdata"   \
 --words "../langdata/san/san.wordlist" \
 --numbers "../langdata/san/san.numbers"   \
 --puncs "../langdata/san/san.punc" \
 --output_dir ../tesstutorial/sanskrit2003   \
 --lang "san"     --pass_through_recoder \
     --version_str "4.0.0alpha-20170816 sanskrit2003"

For RTL languages, there is an additional flag. Please see https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain_utils.sh for details.

I used a hand-edited unicharset, because the unicharset generated by the current training process is old-style. You should wait for @theraysmith to update unicharset_extractor and the other langdata files.

@ghost

ghost commented Jul 7, 2018

@Shreeshrii

I haven't had much success in finetuning.

  • Give me examples of things you failed to do with finetuning.
  • Also give a sample of the training text that you chose to use in training.

noahmetzger pushed a commit to noahmetzger/tesseract that referenced this issue Jul 31, 2018
Fixes
Image too large to learn!! Size = 2594x48
Image not trainable

See tesseract-ocr#590 (comment)
for related discussion
@Shreeshrii
Collaborator Author

#590 (comment) by Ray Smith

Initial problem: (Image too small to scale)
Those images are ridiculously small at 3x48 pixels. Something is going
wrong somewhere with the images.
Are they oriented vertically? The input scaling scales the height to 48,
whatever it starts as, so it looks like your textlines are vertical.

This bug is still there.

@Shreeshrii
Collaborator Author


Error in pixScaleAreaMap: pixd too small
Error in pixClone: pixs not defined
Error in pixCopyText: pixd not defined
Error in pixCopyInputFormat: pixd not defined
Scaling pix of size 35, 4548 by factor 0.0105541 made null pix!!
Error in pixGetWidth: pix not defined
Error in pixGetHeight: pix not defined
Bad pix from ImageData!
Line cannot be recognized!!
Image not trainable

with version

tesseract 4.0.0-beta.4-158-g02f9d
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
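
The scale factor in this log is consistent with the line's height being scaled to 48 px. A minimal sketch of that arithmetic follows, assuming the logged size "35, 4548" is width, height and that the target height is 48; `scaled_size` is a hypothetical helper, not a Leptonica or Tesseract function.

```python
def scaled_size(width, height, target_height=48):
    """Scale (width, height) so the height becomes target_height, keeping aspect ratio.

    Returns the (unrounded) new width, the new height, and the scale factor.
    """
    factor = target_height / height
    return width * factor, target_height, factor


# Size reported in the log: "Scaling pix of size 35, 4548".
new_width, new_height, factor = scaled_size(35, 4548)
print(f"factor = {factor:.7f}")
print(f"scaled width = {new_width:.3f} px, scaled height = {new_height} px")
```

With these inputs the scaled width comes out below one pixel, which would explain the "made null pix" message for a tall, narrow (probably vertical) line crop.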

@anonynamja

I had the same problem as the thread OP:

Image too large to learn!! Size = 2594x48
Image not trainable

I resolved it with this suggestion above #590 (comment)

Changing tesstrain_utils.sh to use

common_args+=" --leading=${LEADING} --xsize 2550"

fixes this.

Was this the correct approach?

@msklvsk

msklvsk commented Dec 29, 2018

The "Image too large to learn!!" error hasn’t gone away. You still get it with a small enough font, or with a 48-pixel-tall input layer, even when using --xsize 2550.

Image too large to learn!! Size = 4859x48

So the question is why this constraint exists, and whether it can be dropped or set to, say, 6000. Or should one prepare shorter lines after all? What would be the correct solution?

@Shreeshrii
Collaborator Author

Similar to #590 (comment)

Error in pixScaleAreaMap: pixd too small
Error in pixClone: pixs not defined
Error in pixCopyText: pixd not defined
Error in pixCopyInputFormat: pixd not defined
Scaling pix of size 35, 4477 by factor 0.0107215 made null pix!!
Error in pixGetWidth: pix not defined
Error in pixGetHeight: pix not defined
Bad pix from ImageData!
Line cannot be recognized!!
Image not trainable

tesseract 4.1.0-rc1-255-g332a1
leptonica-1.76.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0

@kamrapooja

Hi Shree,

I am also getting the same error:
"Image too large to learn!! Size = 2881x36
Image not trainable"
The maximum size is not defined in tesstrain.sh, and according to the documentation the default size is 3600.
Why does this issue occur?

@Mohamed209

Does anyone know the recommended size for the images from which bounding boxes are extracted to retrain Tesseract? If so, should I retrain with fixed sizes or with a variety of image sizes?

@lnutimura

lnutimura commented Mar 6, 2020

@Shreeshrii

I'm also experiencing this same error while trying to fine tune an existing model:

[...]
Loaded 1/1 lines (1-1) of document data/bar-ground-truth/test-0-049.exp0.lstmf
Image too large to learn!! Size = 3316x48
Image not trainable
Loaded 1/1 lines (1-1) of document data/bar-ground-truth/test-1-026.exp0.lstmf
Image too large to learn!! Size = 3316x48
[...]

As suggested in #590, I modified tesstrain_utils.sh by changing the X_SIZE variable but it didn't help.

Also, after sorting the .tif files by dimension in descending order, I noticed that the first three files aren't even that large:

[image: files]

In fact, none of my images have a width of 3316 px.
I tried resizing them with ImageMagick, but that didn't help either.

Why is tesseract getting these different values for dimensions?

@Shreeshrii
Collaborator Author

Shreeshrii commented Mar 7, 2020 via email

@lnutimura

This one, for example:
bar-0-032.exp0.lstmf.zip

Thanks for replying.

@Shreeshrii
Collaborator Author

@lnutimura Thanks for the lstmf file.

I unpacked it using an experimental feature by @stweil to check the image file in it. You are right, the image size is 2351x32.

I think the image is being resized to a height of 48 as part of training, which increases its width to 3627 and leads to the error. I had thought that the resized image might be kept in the lstmf file, but that is not the case.

Please take a look at the network spec that you are using for training. Usually the image height in it is either 36 or 48, e.g. from https://tesseract-ocr.github.io/tessdoc/Data-Files-in-tessdata_fast

Version string:4.00.00alpha:amh:synth20170629
LSTM training info:Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1], flags=41,
iteration=6112200, sample_iteration=6112270, null_char=284, learning_rate=0.001, momentum=0.5, adam_beta=0.999

Version string:4.00.00alpha:Arabic:synth20170629:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx128O1c1]
LSTM training info:Network str:[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx128O1c1], flags=41,
iteration=5524000, sample_iteration=5532770, null_char=2, learning_rate=0.001, momentum=0.5, adam_beta=0.999

@stweil, @amitdo Any suggestions for fixing this?

@Shreeshrii
Collaborator Author

From https://tesseract-ocr.github.io/tessdoc/VGSLSpecs.html

Model string input and output
A neural network model is described by a string that describes the input spec, the output spec and the layers spec in between. Example:

[1,0,0,3 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]

The first 4 numbers specify the size and type of the input, and follow the TensorFlow convention for an image tensor: [batch, height, width, depth]. Batch is currently ignored, but eventually may be used to indicate a training mini-batch size. Height and/or width may be zero, allowing them to be variable. A non-zero value for height and/or width means that all input images are expected to be of that size, and will be bent to fit if needed. Depth needs to be 1 for greyscale and 3 for color. As a special case, a different value of depth, and a height of 1 causes the image to be treated from input as a sequence of vertical pixel strips. NOTE THAT THROUGHOUT, x and y are REVERSED from conventional mathematics, to use the same convention as TensorFlow.
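
To see which input height a given model expects, the first four numbers can be read straight out of the network string. This is a small sketch based only on the description quoted above; `parse_input_spec` is a hypothetical helper and is not part of Tesseract's tooling.

```python
import re


def parse_input_spec(net_spec):
    """Extract (batch, height, width, depth) from the start of a VGSL spec string."""
    match = re.match(r"\s*\[\s*(\d+),(\d+),(\d+),(\d+)", net_spec)
    if match is None:
        raise ValueError(f"no input spec found in {net_spec!r}")
    batch, height, width, depth = (int(group) for group in match.groups())
    return batch, height, width, depth


# Spec strings copied from this thread.
specs = [
    "[1,0,0,3 Ct5,5,16 Mp3,3 Lfys64 Lfx128 Lrx128 Lfx256 O1c105]",
    "[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1]",
    "[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx128O1c1]",
]

for spec in specs:
    batch, height, width, depth = parse_input_spec(spec)
    # A height or width of 0 means that dimension is variable.
    print(f"height={height or 'variable'}, width={width or 'variable'}, depth={depth}")
```

A non-zero height (36 or 48 in the examples above) is the height every line image is fitted to before the width limit applies.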

@lnutimura

I'm using the por.traineddata from tessdata_best, so I guess it's:

Version string:4.00.00alpha:por:synth20170629
LSTM training info:Network str:[1,36,0,1Ct3,3,16Mp3,3Lfys48Lfx96Lrx96Lfx192O1c1], flags=41,
iteration=1850300, sample_iteration=1850422, null_char=118, learning_rate=0.001, momentum=0.5, adam_beta=0.999

@Shreeshrii
Collaborator Author

Shreeshrii commented Mar 7, 2020

The network spec for tessdata_best is not the same as that for tessdata_fast. I don't think we have the info for all tessdata_best languages.

@Shreeshrii
Collaborator Author

Also see #590 (comment) by Ray

The input scaling scales the height to 48,
whatever it starts as,

@lnutimura

Oh, I see. That makes sense now.
Do you think there's a way to bypass this?
I'm generating a lot of images for each .tif using a script you provided in this issue, so manually changing them would be a little laborious, but "doable"...

@Shreeshrii
Collaborator Author

Are your original images two-column, or facing pages of a book? If so, it will help to split them before generating line images.

@lnutimura

lnutimura commented Mar 7, 2020

They're tables that occupy the entire space of the page.
I'll group some of the columns and generate separate files for them before running the script. Let's see if it works better. Anyway, thanks for the help!

EDIT: I was able to finish the training without any error! Now it's just a matter of finding ways to improve the fine tuning.
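
When splitting wide pages like this, it can help to estimate how wide a line image may be before it trips the limit. The sketch below assumes the line is scaled proportionally to the network input height and then compared against the 2560 px constant quoted earlier in the thread; `max_source_width` is a hypothetical helper, and the exact rounding and comparison Tesseract uses are not confirmed here, so treat the result as approximate.

```python
# Value quoted earlier in the thread from lstmrecognizer.cpp.
K_MAX_IMAGE_WIDTH = 2560


def max_source_width(source_height, target_height):
    """Largest source width whose proportionally scaled width stays within the limit.

    Assumes scaled_width = source_width * target_height / source_height.
    """
    return (K_MAX_IMAGE_WIDTH * source_height) // target_height


# Example: 32 px tall line images, as in the unpacked lstmf file above.
for target in (36, 48):
    limit = max_source_width(32, target)
    print(f"32 px tall lines, network height {target}: keep width <= {limit} px")
```

Leaving some margin below the computed value is sensible, since the rounding behaviour is an assumption.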

@ciobania

ciobania commented Apr 3, 2020

Hi there,

I'm having the same error with 1 tif and 1000 iterations; however, lstmtraining keeps running.
I'm running on a Jetson Nano, using:

tesseract 4.1.1-rc2-21-gf4ef
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1

I'm training on a single image, just to understand the mechanism, and learn about it.
As an example, I'm using a scanned receipt at 600 dpi; ImageMagick's identify says it's 1696x3930.

I'm confused a bit by this, as the script still runs, and the error rate keeps dropping.
I've read the tutorials and examples, and the scripts, and it's all too much for now, as I've been at it for about 2-3 weeks now.

  1. Do I need to create single line images from each image I have? (~3000)
  2. Would it help if I create ground-truth text files, either for the entire image or only for a single line?
  3. Some of the words in my images are not found in the eng.training_files.txt; would it speed things up or help if I add them?
  4. Is there a way to do fine tuning with my own images and my own eng.training_files.txt data, without running tesstrain.sh?

I could not find details about how to train or fine-tune with my own tif/box files. It's unclear to me whether I need to generate the ground-truth data as well, whether I still need to fix the box files, etc.

Sorry if I asked too many questions. I've invested so much time in this, and I'm not sure where exactly these questions fit: forum, new issue, Google Group?

Later edit:
Funny enough, the combined new model seems to be worse than the best trained one, probably because of the image resolution/error.

@akmalkadi

I am getting:
Image too large to learn!! Size = 4024x36

Even though all my images are 1900x17.

@Shreeshrii
Collaborator Author

All images are resized to a height of 36 or 48 pixels, depending on the network spec used. So it looks like your resized image may be too big.
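
The numbers in the question above fit this explanation. A minimal sketch, assuming proportional scaling to the network height with rounding to the nearest pixel (the rounding is an assumption, though it reproduces the logged width); `scaled_width` is a hypothetical helper, not a Tesseract function.

```python
# Value quoted earlier in the thread from lstmrecognizer.cpp.
K_MAX_IMAGE_WIDTH = 2560


def scaled_width(width, height, target_height):
    """Width after scaling an image so that its height becomes target_height."""
    return round(width * target_height / height)


# 1900x17 source image, network input height 36 (from "Size = 4024x36").
width_after_scaling = scaled_width(1900, 17, 36)
too_large = width_after_scaling > K_MAX_IMAGE_WIDTH

print(f"1900x17 -> {width_after_scaling}x36, exceeds limit: {too_large}")
```

Under these assumptions a 17 px tall line is enlarged by more than a factor of two, so even a 1900 px wide image ends up well over 2560 px.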
