Character level recognition gives the same results as the word level recognition. #877

Kishlay-notabot · 2024-01-24T16:34:30Z

Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
Latest release version 5.0.4

Describe the bug

Running Tesseract.js in two different PSM modes gives the same output.
Is Tesseract configured to give word-level output only?
Am I right that a PSM only narrows the recognition scope and does not affect the output, which will always be in words?
Running with SINGLE_CHAR and SINGLE_WORD gives the same output for the same sample.
I want to sort the result character by character. To do that, I need to extract the bbox data of each detected character and use it further. Is this possible?

Device Version:

OS + Version: [e.g. iOS8.1, Windows 10]
Windows 11
Browser [e.g. chrome, safari] or Node version [e.g. Node v18]
Edge


Balearica · 2024-01-24T19:52:09Z

Page segmentation mode (PSM) has no impact on the format or level of granularity of the output. Running with PSM SINGLE_WORD tells Tesseract "I believe the input image contains a single word," and running with SINGLE_CHAR tells Tesseract "I believe the input image contains a single character."

If you want more granular output with character-level bounding boxes, look at the blocks output format.
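As a rough sketch of what that could look like: the helper below walks a blocks array and collects every symbol with its bounding box. It assumes the nesting blocks > paragraphs > lines > words > symbols, with each symbol carrying text, confidence and a bbox of the form { x0, y0, x1, y1 }. The names collectSymbols and sortLeftToRight are made up for this example and are not part of Tesseract.js.

```javascript
// Hypothetical helper (not a Tesseract.js API): flatten the nested blocks
// output into a list of characters with their bounding boxes.
// Assumed shape: blocks[].paragraphs[].lines[].words[].symbols[]
function collectSymbols(blocks) {
  const symbols = [];
  for (const block of blocks ?? []) {
    for (const paragraph of block.paragraphs ?? []) {
      for (const line of paragraph.lines ?? []) {
        for (const word of line.words ?? []) {
          for (const symbol of word.symbols ?? []) {
            symbols.push({
              text: symbol.text,
              confidence: symbol.confidence,
              bbox: symbol.bbox, // assumed to be { x0, y0, x1, y1 } in pixels
            });
          }
        }
      }
    }
  }
  return symbols;
}

// Hypothetical helper: order characters by the left edge of their box.
// Returns a sorted copy; the input array is left untouched.
function sortLeftToRight(symbols) {
  return [...symbols].sort((a, b) => a.bbox.x0 - b.bbox.x0);
}

// Possible usage with Tesseract.js v5 (untested sketch, 'sample.png' is a placeholder):
//   const { createWorker } = require('tesseract.js');
//   const worker = await createWorker('eng');
//   const { data } = await worker.recognize('sample.png', {}, { blocks: true });
//   const chars = sortLeftToRight(collectSymbols(data.blocks));
//   await worker.terminate();
```

Sorting by x0 only makes sense for a single line of text; for multi-line input you would probably want to sort within each line instead.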

Kishlay-notabot · 2024-01-25T17:49:07Z

Thank you for the insight. I will close this after experimenting.

o7

Kishlay-notabot closed this as completed Jan 26, 2024

Kishlay-notabot mentioned this issue Feb 25, 2024

Add new example for custom granular output [v5] #896

Merged
