Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
Latest release version 5.0.4
Describe the bug
Running Tesseract.js code in two different PSM modes gives the same output.
Is Tesseract configured to give word-level output only?
Am I right in guessing that PSMs only refine the recognition scope and do not affect the output, which will always be in words?
Running in SINGLE_CHAR and PSM_SINGLE_WORD gives the same output for the same sample.
I want to sort the result character by character. To do that, I need to extract the bbox data of each detected character and use it further. Is this possible?
Device Version:
OS + Version: [e.g. iOS8.1, Windows 10]
Windows 11
Browser [e.g. chrome, safari] or Node version [e.g. Node v18]
Edge
Page segmentation mode (PSM) has no impact on the format or level of granularity of the output. Running with PSM SINGLE_WORD tells Tesseract "I believe the input image contains a single word," and running with SINGLE_CHAR tells Tesseract "I believe the input image contains a single character."
If you want more granular output with character-level bounding boxes, look at the blocks output format.
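As a rough illustration of what reading character-level boxes from the blocks output could look like, here is a minimal sketch. It assumes the blocks tree is nested as blocks, paragraphs, lines, words, symbols, and that each symbol carries `text`, `confidence`, and a `bbox` of the form `{ x0, y0, x1, y1 }`. The helper names `collectSymbols` and `sortLeftToRight` are hypothetical and not part of Tesseract.js.

```javascript
// Hypothetical helper: flatten the nested blocks tree into a flat list of
// character-level entries. Assumes the nesting
// blocks -> paragraphs -> lines -> words -> symbols.
function collectSymbols(blocks) {
  const symbols = [];
  for (const block of blocks ?? []) {
    for (const paragraph of block.paragraphs ?? []) {
      for (const line of paragraph.lines ?? []) {
        for (const word of line.words ?? []) {
          for (const symbol of word.symbols ?? []) {
            symbols.push({
              text: symbol.text,
              confidence: symbol.confidence,
              bbox: symbol.bbox, // assumed shape: { x0, y0, x1, y1 }
            });
          }
        }
      }
    }
  }
  return symbols;
}

// Hypothetical helper: order characters by the left edge of their box.
// This is only meaningful for single-line input; multi-line text would
// need to be grouped by line first.
function sortLeftToRight(symbols) {
  return [...symbols].sort((a, b) => a.bbox.x0 - b.bbox.x0);
}
```

With a worker already created, something like `const { data } = await worker.recognize(image, {}, { blocks: true })` followed by `sortLeftToRight(collectSymbols(data.blocks))` should yield the per-character entries. Whether blocks needs to be requested explicitly may depend on the Tesseract.js version, so treat that call as an assumption to check against the API docs for your release.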