text: change default char_whitelist parameter. #3462

Merged 1 commit on Apr 17, 2023
6 changes: 4 additions & 2 deletions modules/text/include/opencv2/text/ocr.hpp
@@ -153,14 +153,16 @@ class CV_EXPORTS_W OCRTesseract : public BaseOCR
@param datapath the name of the parent directory of tessdata ended with "/", or NULL to use the
system's default directory.
@param language an ISO 639-3 code or NULL will default to "eng".
- @param char_whitelist specifies the list of characters used for recognition. NULL defaults to
- "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
+ @param char_whitelist specifies the list of characters used for recognition. NULL defaults to ""
+ (All characters will be used for recognition).
@param oem tesseract-ocr offers different OCR Engine Modes (OEM), by default
tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible
values.
@param psmode tesseract-ocr offers different Page Segmentation Modes (PSM) tesseract::PSM_AUTO
(fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other
possible values.
+
+ @note The char_whitelist default is changed after OpenCV 4.7.0/3.19.0 from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" to "".
*/
CV_WRAP static Ptr<OCRTesseract> create(const char* datapath=NULL, const char* language=NULL,
const char* char_whitelist=NULL, int oem=OEM_DEFAULT, int psmode=PSM_AUTO);
4 changes: 3 additions & 1 deletion modules/text/src/ocr_tesseract.cpp
@@ -163,10 +163,12 @@ class OCRTesseractImpl CV_FINAL : public OCRTesseract
tesseract::PageSegMode pagesegmode = (tesseract::PageSegMode)psmode;
tess.SetPageSegMode(pagesegmode);

+ // tessedit_whitelist default changes from [0-9a-zA-Z] to "".
+ // See https://github.com/opencv/opencv_contrib/issues/3457
if(char_whitelist != NULL)
tess.SetVariable("tessedit_char_whitelist", char_whitelist);
else
- tess.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");
+ tess.SetVariable("tessedit_char_whitelist", "");

tess.SetVariable("save_best_choices", "T");
#else
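
For callers that relied on the previous default, the following is a minimal, standard-library-only sketch of the before and after selection logic. The helper names `effectiveWhitelist` and `legacyEffectiveWhitelist` and the constant `kLegacyWhitelist` are hypothetical and are not part of OpenCV; they only mirror the `if`/`else` branch shown in `ocr_tesseract.cpp`. The commented call at the end assumes the `create()` signature shown in `ocr.hpp`.

```cpp
#include <string>

// The alphanumeric whitelist that was applied when char_whitelist was NULL
// before this change (copied from the removed line in the diff).
static const char* const kLegacyWhitelist =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Hypothetical helper mirroring the NEW branch: a NULL whitelist now maps to
// the empty string, which leaves recognition unrestricted.
std::string effectiveWhitelist(const char* char_whitelist)
{
    return char_whitelist != nullptr ? std::string(char_whitelist) : std::string();
}

// Hypothetical helper mirroring the OLD branch: a NULL whitelist used to map
// to the alphanumeric list above.
std::string legacyEffectiveWhitelist(const char* char_whitelist)
{
    return char_whitelist != nullptr ? std::string(char_whitelist)
                                     : std::string(kLegacyWhitelist);
}

// To keep the old behavior after upgrading, the legacy list could be passed
// explicitly instead of NULL (requires OpenCV built with the text module):
//
//   cv::Ptr<cv::text::OCRTesseract> ocr =
//       cv::text::OCRTesseract::create(NULL, "eng", kLegacyWhitelist);
```

Passing the list explicitly takes the `char_whitelist != NULL` branch, so the result should not depend on which default the installed OpenCV version uses.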