Stroke Width Transform
The latest version of OpenOCR can preprocess images with the Stroke Width Transform, which can remove non-text pixels from an image before it is passed to the OCR engine.
Here is an example of the Stroke Width Transform in action.
This feature was added recently, so the launcher.sh script has not been updated to support it yet. In the meantime, you should be able to get it running with the following commands:
$ export AMQP_URI=amqp://admin:${RABBITMQ_PASS}@${RABBITMQ_HOST}/
$ docker run -d tleyden5iwx/open-ocr-preprocessor open-ocr-preprocessor -amqp_uri "${AMQP_URI}" -preprocessor "stroke-width-transform"
$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage-swt","engine":"tesseract", "preprocessors":["stroke-width-transform"]}' http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr
Expected result:
YH XMCDMTDC
For comparison, here is the same request without the stroke-width-transform preprocessor:
$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage-swt","engine":"tesseract"}' http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr
Expected result:
E' ,‘YHwacpMTDCH ;
3?". ‘ V‘L"~m> I shah-r}. I’VMU' i 5: 1“”. A"
As you can see, in this particular case the Stroke Width Transform makes a big difference: with it, Tesseract returns a single short line of text, while without it most of the output is noise.
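To repeat this with/without comparison on other images without hand-editing the JSON each time, a small Bash sketch can build and send the request body. The function names ocr_payload and ocr_request are hypothetical helpers, not part of OpenOCR; the JSON fields and the /ocr endpoint are the ones used in the curl commands above, and RABBITMQ_HOST and HTTP_PORT are assumed to be set as in those commands.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenOCR) for the /ocr requests shown on this page.

# ocr_payload IMG_URL [PREPROCESSOR]
# Prints the JSON request body. The "preprocessors" list is added only when
# PREPROCESSOR is given. Assumes IMG_URL contains no double quotes or backslashes,
# since it is inserted into the JSON without escaping.
ocr_payload() {
  local img_url="$1" preprocessor="${2:-}"
  if [ -n "$preprocessor" ]; then
    printf '{"img_url":"%s","engine":"tesseract","preprocessors":["%s"]}' \
      "$img_url" "$preprocessor"
  else
    printf '{"img_url":"%s","engine":"tesseract"}' "$img_url"
  fi
}

# ocr_request IMG_URL [PREPROCESSOR]
# POSTs the payload to the OpenOCR HTTP endpoint. Stops with an error if
# RABBITMQ_HOST or HTTP_PORT is unset.
ocr_request() {
  : "${RABBITMQ_HOST:?RABBITMQ_HOST must be set}" "${HTTP_PORT:?HTTP_PORT must be set}"
  curl -X POST -H "Content-Type: application/json" \
    -d "$(ocr_payload "$@")" \
    "http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr"
}
```

With these functions defined, the two requests above become:
$ ocr_request http://bit.ly/ocrimage-swt stroke-width-transform
$ ocr_request http://bit.ly/ocrimage-swt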
By default, it expects black text on a white background. If you have white text on a black background, you need to pass an additional parameter as follows:
$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage-swt","engine":"tesseract", "preprocessors":["stroke-width-transform"], "preprocessor-args":{"stroke-width-transform":"0"}}' http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr
Legal values for preprocessor-args/stroke-width-transform:
- "0" -- white text on a black background
- "1" -- black text on a white background (default)
In the case of this test image, which has black text on a white background, passing in "0" completely breaks the OCR and it returns no output.
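Because the wrong value silently breaks the OCR, it can help to choose it from a descriptive name rather than typing "0" or "1" by hand. Below is a minimal Bash sketch under that assumption: swt_arg and swt_payload, and the polarity names black-on-white and white-on-black, are hypothetical conveniences. Only the "0"/"1" values and the JSON keys come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of OpenOCR).

# swt_arg POLARITY
# Prints the preprocessor-args value for stroke-width-transform.
# Per this page: "1" = black text on white (default), "0" = white text on black.
swt_arg() {
  case "$1" in
    black-on-white) echo 1 ;;
    white-on-black) echo 0 ;;
    *) echo "swt_arg: unknown polarity '$1'" >&2; return 1 ;;
  esac
}

# swt_payload IMG_URL POLARITY
# Prints a JSON body that enables stroke-width-transform with the given polarity.
# Returns non-zero (and prints nothing on stdout) for an unknown polarity.
swt_payload() {
  local img_url="$1" arg
  arg="$(swt_arg "$2")" || return 1
  printf '{"img_url":"%s","engine":"tesseract","preprocessors":["stroke-width-transform"],"preprocessor-args":{"stroke-width-transform":"%s"}}' \
    "$img_url" "$arg"
}
```

For example, the white-text-on-black request above could be written as:
$ curl -X POST -H "Content-Type: application/json" -d "$(swt_payload http://bit.ly/ocrimage-swt white-on-black)" http://${RABBITMQ_HOST}:${HTTP_PORT}/ocr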