indic-ocr/ocrservice


Host and run OCR as a service within your organisation or community.

The OCR service depends on the following:

  1. Java
  2. Maven
  3. Olena
  4. Tesseract
  5. Tessdata (for Indic scripts support)
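Before building, it can help to confirm that the command-line dependencies are reachable. The following is a minimal sketch, not part of the project: `missing_tools` is a hypothetical helper name, and it assumes Java, Maven and Tesseract are installed under their usual command names (`java`, `mvn`, `tesseract`). Olena is not checked here because the server takes its path as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocrservice): print each command that is
# missing from PATH and return non-zero if at least one is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Example call, assuming the usual command names:
#   missing_tools java mvn tesseract
# Installed Tessdata languages can then be listed with:
#   tesseract --list-langs
```

If the function prints nothing and returns 0, all named commands were found on `PATH`.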

Check out the code

git clone https://github.com/indic-ocr/ocrservice.git 

To compile and start the server, use the following command:

mvn package  && java -jar target/IndicOCR-jar-with-dependencies.jar <path_to_olena>/scribo/src/content_in_doc

On my local system it looks like this:

mvn package  && java -jar target/IndicOCR-jar-with-dependencies.jar ~/ocr/olena/olena/scribo/src/content_in_doc

The server starts on port 8081 and exposes three web service APIs:

  • /ocr, which converts an image to an ODT file
  • /india, which converts an image to text using the Scribo engine
  • /indiastring, which converts an image (uploaded, HTTP URL or data URL) using Tesseract or Scribo, and can optionally invert or binarize the image before passing it to the OCR engine

An experimental server is available at http://35.164.84.230:8081/. Images are not retained: all of them are removed from the server at least once a day.

Usage Examples

/ocr

curl   -F "dpi=300"   -F "lang=eng"   -F "myfile=@<path_to_image_file>" http://35.164.84.230:8081/ocr

/india

curl   -F "tolang=eng"   -F "sourcelang=pan"   -F "myfile=@<path_to_binarized_image>" http://35.164.84.230:8081/india

/indiastring

curl -H "Content-Type: application/json" -X POST -d '{"filePath":"<http url or data url >", "sourcelang":"pan","tolang":"eng","operation":"invert","engine":"tesseract"}' http://35.164.84.230:8081/indiastring
  • Allowed operations are normal, invert or binarize
  • Allowed values for engine are tesseract or scribo
  • All language parameters must be 3-letter codes (e.g. eng for English, tam for Tamil)
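
The /indiastring request above accepts a data URL in filePath, so a local image can be sent without hosting it first. The sketch below shows one way to build such a request body in plain shell. The helper names `to_data_url` and `indiastring_payload` are hypothetical, the MIME type is supplied by the caller, and whether the server accepts every image type this way is an assumption. The field names and allowed values are taken from the example and the list above.

```shell
#!/bin/sh
# Hypothetical helper: encode a local file as a data URL.
# $1 = image path, $2 = MIME type such as image/png (chosen by the caller)
to_data_url() {
  printf 'data:%s;base64,%s' "$2" "$(base64 < "$1" | tr -d '\n')"
}

# Hypothetical helper: build the JSON body for /indiastring.
# $1 = filePath value (HTTP URL or data URL), $2 = sourcelang, $3 = tolang,
# $4 = operation, $5 = engine
# Values are inserted verbatim, so $1 must not contain double quotes or backslashes.
indiastring_payload() {
  for code in "$2" "$3"; do
    case "$code" in
      [a-z][a-z][a-z]) ;;
      *) echo "language code must be 3 letters: $code" >&2; return 1 ;;
    esac
  done
  case "$4" in
    normal|invert|binarize) ;;
    *) echo "unknown operation: $4" >&2; return 1 ;;
  esac
  case "$5" in
    tesseract|scribo) ;;
    *) echo "unknown engine: $5" >&2; return 1 ;;
  esac
  printf '{"filePath":"%s","sourcelang":"%s","tolang":"%s","operation":"%s","engine":"%s"}' \
    "$1" "$2" "$3" "$4" "$5"
}

# Example, assuming a server running locally on port 8081 and a file page.png:
#   payload=$(indiastring_payload "$(to_data_url page.png image/png)" pan eng binarize tesseract)
#   printf '%s' "$payload" | curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8081/indiastring
```

Piping the body to curl through `-d @-` keeps a large base64 string out of the command line, and validating the parameters locally catches a mistyped operation or engine before anything is uploaded.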

Authors and Contributors

@rkvsraman

Help

Please join the project and help by contributing code or reporting bugs.