A Django-based web application that accepts image, PDF, and text files from users, extracts their textual content using OCR (Optical Character Recognition), and summarizes that content using the NLTK library and the PageRank algorithm.
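
The PageRank-based summarization mentioned above can be illustrated with a minimal TextRank-style sketch. This is not the repository's code: it swaps NLTK's tokenizers for plain regular expressions so that it runs without downloads, and every function name here (`split_sentences`, `cosine_similarity`, `pagerank`, `summarize`) is hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive sentence splitter (an assumed stand-in for an NLTK tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted sentence-similarity matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Rank flowing into sentence i from every sentence j that links out.
            rank = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, max_sentences=2):
    """Return the highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return sentences
    bags = [Counter(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    n = len(sentences)
    weights = [
        [cosine_similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share vocabulary with many others accumulate rank, so an off-topic sentence with no overlapping words tends to be dropped. In a real pipeline the regex tokenization would likely be replaced by NLTK's tokenizers and stop-word list.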

inkfil/OCR-In-Django-And-Tesseract

OCRInDjangoAndTesseract

To Do list

  1. Upload a file.
  2. Pass the file through OCR [Pytesseract].
  3. Pass the file through the NLTK summarization function.
  4. Create an API for an easy interface.
  5. Set environment variables for ImageMagick and Tesseract.
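
Steps 2 and 5 could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the environment variable name `TESSERACT_CMD` and the helpers `resolve_binary` and `extract_text` are hypothetical, while `pytesseract.pytesseract.tesseract_cmd` and `pytesseract.image_to_string` are part of the pytesseract package.

```python
import os
import shutil


def resolve_binary(env_var, default_name):
    """Return the executable path from an environment variable, else search PATH."""
    configured = os.environ.get(env_var)
    if configured:
        return configured
    # shutil.which returns None when the executable is not found on PATH.
    return shutil.which(default_name)


def extract_text(image_path):
    """Run OCR on one image file and return the recognized text."""
    # Imported lazily so resolve_binary works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    # TESSERACT_CMD is an assumed variable name, not one defined by the project.
    tesseract_path = resolve_binary("TESSERACT_CMD", "tesseract")
    if tesseract_path:
        pytesseract.pytesseract.tesseract_cmd = tesseract_path
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)
```

The same `resolve_binary` helper could be pointed at ImageMagick with a variable name of your choosing. Reading the paths from the environment keeps machine-specific install locations out of the Django settings file.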
