DocSavvy

Introduction

Document parsing involves examining the data in a document and extracting useful information. This web application provides an interface that extracts text from an image, searches for a specific phrase, and returns the position of that phrase in the given image. A document parser can automate the extraction of information from large volumes of unstructured text documents, which saves time and reduces manual labor costs. It can also reduce the risk of errors and inconsistencies that occur when information is extracted manually, and it provides a scalable solution for organizations that need to process a large number of documents.
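The phrase search described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the project's actual implementation: `findPhraseBoxes` and `normalize` are hypothetical helpers, and the word shape `{ text, bbox: { x0, y0, x1, y1 } }` is an assumption modelled on the word-level output that OCR engines such as Tesseract.js typically report.

```javascript
// Hypothetical helper: lowercase a token and strip punctuation so that
// "Total:" and "total" compare equal.
function normalize(token) {
  return token.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, "");
}

// Hypothetical helper: return one bounding box per occurrence of `phrase`.
// `words` is assumed to be an array of { text, bbox: { x0, y0, x1, y1 } }
// in reading order.
function findPhraseBoxes(words, phrase) {
  const target = phrase.split(/\s+/).map(normalize).filter(Boolean);
  if (target.length === 0) return [];

  const boxes = [];
  for (let i = 0; i + target.length <= words.length; i++) {
    const span = words.slice(i, i + target.length);
    const isMatch = span.every((w, k) => normalize(w.text) === target[k]);
    if (!isMatch) continue;

    // Union of the matched words' boxes. A phrase that wraps across two
    // lines would produce one large box here; a real UI may prefer to
    // draw one box per word instead.
    boxes.push({
      x0: Math.min(...span.map((w) => w.bbox.x0)),
      y0: Math.min(...span.map((w) => w.bbox.y0)),
      x1: Math.max(...span.map((w) => w.bbox.x1)),
      y1: Math.max(...span.map((w) => w.bbox.y1)),
    });
  }
  return boxes;
}

// Example with made-up coordinates.
const sampleWords = [
  { text: "Invoice", bbox: { x0: 10, y0: 10, x1: 60, y1: 20 } },
  { text: "Total:", bbox: { x0: 70, y0: 10, x1: 110, y1: 22 } },
];
console.log(findPhraseBoxes(sampleWords, "invoice total"));
```

Each returned box can then be drawn over the source image, for example on a canvas layered above it.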

Run Locally

Clone the repository in a virtual environment, install the dependencies, and launch the website with the following commands:

git clone https://github.com/aditiganvir28/Document_Parser.git
cd Document_Parser/client
npm install
npm run dev

Features

Text extraction from images and PDFs
Extraction of emails and phone numbers
Summarization of extracted text
Sentiment classification of text into two classes: positive and negative
Highlighting every occurrence of an input word/phrase, both in the extracted text and in the input image (using bounding boxes)
Voice input for the search function
Text-to-speech conversion of the extracted text
Exporting the information to a PDF document that the user can download
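As an illustration of the email and phone number extraction feature, here is a minimal sketch using regular expressions. The function name `extractContacts` and both patterns are assumptions made for this example; the patterns are deliberately simplified and will miss some valid formats (and may match other long digit runs), so they should not be read as the project's actual rules.

```javascript
// Simplified patterns, assumed for illustration only.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// Optional "+", then at least 9 digits in total, allowing spaces, dots,
// dashes and parentheses between them.
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

// Hypothetical helper: collect unique emails and phone numbers from
// text that has already been extracted from a document.
function extractContacts(text) {
  const emails = [...new Set(text.match(EMAIL_RE) || [])];
  const phones = [
    ...new Set((text.match(PHONE_RE) || []).map((p) => p.trim())),
  ];
  return { emails, phones };
}

// Example with made-up contact details.
const sampleText =
  "Contact: jane.doe@example.com or +91 98765 43210. Backup: jane.doe@example.com";
console.log(extractContacts(sampleText));
```

Deduplicating with a `Set` keeps the result short when the same contact appears several times in a document.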

Tech Stacks & Libraries

ReactJS
Hyper Text Markup Language (HTML)
Cascading Style Sheets (CSS)
JavaScript
Node.js
Tesseract.js
React-pdf, React-speech, React-speech-kit

Model APIs

Acknowledgement

This software project was developed as part of the course CS258 (Software Management) under the guidance of Dr. Puneet Gupta, Assistant Professor, Discipline of Computer Science and Engineering, IIT Indore.

Team Members

Group-15

Aditi Ganvir (210001016)
Prajakta Darade (210001052)
Princy Sondarva (210001068)
Tanisha Sahu (210001071)

References

https://tesseract.projectnaptha.com
