TesserXtract.AI

This Flask application lets users upload images of documents such as invoices or receipts, extracts their text with the Tesseract OCR engine, and pulls out key fields with regular expressions. Multiple uploads are processed in parallel using multiprocessing.

Key Features

Image Upload: Upload multiple image files for processing.
OCR Integration: Uses Tesseract, an open-source OCR engine, to extract text from images.
Field Extraction: Isolates specific fields of interest using regular expressions.
JSON Output: Returns extracted field values as structured JSON for use by downstream applications.
Multiprocessing: Processes multiple uploaded images concurrently to reduce total processing time.
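
As an illustration of the field extraction and JSON output listed above, here is a minimal sketch using only the standard library. The field names and patterns (`invoice_number`, `date`, `total`) and the sample text are assumptions made for this example, not the patterns the application actually uses.

```python
import json
import re

# Hypothetical patterns for illustration; the application's real patterns may differ.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"\bInvoice\s*(?:Number|No\.?|#)\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*:?\s*\$?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict mapping each field name to its first match, or None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


# Sample text standing in for OCR output.
sample_text = "Invoice No: INV-1042\nDate: 12/03/2024\nTotal: $249.99"
print(json.dumps(extract_fields(sample_text), indent=2))
```

Fields that are not found are reported as `null` in the JSON rather than omitted, so downstream code can rely on a fixed set of keys.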

Technical Highlights

Python: Written in Python 3.
Flask: Uses the lightweight Flask web framework.
Tesseract: Uses the Tesseract OCR engine for text recognition.
Regular Expressions: Field extraction is done with regular expressions.
Error Handling: Handles errors in file upload, OCR processing, and field extraction.
Asynchronous Processing: Asynchronous task queues are being explored as a further performance optimization (in development).
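
The concurrent processing of uploads can be sketched with the standard library's `multiprocessing.Pool`. This shows the general approach under stated assumptions, not the application's actual code: `process_image` is a hypothetical stand-in worker that only reports the file size, where the real worker would run OCR and field extraction.

```python
import os
import sys
from multiprocessing import Pool


def process_image(path):
    """Hypothetical stand-in worker; the real one would run OCR and extract fields."""
    try:
        return {"file": os.path.basename(path), "bytes": os.path.getsize(path)}
    except OSError as exc:
        # Report the failure for this file instead of aborting the whole batch.
        return {"file": os.path.basename(path), "error": str(exc)}


def process_all(paths, processes=2):
    """Run the worker over all paths in parallel, preserving input order."""
    with Pool(processes=processes) as pool:
        return pool.map(process_image, paths)


if __name__ == "__main__":
    # The guard matters on platforms that start workers in a fresh interpreter.
    for result in process_all(sys.argv[1:]):
        print(result)
```

Saved as a script, this can be run with image paths as command-line arguments. Because each worker returns a result dictionary even on failure, one unreadable file does not stop the rest of the batch.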

Installation and Setup

Prerequisites:
- Python 3.x
- Tesseract OCR (install separately)
- Required Python libraries (listed in requirements.txt)
Create a virtual environment (recommended):
- python -m venv venv
- Activate the virtual environment:
  - Windows: venv\Scripts\activate.bat
  - Linux/macOS: source venv/bin/activate
Clone the repository:
```
git clone https://github.com/giruu/TesserXtract.AI.git
```
Navigate to the cloned directory:
```
cd TesserXtract.AI
```
Install dependencies:
- pip install -r requirements.txt
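
Because Tesseract is installed separately from the Python dependencies, it can help to confirm that the binary is reachable before starting the app. A minimal check, assuming the executable is named `tesseract` and is expected on the `PATH` (`find_tesseract` is a helper written for this example):

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install it before running the app.")
else:
    print(f"Tesseract found at {path}")
```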

Running the Application

Start the Flask development server:
- flask run
Access the application:
- Open your web browser and navigate to http://127.0.0.1:5000/ (or the specified port)

Usage

Upload images: Use the file upload interface to select multiple image files.
View results: The extracted field values will be displayed in JSON format.

Additional Information

Error Handling: The application incorporates error handling for file uploads, OCR processing, and field extraction.
Asynchronous Processing: Asynchronous task queues are being explored to further optimize performance, especially for large-scale image processing.
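
One way the upload error handling described above could look is a validation step that runs before OCR. The accepted extensions and the `validate_upload` helper are assumptions for illustration, not the application's actual checks.

```python
import os

# Hypothetical set of accepted image types.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def validate_upload(filename):
    """Return (ok, message) saying whether the file name looks like a supported image."""
    if not filename:
        return False, "No file selected."
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"Unsupported file type: {extension or 'none'}"
    return True, "OK"


for name in ["receipt.PNG", "notes.txt", ""]:
    print(repr(name), validate_upload(name))
```

Returning a message alongside the boolean lets the caller include the reason in the JSON response instead of failing silently.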

For any questions or assistance, feel free to open an issue or contact the developer.
