This Flask application empowers users to seamlessly upload image files like invoices or receipts, extract text using robust OCR technologies, and efficiently isolate key fields using precise regular expressions and multiprocessing to streamline data extraction and enhance productivity.
- Image Upload: Effortlessly upload multiple image files for processing.
- OCR Integration: Leverages Tesseract, a powerful open-source OCR engine, to accurately extract text from images.
- Field Extraction: Precisely isolates specific fields of interest using meticulously crafted regular expressions.
- JSON Output: Delivers extracted field values in a structured JSON format, promoting compatibility with downstream applications.
- Multiprocessing: Optimizes performance by concurrently processing multiple image uploads, enhancing efficiency.
- Python: Built upon the versatile Python programming language.
- Flask: Utilizes the lightweight Flask web framework for streamlined development.
- Tesseract: Integrates the robust Tesseract OCR library.
- Regular Expressions: Harnesses the power of regular expressions for accurate field extraction.
- Error Handling: Gracefully manages potential errors for smooth operation.
- Asynchronous Processing: Explores asynchronous task queues for further performance optimization (in development).
-
Prerequisites:
- Python 3.x
- Tesseract OCR (install separately)
- Required Python libraries (listed in requirements.txt)
-
Create a virtual environment (recommended):
python -m venv venv
- Activate the virtual environment:
- Windows:
venv\Scripts\activate.bat
- Linux/macOS:
source venv/bin/activate
- Windows:
-
Clone TesserXtract.AI:
- Clone the repository:
git clone https://github.com/giruu/TesserXtract.AI.git
Navigate to the cloned directory:
cd TesserXtract
-
Install dependencies:
pip install -r requirements.txt
- Start the Flask development server:
flask run
- Access the application:
- Open your web browser and navigate to
http://127.0.0.1:5000/
(or the specified port)
- Open your web browser and navigate to
- Upload images: Use the file upload interface to select multiple image files.
- View results: The extracted field values will be displayed in JSON format.
- Error Handling: The application incorporates error handling for file uploads, OCR processing, and field extraction.
- Asynchronous Processing: Asynchronous task queues are being explored to further optimize performance, especially for large-scale image processing.
For any questions or assistance, feel free to open an issue or contact the developer.