Tablify - Convert Images to Tabular Data using OCR

Tablify is a Python-based tool that converts tabular data from images into CSV files using Optical Character Recognition (OCR). It processes images, extracts the text using pytesseract, and organizes it into rows and columns for easy data extraction and analysis.

Features:

- Converts images of tables into structured CSV files.
- Uses pytesseract to perform OCR on images.
- Processes images to detect individual text blocks, sort them by coordinates, and group them into rows.

Installation:

Clone the repository:

git clone https://github.com/Preetraj2002/Tablify.git
cd Tablify

Install required dependencies:

Make sure you have Python 3.x installed. Then, install the required libraries:
```
pip install -r requirements.txt
```
Install Tesseract OCR:
- Windows: Download and run the Tesseract installer, then add its installation directory to your system PATH environment variable.
- Linux: Install Tesseract using:
```
sudo apt install tesseract-ocr
```
- macOS: Use Homebrew to install Tesseract:
```
brew install tesseract
```
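
Before running Tablify, it can help to confirm that the Tesseract binary is actually reachable from Python. The sketch below is one way to check; the function name `tesseract_version` is hypothetical and not part of Tablify. It relies only on the standard library and on the `tesseract --version` command.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if the
    executable is not found on PATH. (Hypothetical helper, not in Tablify.)"""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""
```

If this returns `None`, the install step above most likely did not put Tesseract on your PATH. With pytesseract, the path to the executable can alternatively be set through `pytesseract.pytesseract.tesseract_cmd`.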

How to Use:

Prepare an Image: Ensure the image contains tabular data that you want to extract. The tool works best with clear, well-contrasted images.
Run the Script: After setting up, simply run the script on your image:
```
python tablify.py path/to/your/image.jpg
```
This generates an output.csv file in the same directory.
Check the Output: Open output.csv to see the extracted table data in tabular format.
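
To inspect the result programmatically rather than in a spreadsheet, the CSV can be loaded with Python's standard `csv` module. This is a minimal sketch that assumes the file is UTF-8 encoded; `read_table` and `ragged_rows` are hypothetical helper names, not part of Tablify. Since OCR can miss a cell, the second helper flags rows whose length differs from the first row.

```python
import csv


def read_table(path="output.csv"):
    """Load the CSV as a list of rows, each row a list of cell strings.
    Assumes UTF-8 encoding (an assumption, not confirmed by the README)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


def ragged_rows(rows):
    """Return indices of rows whose cell count differs from the first row."""
    if not rows:
        return []
    width = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != width]
```

Calling `ragged_rows(read_table("output.csv"))` gives a quick list of rows worth checking against the source image.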

Process Inside Tablify:

Image Preprocessing: The image is converted to grayscale and then binarized with Otsu thresholding to make the text clearer for OCR.

Stages shown in the repository images: Original, Grayscale, After Otsu thresholding, Dilation.
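
To illustrate what Otsu thresholding does, here is a pure-Python sketch that picks the threshold maximizing the between-class variance of the grayscale histogram. It is an illustration only: in OpenCV the same idea is available through `cv2.threshold` with the `cv2.THRESH_OTSU` flag, and the exact call and flags used in tablify.py are not shown in this README. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.
    `pixels` is a flat sequence of integer gray levels in 0..255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with level <= t
    sum0 = 0  # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map levels above t to 255 and the rest to 0."""
    return [255 if p > t else 0 for p in pixels]
```

For an image with dark text on a light background, the text pixels fall at or below the threshold, so an inverted binary image is often used when the text should be the foreground.
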
Contour Detection: Using OpenCV, contours of the text blocks are identified to group text into rows and columns.

Stages shown in the repository images: Marked contours, Marked centroids of the contours.
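
The idea of locating text blocks and their centroids can be sketched without OpenCV. The example below swaps in a plain connected-component search (8-connected flood fill) over a small binary grid and returns bounding boxes in `(x, y, w, h)` form, the same shape that `cv2.boundingRect` produces for a contour. This is a stand-in for illustration, not the contour-based code in tablify.py, and the names `find_blocks` and `centroid` are hypothetical.

```python
from collections import deque


def find_blocks(mask):
    """Return (x, y, w, h) boxes for each 8-connected foreground region.
    `mask` is a list of rows; nonzero values are foreground."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def centroid(box):
    """Center point of an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)
```

Dilating the binary image before this step merges neighboring characters, so each word or cell tends to come out as a single block rather than one block per letter.
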
Text Extraction: Each text block is processed with pytesseract to extract the text, which is then organized into a structured CSV format.
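
As a hedged sketch of the extraction step: `pytesseract.image_to_string` returns raw text that may carry trailing newlines or other whitespace, so it usually helps to normalize each result before writing it to a cell. The helper names below are hypothetical and not taken from tablify.py. The OCR function is passed in as a parameter so the example runs without Tesseract installed.

```python
def clean_cell(raw):
    """Collapse runs of whitespace (including newlines and form feeds)
    into single spaces and trim both ends."""
    return " ".join(raw.split())


def extract_texts(crops, ocr):
    """Apply `ocr` to each cropped text block and clean the result.
    `ocr` is any callable taking an image and returning a string,
    for example pytesseract.image_to_string."""
    return [clean_cell(ocr(crop)) for crop in crops]
```

With pytesseract installed, this would be called as `extract_texts(crops, pytesseract.image_to_string)`, where `crops` holds one cropped image per detected text block.
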
CSV Generation: The processed text is organized into rows based on vertical alignment and saved as a CSV file.
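
Grouping by vertical alignment can be sketched as follows: sort the blocks by their centroid's y coordinate, start a new row whenever the gap to the current row exceeds a tolerance, then order each row left to right by x. The 10-pixel default tolerance and the function names are assumptions for illustration; tablify.py may use a different rule.

```python
import csv


def group_into_rows(blocks, tolerance=10):
    """Group (cx, cy, text) blocks into rows of text.
    A block joins the current row if its cy is within `tolerance`
    pixels of the first block in that row (assumed rule)."""
    rows = []
    for cx, cy, text in sorted(blocks, key=lambda b: b[1]):
        if rows and abs(cy - rows[-1]["y"]) <= tolerance:
            rows[-1]["cells"].append((cx, text))
        else:
            rows.append({"y": cy, "cells": [(cx, text)]})
    # Order the cells of each row left to right.
    return [
        [text for _, text in sorted(row["cells"], key=lambda c: c[0])]
        for row in rows
    ]


def write_csv(rows, path="output.csv"):
    """Write the grouped rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

The tolerance needs to be smaller than the line spacing of the table and larger than the vertical jitter between cells in one row, so a value tied to the typical block height is usually a safer choice than a fixed number.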

License:

This project is licensed under the MIT License.
