DOCTABLE

A simple tool to extract tables from pdf and images

Overview

This repository contains the code to extract tables from pdf and images. The table extraction is done in two steps:

Table detection: the table is detected and cropped from the original image
- The table detection is done using the YOLOv8s Table Detection model.
- You can find reference to the model here
Table recognition: the table is recognized and the text is extracted
- I used the PaddleOCR models to recognize the structure and the text of the table.

Demo

You can test your own images or pdfs by simply running a demo with streamlit:

To run streamlit lunch:streamlit run .\doctable\streamlit_demo\app.py

Installation

Prerequisites

Python 3.8 or higher
Poetry

Steps

# Clone repo
git clone https://github.com/enricoGiga/doctable.git

# Go to repo directory
cd doctable

# Download poetry if you don't have it already
see: https://python-poetry.org/docs/#installation

# Install project dependencies
poetry install

Quick Start

Here is a simple example of how to use Doctable to extract tables from an image:

from src.doctable import Doctable

# Initialize Doctable
doctable = Doctable()

# Path to your image (can be jpg, png, pdf, etc.)
img_path = "/path/to/your/image.jpg"

# Extract pages, each page contains a list of tables, 
# if the path is an image there will be only one page, 
# otherwise there will be one page for each page in the pdf.
pages = doctable.table_extraction(img_path)

# Print the recognition results for each table in each page
for page in pages:
   for table in page.tables:
      print(table.recognition_results["text"])

Usage

This project provides several Jupyter notebooks that demonstrate how to use the table detection and recognition features.

How to test table detection:

The detection.ipynb notebook demonstrates how to use the table detection feature. It includes examples of detecting tables in various types of images and PDFs.

How to test table recognition:

The recognition.ipynb notebook demonstrates how to use the table recognition feature. It includes examples of recognizing the structure and text of detected tables.

How to test table detection + table recognition:

The detection%2Brecognition.ipynb notebook demonstrates how to use both the table detection and recognition features together. It includes examples of detecting tables in an image or PDF, and then recognizing the structure and text of the detected tables.

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Please make sure to update tests as appropriate.

License

This project is licensed under the MIT License. See LICENSE for more details.

Contact

For any questions or support, please contact us at enrico.gigante@gmail.com.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.github		.github
configs		configs
notebooks		notebooks
src		src
tests		tests
.flake8		.flake8
.gitignore		.gitignore
.isort.cfg		.isort.cfg
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DOCTABLE

A simple tool to extract tables from pdf and images

Table of Contents

Overview

Demo

Installation

Prerequisites

Steps

Quick Start

Usage

How to test table detection:

How to test table recognition:

How to test table detection + table recognition:

Contributing

License

Contact

About

Releases

Packages

Languages

License

enricoGiga/doctable

Folders and files

Latest commit

History

Repository files navigation

DOCTABLE

A simple tool to extract tables from pdf and images

Table of Contents

Overview

Demo

Installation

Prerequisites

Steps

Quick Start

Usage

How to test table detection:

How to test table recognition:

How to test table detection + table recognition:

Contributing

License

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages