DocLayoutAnalysis

Introduction

This repo aims to analyse the layout of documents. At the moment, the project is built on the layoutparser toolkit, a deep learning-based toolkit for detecting and extracting layout regions from document images.

Features

Easy to use through a preconfigured pipeline.

Usage

Install the repo

git clone https://github.com/What-s-behind/DocLayoutAnalysis.git
cd DocLayoutAnalysis
pip install -r requirements.txt

Run the app: app.py contains a variable that holds the input file directory. Place your file at that location, then run the script.

python app.py

Note: an example of running layoutparser is provided in /notebooks; see it to learn how to use layoutparser directly.
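
For readers who want a quick idea of what running layoutparser looks like before opening the notebook, here is a minimal sketch. It is not taken from this repo's notebook or from app.py: the function names `detect_layout` and `count_block_types`, the choice of the PubLayNet Faster R-CNN model, and the 0.8 score threshold are all assumptions made for illustration. It also assumes that layoutparser, Detectron2, and OpenCV are installed. The heavy imports are deferred into the function so the file can be imported without them.

```python
from collections import Counter


def detect_layout(image_path, score_threshold=0.8):
    """Detect layout regions in one document image (illustrative sketch).

    Assumption: the PubLayNet Faster R-CNN model from the layoutparser
    model zoo is used; this repo may configure a different model.
    """
    # Imported lazily so this module loads even without the heavy dependencies.
    import cv2
    import layoutparser as lp

    image = cv2.imread(str(image_path))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    image = image[..., ::-1]  # OpenCV loads BGR; the model expects RGB

    model = lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", score_threshold],
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )
    # Returns a layoutparser Layout: an iterable of blocks, each with a
    # `type` (label), a `score`, and bounding-box coordinates.
    return model.detect(image)


def count_block_types(blocks):
    """Count detected blocks per label, e.g. how many Text or Table regions."""
    return dict(Counter(block.type for block in blocks))
```

Calling `count_block_types(detect_layout("samples/page.png"))` would then give a per-label count of the detected regions; the file name here is hypothetical, so substitute one of your own images.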

Future development

Add more OCR tools (easyocr, paddleocr, etc.) for greater variety.
Add more layout models.
Introduce an OCR-free framework to the repo.
