layout-analysis

Here are 49 public repositories matching this topic...

opendatalab / MinerU

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

python pdf parser ocr pdf-converter extract-data document-analysis pdf-parser layout-analysis ai4science pdf-extractor-rag pdf-extractor-llm pdf-extractor-pretrain

Updated Aug 2, 2025
Python

Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

ocr computer-vision deep-learning object-detection document-image-processing layout-analysis document-layout-analysis detectron2 layout-parser layout-detection

Updated Aug 15, 2024
Python

bytedance / Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

python pdf parser ocr pdf-converter document-analysis pdf-parser layout-analysis vlm-ocr

Updated Jul 10, 2025
Python

breezedeus / Pix2Text

An open-source Python 3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, and converting them into Markdown. A free alternative to Mathpix for converting visual content into text-based representations. 80+ languages are supported.

python ocr latex pytorch latex-pdf math-formula layout-analysis math-ocr mathpix table-ocr math-formula-recognition image-to-markdown

Updated Jul 25, 2025
Jupyter Notebook

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf csharp pdfbox netstandard pdf-files pdf-document pdf-generation hocr document-analysis pdf-extractor alto-xml page-xml layout-analysis pdf-document-processor

Updated Aug 3, 2025
C#

mittagessen / kraken

OCR engine for all the languages

ocr neural-networks hocr optical-character-recognition htr handwritten-text-recognition alto-xml page-xml layout-analysis

Updated Jul 21, 2025
Python

kotaro-kinoshita / yomitoku

Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.

python ocr deep-learning pytorch layout-analysis

Updated Jul 30, 2025
Python

BobLd / DocumentLayoutAnalysis

Document layout analysis resources repo for development with PdfPig.

pdf csharp hocr tei hocr-documents alto-xml table-extraction page-xml alto layout-analysis document-layout-analysis xycut docstrum pdfpig xy-cut recursive-xy-cut page-segmentation

Updated Oct 1, 2023
C#

mindspore-lab / mindocr

A toolbox of OCR models and algorithms based on MindSpore

ocr deep-learning text-recognition text-detection layout-analysis crnn dbnet table-recognition mindspore key-information-extraction layoutxlm ocr-large-model tablemaster vary-toy

Updated Jul 24, 2025
Python

RapidAI / RapidLayout

Analysis of Chinese and English layouts

layout layout-analysis cdla pp-structure doclayout-yolo

Updated Jul 14, 2025
Python

RapidAI / RapidDoc

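📝 Extracts content from document images and reproduces each image one-to-one in Word or Txt for further use or processing. Planned: support for PDF/image input, with output in JSON, Txt, Word, and Markdown formats.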

layout-analysis layout-recover

Updated Nov 1, 2024
Python

andreagemelli / doc2graph

Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

nlp deep-learning pytorch layout-analysis geometric-deep-learning table-detection gnn document-understanding key-information-extraction

Updated May 23, 2023
Jupyter Notebook

ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet - power your document intelligence with layout analysis

yolo document-analysis layout-analysis ultralytics yolov8 doclaynet

Updated Aug 3, 2025
Python

NormXU / Layout2Graph

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

layout-analysis gnn-framework

Updated Oct 14, 2023
Python

xushengfeng / eSearch-OCR

A Node.js library based on PaddleOCR

nodejs ocr layout-analysis onnx paddleocr

Updated May 27, 2025
TypeScript

JPLeoRX / detectron2-publaynet

Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset

python machine-learning computer-vision deep-learning neural-network python3 pytorch artificial-intelligence neural-networks faster-rcnn document-classification object-detection document-analysis document-layout instance-segmentation layout-analysis document-layout-analysis detectron2 publaynet

Updated Apr 16, 2023
Python

CycloneBoy / pdf_table

A Unified Toolkit for Deep Learning-Based Table Extraction

pdf ocr ai table layout-analysis pdf-to-html table-recognition document-parsing

Updated Nov 21, 2024
Python

MaitySubhajit / SelfDocSeg

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

computer-vision layout-analysis self-supervised-learning document-segmentation

Updated Oct 6, 2023
Python

dell-research-harvard / HJDataset

A Large Dataset of Historical Japanese Documents with Complex Layouts

python dataset layout-analysis detectron2

Updated Jul 22, 2022
Jupyter Notebook

empressabyss / nordrassil

Nordrassil is a keyboard layout that provides an elegant and balanced typing experience by its use of a thumb-alpha, emphasis on middle fingers, deprioritisation of pinkies, and repeat key (or arcane keys).

warcraft layouts keyboard-layout qmk keyboards dactyl layout-analysis arcane

Updated Sep 4, 2024
