document-parser

Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

markdown ocr ai structured-data tables pdf-parser document-parser structured-data-capture pdf-to-json llm document-parsing image-to-markdown pdf-to-markdown

Updated Aug 18, 2025
Python

Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pipelines (GenAI, LLM, VLLM) into your applications, supporting tasks such as document cleanup, optical character recognition (OCR), classification, splitting, named entity recognition, and form processing.

python docker ocr pytorch omr optical-character-recognition optical-mark-recognition icr document-parser document-layout-analysis table-recognition table-detection publaynet intelligent-character-recognition intelligent-word-recognition iwr pubtabnet

Updated Aug 19, 2025
Python

JPLeoRX / opencv-text-deskew

Tutorial on how to deskew (straighten) text images

python opencv tutorial computer-vision image-processing opencv-python deskew document-parser

Updated Mar 15, 2022
Python

papercast-dev / papercast

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, Semantic Scholar, or PDF files with GROBID and LangChain, and listen to them as podcasts. Customize your own pipelines.

python nlp pipeline podcast pdf-converter tts arxiv pdf-to-text dag document-parser pdf-document-processor grobid semantic-scholar document-parsing

Updated Mar 17, 2025
Python

InvoiceableAI / Invoiceable

The invoice, document, and resume parser powered by AI.

python resume ai experimental invoices invoice documents resume-parser resumes document-parser invoice-parser invoiceable

Updated Nov 22, 2024
Python

decisionfacts / semantic-ai

An open-source framework for retrieval-augmented generation (RAG) systems that uses semantic search to retrieve relevant results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

pdf machine-learning ocr deep-neural-networks openai docx approximate-nearest-neighbor-search semantic-search document-parser rag fastapi vector-database inference-api openai-api llm retrieval-augmented-generation llama2

Updated Jul 19, 2024
Python

graphlit / graphlit-client-python

Python client library for Graphlit Platform

ai chatbot api-client copilot agents ai-agents document-parser rag pdf-to-json api-client-python llms graphlit

Updated Aug 19, 2025
Python

decisionfacts / df-extract

DF Extract Lib

pdf jpg png jpeg extraction python3 asyncio docx pptx document-parser

Updated Apr 3, 2024
Python

has-abi / docparser

Extract text from your DOCX documents.

text-parser document-parser doc-parser docx-parser

Updated Feb 10, 2024
Python

Gyanvir / DrParser

Dr.Parser 🩸📊 – AI-powered blood report parser that extracts and analyzes medical data from images/PDFs. Built with React, FastAPI, EasyOCR, and Gemini AI. 🚀 🔹 Local Setup Available | 🔹 Future Enhancements Planned | 🔹 Hackathon Project 👉 Clone, run, and explore the future of AI-driven healthcare!

ocr reactjs healthcare hackathon-project document-parser fastapi medical-ai ai-ml easyocr team-euphoria blood-report-analysis

Updated Mar 30, 2025
Python

Vetrivel07 / AI-Powered-Resume-Evaluator

An AI-powered resume evaluation app that compares a candidate’s resume with a job description using Google’s Gemini 1.5 Flash model, providing HR-style feedback and an ATS-style match score through a simple, interactive Streamlit interface.

python-library evaluator ats document-parser resume-analysis gemini-api streamlit streamlit-application genai gemini-flash

Updated Jul 1, 2025
Python

privateai-com / docviz

Advanced document contents extraction with multiple output formats

python pdf ocr pdf-parser layout-analysis document-parser vision-language-model

Updated Aug 17, 2025
Python

MegrezAI / LeapRAG

LeapRAG is an open-source platform that integrates advanced RAG technology with Google’s A2A protocol, enabling users to build context-aware, data-driven agents. These agents are automatically A2A-compliant and can be discovered and used by any compatible client without extra development.

nlp pdf openai pdf-to-text agents document-parser rag a2a llm document-parsing chatgpt retrieval-augmented-generation ollama deepseek a2a-protocol agent-to-agent

Updated May 27, 2025
Python

anyparser / anyparser_crewai

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extraction tools within your Crew AI applications.

python typescript artificial-intelligence knowledge-graph cag document-parser kag rag document-parsing retrieval-augmented-generation crewai crew-ai crewai-rag cache-augmented-generation anyparser crew-ai-rag

Updated Feb 17, 2025
Python

suwa-sh / local-RAG-backend

This is the backend for a RAG system that runs on Docker Compose. It registers documents in a wide range of file formats, which can be searched using the MCP server.

search docker information-retrieval ai docker-compose embeddings full-text-search graph-search ingest reranking unstructured document-parser rag vector-search graphiti llm mcp-server changes-over-time

Updated Jul 25, 2025
Python

Here are 26 public repositories matching this topic...

docling-project / docling

Marker-Inc-Korea / AutoRAG

Filimoa / open-parse

deepdoctection / deepdoctection

iamarunbrahma / vision-parse

NanoNets / docstrange

marieai / marie-ai

