allenai / papermage

library supporting NLP and CV research on scientific papers

python machine-learning natural-language-processing computer-vision scientific-papers multimodal pdf-processing

Updated Nov 8, 2024
Python

ahmedkhemiri95 / PDFs-TextExtract

Text extraction from multiple and large PDF documents.

python pdf parser data-science pdf-document text-analytics pdfs pypdf2 extract-text pdfminer pdf-processing pdfs-textextract

Updated Feb 10, 2025
Python

postralai / masquerade

The Privacy Firewall for LLMs

privacy mcp claude anonymization pdf-processing pseudonymization pdf-redaction private-llm model-context-protocol mcp-server pdf-pseudonymization

Updated Jun 23, 2025
Python

aws-samples / document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Updated Oct 25, 2021
Python

PSPDFKit / nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

python pdf-converter pdf-generation pdf-document-processor ocr-python pdf-processing

Updated Jul 3, 2025
Python

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. A separate script queries the Chroma DB for similarity search based on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

python rust productivity application gui qt ocr image-processing video-processing speech-recognition youtube-downloader file-management audio-processing pdf-processing code-formatting

Updated Mar 15, 2025
Python

Remy2404 / Polymind

Polymind is a powerful multi-modal Telegram bot built with Gemini, DeepSeek, OpenRouter, and over 50 cutting-edge AI models. It offers seamless conversational intelligence, Mermaid diagram rendering, PDF/DOCX analysis, image generation, and collaborative tools—all in a single bot interface.

telegram-bot voice image-processing voice-recognition gemini multi-model pdf-processing ai-assistant openrouter mermiad deepseek-r1

Updated Jul 26, 2025
Python

DioCrafts / ai-book-summarizer

📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

python markdown pdf machine-learning natural-language-processing automation ai text-analysis openai text-summarization document-analysis study-materials pymupdf knowledge-extraction pdf-processing book-summary educational-tools pdf-summarization ai-powered-tools

Updated Jan 2, 2025
Python

Aleptonic / PdfSnipper

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

utilities pdf-processing nlp-tools

Updated Feb 3, 2025
Python

Yardenrsk / PsychometryReceiverCV

A side project for easily collecting and annotating questions and answers for the PsychometryBot project DB, using computer vision and PDF parsing.

pandas opencv-python pdf-processing

Updated Sep 18, 2022
Python

thinhuos0913 / python_useful_mini_projects

A collection of useful mini projects that I worked on while teaching myself Python programming.

python opencv ocr image-processing pdf-processing

Updated May 20, 2024
Python

Al-shwaib / Book-Preparation-for-Printing

A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.

flask-application pymupdf pdf-processing rtl-support offset-printing book-preparation arabic-books commercial-printing a3-printing order-to-print

Updated Jan 6, 2025
Python

arsath-eng / RAG1-NVIDIA-GENAI

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

embeddings question-answering document-analysis faiss rag pdf-processing streamlit llm langchain vector-store nvidia-ai-faundry llama-models

Updated Oct 31, 2024
Python

Siddharthsinghkumar / auto-job-match-pipeline

AI-powered job search assistant that reads newspapers daily, finds jobs matching your resume using GPT, and alerts you via Telegram.

automation ocr telegram-bot gpt job-search pdf-processing layout-detection resume-matching

Updated Jul 6, 2025
Python

rithulkamesh / docproc

Opinionated and Sophisticated Document Region Analyzer.

python machine-learning ocr text-classification text-extraction data-extraction region-detection content-extraction document-analysis layout-analysis pdf-processing pdf-text-extraction document-parsing equation-detection mathematical-symbols

Updated Apr 13, 2025
Python

ShadowAniket / AI-RESUME

AI-powered Resume Analyzer and Builder with scoring, suggestions, and ATS optimization. Built using Flask, OpenAI, and Resume Parsing tools for smarter job applications.

python nlp open-source machine-learning job-application resume-parser pdf-processing streamlit ats-resume resume-analyzer ai-resume-builder cv-scoring resume-review

Updated Jun 8, 2025
Python

rlwadh / markitdown-desktop

Professional document converter with Desktop & Web versions. Unlimited PDF processing, multi-file support. Supports kindergarten project.

web-app markdown-converter batch-processing document-converter python-gui pdf-processing kindergarten-support unlimited-pdf multifile-processing

Updated Jun 3, 2025
Python

omritriki / BIU-Points-Calculator

A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.

css python html api education render web-application biu fastapi pdf-processing gpa-calculation-tool university-tools credit-points

Updated Apr 14, 2025
Python

pdf-processing

Here are 86 public repositories matching this topic...