A self-hosted search engine for documents. Fill out our user survey about structured content: https://forms.gle/PYgusFsoBaMyzUec9
Updated Jul 10, 2025 - Java
Bachelor Thesis | Text extraction from complex video scenes
Tika per page PDF extractor server returning content as JSON.
Tess4J CLI OCR Tool is a command-line application that extracts text from images and PDFs using the Tess4J library, with support for multiple languages. The extracted text is automatically copied to the clipboard for easy access.
Simple server to extract text from a PDF
Arachnio client library for Java 11+
A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
Text extraction utility for multiple file formats (PDF/DOC/DOCX/XLSX/XLS/CSV), written in Java
Text extraction: a highway to systematically process car reviews
Yet Another Document 2 Text for pdf/doc/html/rtf/etc - Extract text, or convert to simplified HTML to retain layout information
Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3
Detect and extract text from a captured image or from images selected from the gallery.