text-extraction

Here are 14 public repositories matching this topic...

ICIJ / datashare

A self-hosted search engine for documents. Fill out our user survey about structured content: https://forms.gle/PYgusFsoBaMyzUec9

docker elasticsearch extract text-extraction named-entity-recognition web-gui datashare investigative-journalism

Updated Jul 10, 2025
Java

Arxa / video_text_detection

Bachelor Thesis | Text extraction from complex video scenes

opencv video gradle javafx image-processing text-extraction junit testfx

Updated Mar 15, 2019
Java

mkalus / tika-page-extractor

Tika per-page PDF extractor server that returns content as JSON.

metadata pdf json tika text-extraction

Updated Mar 16, 2016
Java

Tess4J CLI OCR Tool is a command-line application that extracts text from images and PDFs using the Tess4J library, with support for multiple languages. The extracted text is automatically copied to the clipboard for easy access.

java open-source pdf ocr image-processing text-extraction tesseract-ocr tess4j java-cli

Updated May 10, 2025
Java

FileFormatInfo / ff-pdf2txt

Simple server to extract text from a PDF

pdf text conversion text-extraction file-converter

Updated Apr 15, 2025
Java

matrix-maeny / Text-Detector

Extract text from an image.

text-extraction text-detection

Updated Jul 1, 2022
Java

arachnio / arachnio4j

Arachnio client library for Java 11+

text-extraction web-scraping data-extraction article-extractor news-scraping web-scraping-java arachnio

Updated May 31, 2023
Java

eitanflor / ShellHacks-2020

A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration

python java machine-learning javafx text-extraction artificial-intelligence sqlserver google-cloud-platform cloud-sql cloudvision license-plate-recognition

Updated Oct 26, 2020
Java

hyuseinleshov / ocr-exporter

A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.

ocr spring-boot text-extraction file-processing word-export pdf-processing text-export