Popular repositories
- TextClassification Public
A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and can be used as a front end to various ML algorithms. libSVM …
- textclassification-examples Public
Use cases for DigitalPebble's TextClassification API
- stormcrawlerfight Public
Crawl configurations for benchmarking / testing StormCrawler
- stormcrawler-docker Public
Resources for running StormCrawler with Docker services
Repositories
- crawlurlfrontier Public
Crawl config used to test URL Frontier on a large scale and produce WARCs for CommonCrawl.
- tika-detector-stormcrawler Public
Wraps the charset detection logic from StormCrawler as a Tika module
- tika Public Forked from apache/tika
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
- docs Public Forked from docker-library/docs
Documentation for Docker Official Images in docker-library