Python port of Nutch that allows controlling Apache Nutch via its REST API.
Updated Dec 2, 2015 - Python
A simple web crawler inside a docker container using Apache Nutch 1 and Solr.
A search engine built to retrieve geographical information of any country.
Nutch 1.x Indexer Plugin that runs against ES6.7
Different examples of using Nutch: with Solr, Selenium Hub, and standalone web drivers
DataHarvest: Dockerized Web Crawling, Indexing, and Storage Solution
The proposed system uses a crawler to gather information from every document on the website and stores it in an index, a structured store for the unstructured data the crawler returns. In this project, Nutch’s main component, the ‘crawler’, is used for indexing and Solr is used for searching. The …