WebCrawler is a library to obtain articles related to light pollution.
© 2018 Jorge Galán - OEG-UPM. Available under Apache License 2.0. See LICENSE.
- Download any article from ZENODO and DARKSKY. While Zenodo is a website with many articles on different topics, DARKSKY is focused on light pollution.
- Get all the information about these articles as JSON, including title, abstract, authors, keywords, DOI and license.
- Make sure you have the latest version of the Google Chrome browser installed on your computer.
- You need a persistent internet connection.
Download a version of WebCrawler from our releases page. Each release includes a jar and an exe, which needs to be given permissions on your machine.
WebCrawler provides a command line application:
$ java -jar WebCrawler.jar --help
usage: PDFExtractor [-h] [-i <inputFolder>] [-k <keywords>] [-s
       <sourceWeb>]
Mised argument
 -h,--help                   Indicate how yo use the program.
 -i,--input <inputFolder>    [REQUIRED] Input folder where download the
                             content. Ex: /Users/jesus/aFolder
 -k,--keyword <keywords>     [REQUIRED] Keyword to search the PDF files
 -s,--sources <sourceWeb>    [OPTIONAL] Choose the information source.
                             (ZENODO, DARKSKY, ALL). Default: ALL
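As an illustration of the options listed in the help text, a typical run might look like the sketch below. The folder `./articles` and the keyword `skyglow` are placeholder values chosen for this example, not defaults of the tool, and the script assumes `WebCrawler.jar` sits in the current directory. How the tool handles keywords containing spaces is not documented here, so a single word is used.

```shell
# Folder where the downloaded content will be stored (placeholder path; -i is required).
OUT_DIR="./articles"
mkdir -p "$OUT_DIR"

# Options taken from the help text: -i and -k are required, -s is optional.
# "skyglow" is only an example keyword; ZENODO restricts the search to one source.
set -- -i "$OUT_DIR" -k skyglow -s ZENODO

# Run the crawler only if the jar is present; otherwise show the command that would run.
if [ -f WebCrawler.jar ]; then
  java -jar WebCrawler.jar "$@"
else
  echo "WebCrawler.jar not found; would run: java -jar WebCrawler.jar $*"
fi
```

Leaving out `-s` should search every source, since the help text gives ALL as the default.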
Clone this repo and run:
mvn clean compile assembly:single
Then, get your own version of the jar in the project's target folder.