A simple command-line application that prints the top posts from https://news.ycombinator.com to STDOUT as JSON.
- Title and author are non-empty strings no longer than 256 characters.
- Uri is a valid URI.
- Points, comments and rank are integers >= 0.
N.B. Posts that break these validation rules are ignored, with one exception: a title or author longer than 256 characters is truncated to 256 characters instead of being rejected.
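The rules above can be sketched in Java roughly as follows. This is a minimal illustration and not the project's actual code: the class name `PostValidator` and its methods are hypothetical, and it assumes that "valid URI" means "accepted by `java.net.URI`" and that "non-empty" means a length of at least one character.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not taken from the repository.
final class PostValidator {
    static final int MAX_LENGTH = 256;

    /** Truncates a string to MAX_LENGTH characters; shorter strings are returned unchanged. */
    static String cap(String value) {
        return value.length() > MAX_LENGTH ? value.substring(0, MAX_LENGTH) : value;
    }

    /** Assumption: a URI is "valid" if it is non-empty and java.net.URI can parse it. */
    static boolean isValidUri(String uri) {
        if (uri == null || uri.isEmpty()) {
            return false;
        }
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Title and author only need to be non-empty, because overlong values are capped, not rejected. */
    static boolean isNonEmpty(String value) {
        return value != null && !value.isEmpty();
    }

    /** Returns true if a post with these fields should be kept in the output. */
    static boolean isValid(String title, String author, String uri,
                           int points, int comments, int rank) {
        return isNonEmpty(title)
                && isNonEmpty(author)
                && isValidUri(uri)
                && points >= 0
                && comments >= 0
                && rank >= 0;
    }
}
```

In this sketch a caller would first check `isValid(...)` and then apply `cap(...)` to the title and author before writing the post out, so a 300-character title is shortened and the post is still kept.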
- Download and set up Docker from here
- Download the Docker image here
- Open the terminal
- cd to the location of newscraper.tar
- Load the Docker image:
docker load --input newscraper.tar
- Run the Docker container:
docker run -i -t newsscraperimg /bin/bash
- Now that you are inside the Docker container, you can run:
newsscraper --posts n
where n is the number of posts to print, a positive integer <= 100.
- Install Git: instructions
- Create a GitHub account (if you don't already have one)
- Clone this repository:
git clone https://github.com/BeardedPug/HackerNewsScraper.git
- Download and set up Docker from here
- Install Gradle from here
- Open the terminal
- cd into HackerNewsScraper/hacker_news_scraper
- Run
gradle build
- Run
gradle fatjar
- Build the Docker image:
docker build -t newsscraperimg .
- Run the Docker container:
docker run -i -t newsscraperimg /bin/bash
- Now that you are inside the Docker container, you can run:
newsscraper --posts n
where n is the number of posts to print, a positive integer <= 100.
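As an illustration of the `--posts n` constraint, the argument check could look something like the sketch below. The class name `PostsArgument`, the error messages, and the choice to throw `IllegalArgumentException` are all assumptions made for this example; the real application may parse and report errors differently.

```java
// Hypothetical helper, not taken from the repository.
final class PostsArgument {
    static final int MAX_POSTS = 100;

    /** Parses {"--posts", "n"} and returns n, or throws IllegalArgumentException. */
    static int parse(String[] args) {
        if (args.length != 2 || !"--posts".equals(args[0])) {
            throw new IllegalArgumentException("usage: newsscraper --posts n");
        }
        int n;
        try {
            n = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("n must be an integer, got: " + args[1], e);
        }
        // n must be a positive integer no greater than 100.
        if (n < 1 || n > MAX_POSTS) {
            throw new IllegalArgumentException("n must be between 1 and " + MAX_POSTS);
        }
        return n;
    }
}
```

With this sketch, `newsscraper --posts 10` would yield 10, while `--posts 0`, `--posts 101`, and `--posts ten` would all be rejected before any scraping starts.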
- Gradle: Build tool for project and dependency management.
- Docker: Containerisation utility.
- IntelliJ: IDE used for development.
- slf4j: Logging facade which works well with Lombok.
- logback: Logging backend used by slf4j.
- jsoup: Used for parsing and traversing the HTML.
- jackson: Standard JSON library for Java, used to convert objects to JSON and pretty-print them.
- lombok: Allows for much more concise code by generating boilerplate methods such as getters and constructors.
- junit: Used to unit test the scraper.