A simple command line application that outputs to STDOUT the top posts from https://news.ycombinator.com in JSON.

BeardedPug/HackerNewsScraper

HackerNewsScraper

Validation rules

  1. Title and author are non-empty strings no longer than 256 characters.
  2. Uri is a valid URI.
  3. Points, comments and rank are integers >= 0.
N.B. Posts that fail validation are ignored, with one exception: a title or author longer than 256 characters is truncated to 256 characters instead of causing the post to be rejected.
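
The rules above can be sketched in plain Java as follows. This is a minimal illustration, not the project's actual code: the class name `PostValidation` and its method names are hypothetical, and treating "valid URI" as "parses with `java.net.URI` and has a scheme" is an assumption, since the rule does not say how validity is checked.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

/** Hypothetical illustration of the validation rules; not the project's actual code. */
final class PostValidation {
    static final int MAX_LENGTH = 256;

    /** Returns the text capped at 256 characters, or empty if it is null or an empty string. */
    static Optional<String> cleanText(String text) {
        if (text == null || text.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(text.length() > MAX_LENGTH ? text.substring(0, MAX_LENGTH) : text);
    }

    /** Assumption: a URI is "valid" if java.net.URI can parse it and it has a scheme. */
    static boolean isValidUri(String uri) {
        if (uri == null) {
            return false;
        }
        try {
            return new URI(uri).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** A post is kept only if every rule holds; over-long text is capped, not rejected. */
    static boolean isValidPost(String title, String author, String uri,
                               int points, int comments, int rank) {
        return cleanText(title).isPresent()
                && cleanText(author).isPresent()
                && isValidUri(uri)
                && points >= 0 && comments >= 0 && rank >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidPost("A title", "someone", "https://example.com", 10, 2, 1));
        System.out.println(isValidPost("", "someone", "https://example.com", 10, 2, 1));
    }
}
```

Returning an `Optional` from `cleanText` keeps the "cap, don't reject" exception in one place: a caller can both test for presence and read the truncated value.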

TO RUN

Method 1: from the Docker image (Google Drive link)

  • Download and set up Docker from here
  • Download the Docker image here
  • Open the terminal
  • cd to the location of newscraper.tar
  • Load the Docker image: docker load --input newscraper.tar
  • Run the Docker container: docker run -i -t newsscraperimg /bin/bash
  • Once inside the Docker container, run: newsscraper --posts n, where n is a positive integer <= 100

Method 2: build from source

  • Install git: instructions
  • Make a GitHub account (if you don't have one already)
  • Clone this repository: git clone https://github.com/BeardedPug/HackerNewsScraper.git
  • Download and set up Docker from here
  • Install Gradle from here
  • Open the terminal
  • cd into HackerNewsScraper/hacker_news_scraper
  • Run: gradle build
  • Run: gradle fatjar
  • Create the Docker image: docker build -t newsscraperimg .
  • Run the Docker container: docker run -i -t newsscraperimg /bin/bash
  • Once inside the Docker container, run: newsscraper --posts n, where n is a positive integer <= 100
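
The constraint on `--posts n` (a positive integer no greater than 100) could be checked roughly as below. This is only a sketch under stated assumptions: the class name `PostsArgument`, the error messages, and the choice to reject bad input with an exception are all hypothetical, and the project's real argument handling may differ.

```java
/** Hypothetical sketch of parsing "--posts n"; not the project's actual code. */
final class PostsArgument {
    static final int MAX_POSTS = 100;

    /** Parses {"--posts", "n"} and returns n, or throws IllegalArgumentException. */
    static int parse(String[] args) {
        if (args.length != 2 || !"--posts".equals(args[0])) {
            throw new IllegalArgumentException("Usage: newsscraper --posts n");
        }
        int n;
        try {
            n = Integer.parseInt(args[1]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("n must be an integer: " + args[1]);
        }
        // n must be a positive integer <= 100, as stated in the run instructions.
        if (n < 1 || n > MAX_POSTS) {
            throw new IllegalArgumentException("n must be between 1 and " + MAX_POSTS + ": " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        try {
            System.out.println("Would fetch " + parse(args) + " posts");
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

Validating the count before any network request means an invalid invocation fails fast with a usage message instead of partway through scraping.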

Technologies used

  • Gradle: Build tool for project and dependency management
  • Docker: Containerisation utility
  • IntelliJ: IDE used

Libraries used

  • slf4j: Logging facade which works well with Lombok.
  • logback: Logging implementation that backs slf4j.
  • jsoup: Used for parsing and traversing the HTML.
  • jackson: Standard JSON library for Java, used to build the JSON output and pretty-print it.
  • lombok: Allows for much more concise code by generating boilerplate methods automatically.
  • junit: Used to unit test the scraper.
