#Specification
You are required to build a command-line tool that downloads all images from a given HTML page, saves them to local disk, and transcodes them to different sizes and formats.
The processed images can later be used on a mobile site.
##Minimal functionality
- The downloader tool has to accept at least two parameters: the URL of a source page and the location of an output directory on local disk
- It has to download all images defined with the `<img>` tag on this page to the specified directory
- The tool should be optimized to download only new or modified images. Images that have not changed since the tool was last run and still exist in the specified directory should be skipped
- Once an image is downloaded, it should be resized to three different sizes: widths of 100px, 220px and 320px, preserving the width/height ratio
- Each of the three image sizes should be saved in at least 2 formats: png and jpg
- Since the tool may be used to parse a large number of different HTML pages, the application should take performance into account
- The tool can skip very small images (width or height <= 10 pixels) since they are most likely unusable on a mobile site.
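
The size filter, the aspect-ratio-preserving resize and the two output formats can be sketched with the standard `javax.imageio` and `java.awt` classes. This is a minimal illustration, not the project's actual code: the class name `ImageResizer`, its method names and the `baseName_width.format` file naming are all assumptions made for the example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper; names are illustrative and not taken from the project.
final class ImageResizer {
    static final int[] TARGET_WIDTHS = {100, 220, 320};
    static final String[] FORMATS = {"png", "jpg"};
    static final int MIN_DIMENSION = 10;

    /** True when width or height is <= 10 pixels, i.e. the image may be skipped. */
    static boolean isTooSmall(int width, int height) {
        return width <= MIN_DIMENSION || height <= MIN_DIMENSION;
    }

    /** Height that keeps the source width/height ratio at the given target width. */
    static int scaledHeight(int srcWidth, int srcHeight, int targetWidth) {
        return Math.max(1, (int) Math.round((double) srcHeight * targetWidth / srcWidth));
    }

    /** Scales the image to the target width, preserving the aspect ratio. */
    static BufferedImage resize(BufferedImage src, int targetWidth) {
        int targetHeight = scaledHeight(src.getWidth(), src.getHeight(), targetWidth);
        // TYPE_INT_RGB has no alpha channel, so the result can be written as JPEG.
        // Transparent areas of the source are therefore rendered as black.
        BufferedImage out = new BufferedImage(targetWidth, targetHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, targetWidth, targetHeight, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    /** Writes every width/format combination and returns the number of files written. */
    static int saveAllVariants(BufferedImage src, String baseName, File outputDir) throws IOException {
        if (isTooSmall(src.getWidth(), src.getHeight())) {
            return 0; // skipped: too small to be useful on a mobile site
        }
        int written = 0;
        for (int width : TARGET_WIDTHS) {
            BufferedImage resized = resize(src, width);
            for (String format : FORMATS) {
                File target = new File(outputDir, baseName + "_" + width + "." + format);
                if (ImageIO.write(resized, format, target)) {
                    written++;
                }
            }
        }
        return written;
    }
}
```

For a source image that passes the size filter, `saveAllVariants` would write six files: three widths in two formats each.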
#Solution
##Requirements
Building and running the application requires a Java JDK and Apache Maven.
##Usage:
To build the application, navigate to the root directory and run 'mvn clean install'.
To run it (using an external test website): java -jar target/image-downloader.jar http://adambarnes5000.weebly.com downloads
This will create a new folder 'downloads', populated with pictures of a dog, a cat and a monkey in their original size and format, plus copies resized to widths of 100, 220 and 320 pixels in jpg and png formats.
##Design Summary
The application is built using Spring and Spring Integration. This decision was made to take advantage of components that Spring Integration offers out of the box, notably Splitter, Filter and Aggregator, and of its simple support for multithreading. A splitter turns a web page into a list of image URLs, and each image becomes a new message payload. The image payloads are placed on a channel and dispatched using a task executor configured with a pool of 20 threads. The messages then pass through two filters: one removes previously downloaded, unchanged images, and the other removes images of insufficient size. The remaining images are resized, converted to the various formats and saved to disk. Finally, all messages (both successful and filtered) are aggregated, and the number of images downloaded is returned to the main class.
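
The splitter step (web page to list of image URLs) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: a regular expression stands in for a proper HTML parser, which would be more robust on real pages, and the class name `ImageUrlSplitter` and method `split` are hypothetical rather than taken from the project.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the splitter; not the project's Spring Integration component.
final class ImageUrlSplitter {
    // Matches the quoted src attribute of an <img> tag (simplified on purpose).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute, de-duplicated image URLs found in the page, in document order. */
    static List<String> split(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        // A base URI with an empty path resolves relative references incorrectly,
        // so normalise "http://host" to "http://host/" first.
        if (base.getPath() == null || base.getPath().isEmpty()) {
            base = base.resolve("/");
        }
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            try {
                String resolved = base.resolve(m.group(1).trim()).toString();
                if (!urls.contains(resolved)) {
                    urls.add(resolved); // each entry would become one message payload
                }
            } catch (IllegalArgumentException e) {
                // src is not a valid URI (for example, it contains spaces): skip it
            }
        }
        return urls;
    }
}
```

Each returned URL corresponds to one payload that would then be placed on the channel for filtering and resizing.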
As for other possible approaches, one alternative would be to implement everything from scratch. That would have meant much more boilerplate code, particularly around multithreading. Using Spring Integration made this unnecessary, leaving me free to concentrate on the business logic.
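
For comparison, a from-scratch version of the dispatch, filter and count steps might look roughly like the sketch below, using only `java.util.concurrent`. It is not code from this project: `PipelineSketch`, `ImageTask`, its fields and `process` are hypothetical names, and the actual download, resize and save work is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical plain-Java equivalent of the channel, filters and aggregator.
final class PipelineSketch {

    /** Illustrative payload: one image found on the page. */
    static final class ImageTask {
        final String url;
        final int width;
        final int height;
        final boolean unchangedSinceLastRun;

        ImageTask(String url, int width, int height, boolean unchangedSinceLastRun) {
            this.url = url;
            this.width = width;
            this.height = height;
            this.unchangedSinceLastRun = unchangedSinceLastRun;
        }
    }

    /** Runs every task on a fixed thread pool and returns how many passed both filters. */
    static int process(List<ImageTask> tasks, int poolSize) {
        Predicate<ImageTask> isNewOrModified = t -> !t.unchangedSinceLastRun;
        Predicate<ImageTask> isLargeEnough = t -> t.width > 10 && t.height > 10;

        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (ImageTask task : tasks) {
                results.add(pool.submit(() -> {
                    if (!isNewOrModified.test(task) || !isLargeEnough.test(task)) {
                        return false; // filtered out, but still counted as handled
                    }
                    // The resize, convert and save steps would run here.
                    return true;
                }));
            }
            // Aggregation: wait for every task and count the processed images.
            int processed = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    processed++;
                }
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing images", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Image task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `PipelineSketch.process(tasks, 20)` would mirror the pool size of 20 threads mentioned above. Even this reduced version needs explicit pool lifecycle management, future handling and exception wrapping, which is the kind of boilerplate the Spring Integration configuration avoids.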
Besides Spring, the following libraries were also used: