Web crawler

Simple application for finding all links on a website.

System specifications

The project has the following dependencies:

Selenium (https://www.selenium.dev/) -> for web crawling
WebDriverManager (https://github.com/bonigarcia/webdrivermanager) -> for managing (i.e., downloading, setting up, and maintaining) the drivers required by Selenium WebDriver
Gson (https://github.com/google/gson) -> for displaying JSON in a more readable format

To build and run the app with Maven:

mvn clean compile exec:java

The app aims to find all the links reachable from a base URL, not every link on every page.
"http://" and "https://" links are treated as two different links, even if the rest of the URL is identical.
Everything after a hash ("#") is treated as an in-page anchor and removed; the characters after the hash are considered "noise".
A trailing "/" at the end of a link is also considered "noise" and is removed.
If a link redirects to another link (e.g. accessing "/random" actually lands on "/post/123"), both links are considered valid and both are counted.
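
The normalization rules above can be illustrated with a minimal sketch. It is not the project's actual code: the class `LinkNormalizer` and the methods `normalize` and `normalizeAll` are hypothetical names chosen for this example, and it uses only the Java standard library. It assumes links are plain strings and does not cover the redirect rule, which requires a real browser session.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the project sources.
class LinkNormalizer {

    // Applies the "noise" rules: drop everything from '#' onward,
    // then drop a single trailing '/'. The protocol is left untouched,
    // so http:// and https:// variants remain distinct links.
    static String normalize(String url) {
        String result = url.trim();
        int hash = result.indexOf('#');
        if (hash >= 0) {
            result = result.substring(0, hash);
        }
        // Guard so that a bare "http://" prefix is not truncated.
        if (result.endsWith("/") && !result.endsWith("://")) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    // Normalizes every link and removes duplicates, keeping first-seen order.
    static Set<String> normalizeAll(List<String> urls) {
        Set<String> unique = new LinkedHashSet<>();
        for (String url : urls) {
            unique.add(normalize(url));
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> found = List.of(
                "https://example.com/",
                "https://example.com#top",
                "http://example.com",
                "https://example.com/post/123#comments");
        for (String link : normalizeAll(found)) {
            System.out.println(link);
        }
    }
}
```

In this sketch the first two inputs collapse into one link, while the "http://" variant is kept as a separate entry, matching the rules listed above.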
