Web scraping automatically extracts data and presents it in a format you can easily make sense of.We are going to use Python as our scraping language, together with a simple and powerful library, BeautifulSoup
For Dynamically loading the webpage we use Selenium along with chromedriver
Selenium WebDriver is a collection of open source APIs which are used to automate the testing of a web application. Description: Selenium WebDriver tool is used to automate web application testing to verify that it works as expected. It supports many browsers such as Firefox, Chrome, IE, and Safari.
- You should check a website’s Terms and Conditions before you scrape it. Be careful to read the statements about legal use of data. Usually, the data you scrape should not be used for commercial purposes.
- Do not request data from the website too aggressively with your program (also known as spamming), as this may break the website. Make sure your program behaves in a reasonable manner (i.e. acts like a human). One request for one webpage per second is good practice.
- The layout of a website may change from time to time, so make sure to revisit the site and rewrite your code as needed
Analysis is done
- Author wise
- Month wise
- Tag wise and so on...
The resulting visualizations help us understand data science based medium articles better...