Beautiful Soup is a powerful library for web scraping, but on its own it struggles with JavaScript-enabled websites: it only parses the HTML it is given and cannot run the scripts that build the page. To tackle this, I use Selenium together with BeautifulSoup. Selenium loads the page in Chrome and captures the rendered source code into a Python variable, which is then scraped with BS4.
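A minimal sketch of that pipeline, assuming Selenium 4 and a local ChromeDriver. The function names `fetch_rendered_html` and `make_soup`, the headless flag, and the fixed wait are illustrative choices, not the repository's actual code.

```python
import time

from bs4 import BeautifulSoup


def fetch_rendered_html(url: str, driver_path: str, wait_seconds: float = 3.0) -> str:
    """Load `url` in Chrome so its JavaScript runs, then return the rendered DOM as HTML."""
    # Selenium is imported here so make_soup() below works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude fixed wait for client-side rendering to finish
        return driver.page_source  # HTML after JavaScript has run
    finally:
        driver.quit()  # always close the browser, even if loading fails


def make_soup(html: str) -> BeautifulSoup:
    """Hand the rendered HTML to BeautifulSoup for ordinary BS4 scraping."""
    return BeautifulSoup(html, "html.parser")
```

Usage: `soup = make_soup(fetch_rendered_html("https://example.com", "chromedriver.exe"))`. A fixed sleep keeps the sketch short; waiting explicitly for an expected element with Selenium's `WebDriverWait` is usually more reliable.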
The meta tags and page elements you can extract with this scraper are:
- Page Title (Length of Title)
- H1 (Length of H1)
- H2
- Meta Description (Length of Meta Description)
- Meta Keywords
- Alt Image Tags
- Anchor Text
- Internal Links
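The extraction step for the items above can be sketched as one function over the rendered HTML. This is an illustration under stated assumptions, not the repository's implementation: `extract_seo_tags` is a hypothetical name, "internal links" is taken to mean links that resolve to the same host as the page, only the first `<h1>` is reported, and anchor text is collected only from `<a>` tags that have an `href`.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_seo_tags(html: str, page_url: str) -> dict:
    """Collect title, headings, meta tags, image alts, anchor text and internal links."""
    soup = BeautifulSoup(html, "html.parser")

    def meta_content(name: str) -> str:
        # Matches <meta name="..." content="...">; the name comparison is case-sensitive.
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content", "").strip() if tag else ""

    title = soup.title.get_text(strip=True) if soup.title else ""
    h1 = soup.h1.get_text(strip=True) if soup.h1 else ""  # first <h1> only
    description = meta_content("description")

    host = urlparse(page_url).netloc
    anchor_texts, internal_links = [], []
    for a in soup.find_all("a", href=True):
        anchor_texts.append(a.get_text(strip=True))
        absolute = urljoin(page_url, a["href"])  # resolve relative hrefs like "/about"
        if urlparse(absolute).netloc == host:  # same host counts as internal
            internal_links.append(absolute)

    return {
        "title": title,
        "title_length": len(title),
        "h1": h1,
        "h1_length": len(h1),
        "h2": [h.get_text(strip=True) for h in soup.find_all("h2")],
        "meta_description": description,
        "meta_description_length": len(description),
        "meta_keywords": meta_content("keywords"),
        "image_alts": [img.get("alt", "") for img in soup.find_all("img")],
        "anchor_texts": anchor_texts,
        "internal_links": internal_links,
    }
```

Usage: `report = extract_seo_tags(driver.page_source, url)`, where `driver` is a Selenium WebDriver that has already loaded `url`.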
Please try it out and let me know if you like it!
In the future, I plan to add more features, which I will share soon.
Enjoy! Please note that you scrape websites at your own risk.
Note: Please update the ChromeDriver path in the code to point to "chromedriver.exe", which I have provided in my GitHub repository.
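If you would rather not edit the path in the code, one option is to read it from an environment variable. The sketch below assumes a variable named `CHROMEDRIVER_PATH` (a name chosen for illustration; the scraper does not read it today) and falls back to `chromedriver.exe` in the current directory.

```python
import os
from pathlib import Path


def resolve_chromedriver_path(default: str = "chromedriver.exe") -> str:
    """Return the ChromeDriver path from CHROMEDRIVER_PATH, or fall back to `default`."""
    path = Path(os.environ.get("CHROMEDRIVER_PATH", default)).expanduser()
    if not path.is_file():
        # Fail early with a clear message instead of a confusing Selenium error later.
        raise FileNotFoundError(
            f"ChromeDriver not found at {path}; set CHROMEDRIVER_PATH to its location"
        )
    return str(path)
```

The result can be passed to Selenium as `Service(executable_path=resolve_chromedriver_path())`. Selenium 4.6 and later also ship Selenium Manager, which can locate a matching driver automatically when no path is given.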