
Task ID: Web Scraper


Setup

  1. Open a terminal

  2. Navigate to the directory where you want to install the project
    Example: cd wec-recs

  3. Clone the repository
    git clone https://github.com/pranav2305/blog-scraper.git

  4. Navigate to the project directory
    cd blog-scraper

  5. Install the node packages
    npm i

  6. Run the server
    node index.js

  7. Open the website on your browser
    http://localhost:3000/


How to Use

  1. Open the deployed website, or use the localhost URL above if you cloned the repo.

  2. Select any one URL from the list of compatible URLs.

  3. Click on the Scrape button to scrape data from that URL.

  4. The scraped blogs will be displayed.


Tech Used

  • An Express server was built using Node.js.
  • The Axios node package was used to request data from a URL.
  • Cheerio was used to scrape the data out of the fetched HTML.
  • EJS was used to create templates so the website can be rendered with dynamic data.
  • Bootstrap's grid system was used to make the website responsive.
  • Heroku was used to deploy the website.
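
A minimal sketch of how these pieces typically fit together is shown below. The route path, form field, CSS selector, and template name are assumptions for illustration only, not the repository's actual code.

```js
// Sketch only: an Express route that fetches a page with Axios, pulls out
// links with Cheerio, and renders them through an EJS template.
const express = require('express');
const axios = require('axios');
const cheerio = require('cheerio');

const app = express();
app.set('view engine', 'ejs');
app.use(express.urlencoded({ extended: true }));

app.post('/scrape', async (req, res) => {
  const url = req.body.url;                  // URL chosen on the form (assumed field name)
  const { data: html } = await axios.get(url);
  const $ = cheerio.load(html);

  // Hypothetical selector: collect title/link pairs from the page's anchors.
  const blogs = [];
  $('a').each((i, el) => {
    blogs.push({ title: $(el).text().trim(), link: $(el).attr('href') });
  });

  res.render('results', { blogs });          // 'results.ejs' is an assumed template name
});

app.listen(3000);
```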

About

A simple web scraper made to scrape blogs. The website takes a URL as input and displays the data extracted from that URL. Currently, the scraper is limited to the few URLs listed below. At present, it only scrapes blogs from Detailed, which updates its top-50 blog rankings in various categories every 24 hours. The main aim of the scraper is to pick out the meaningful data from a website and ignore the rest, making the content easier to digest.
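
Because only a handful of URLs are supported, the server has to distinguish compatible URLs from other valid URLs and from malformed input (see the Samples section). Here is a rough sketch of one way such a check could look; the list contents and helper name are placeholders, not taken from the repository:

```js
// Hypothetical helper: classify the submitted URL so the app can show scraped
// data, the "no data" page, or the "invalid URL" page.
const COMPATIBLE_URLS = [
  // ...the compatible Detailed category pages listed at the end of this README
];

function classifyUrl(input) {
  try {
    new URL(input);                          // throws on malformed input
  } catch {
    return 'invalid';                        // -> invalid-url page
  }
  return COMPATIBLE_URLS.includes(input)
    ? 'compatible'                           // -> scrape and display the blogs
    : 'incompatible';                        // -> no-data page
}
```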


Samples

  1. Home Page
     (screenshot: home-page)

  2. Tech Blogs (Desktop view)
     (screenshot: tech-blogs)

  3. Art Blogs (Mobile view)
     (screenshot: art-blogs)

  4. For other valid URLs (incompatible URLs)
     (screenshot: no-data)

  5. For invalid URLs
     (screenshot: invalid-url)


Demo Video

Link: https://youtu.be/SAH5qdraBnA


References

  1. Cheerio docs

  2. Bootstrap grid system


Compatible URLs