
Task ID: Web Scraper


Setup

  1. Open a terminal

  2. Navigate to the directory where you want to install the project
    Example: cd wec-recs

  3. Clone the repository
    git clone https://github.com/pranav2305/blog-scraper.git

  4. Navigate to the project directory
    cd blog-scraper

  5. Install the node packages
    npm i

  6. Run the server
    node index.js

  7. Open the website on your browser
    http://localhost:3000/


How to Use

  1. Open the deployed website, or use the localhost URL above if you cloned the repo.

  2. Select any one URL from the list of compatible URLs.

  3. Click on the Scrape button to scrape data from that URL.

  4. The scraped blogs will be displayed.


Tech Used

  • An Express server was built using Node.js.
  • The Axios node package was used to request data from a URL.
  • Cheerio was used to scrape the data out of the fetched HTML.
  • EJS was used to create templates so the website can be rendered with dynamic data.
  • Bootstrap's grid system was used to make the website responsive.
  • Heroku was used to deploy the website.
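
A minimal sketch of how these pieces typically fit together is shown below. The route path, form field, CSS selector, and template name are assumptions for illustration only, not the repository's actual code.

```js
// Sketch only: an Express route that fetches a page with Axios, pulls out
// links with Cheerio, and renders them through an EJS template.
const express = require('express');
const axios = require('axios');
const cheerio = require('cheerio');

const app = express();
app.set('view engine', 'ejs');
app.use(express.urlencoded({ extended: true }));

app.post('/scrape', async (req, res) => {
  const url = req.body.url;                  // URL chosen on the form (assumed field name)
  const { data: html } = await axios.get(url);
  const $ = cheerio.load(html);

  // Hypothetical selector: collect title/link pairs from the page's anchors.
  const blogs = [];
  $('a').each((i, el) => {
    blogs.push({ title: $(el).text().trim(), link: $(el).attr('href') });
  });

  res.render('results', { blogs });          // 'results.ejs' is an assumed template name
});

app.listen(3000);
```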

About

A simple web scraper made to scrape blogs. The website takes a URL as input and displays the data extracted from that URL. Currently, the scraper is limited to the few URLs listed below. At present, it only scrapes blogs from Detailed, which updates its top-50 blog rankings in various categories every 24 hours. The main aim of the scraper is to pick out the meaningful data from a website and ignore the rest, making the content easier to digest.
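
Because only a handful of URLs are supported, the server has to distinguish compatible URLs from other valid URLs and from malformed input (see the Samples section). Here is a rough sketch of one way such a check could look; the list contents and helper name are placeholders, not taken from the repository:

```js
// Hypothetical helper: classify the submitted URL so the app can show scraped
// data, the "no data" page, or the "invalid URL" page.
const COMPATIBLE_URLS = [
  // ...the compatible Detailed category pages listed at the end of this README
];

function classifyUrl(input) {
  try {
    new URL(input);                          // throws on malformed input
  } catch {
    return 'invalid';                        // -> invalid-url page
  }
  return COMPATIBLE_URLS.includes(input)
    ? 'compatible'                           // -> scrape and display the blogs
    : 'incompatible';                        // -> no-data page
}
```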


Samples

  1. Home Page
     (screenshot: home-page)

  2. Tech Blogs (Desktop view)
     (screenshot: tech-blogs)

  3. Art Blogs (Mobile view)
     (screenshot: art-blogs)

  4. For other valid URLs (incompatible URLs)
     (screenshot: no-data)

  5. For invalid URLs
     (screenshot: invalid-url)


Demo Video

Link: https://youtu.be/SAH5qdraBnA


References

  1. Cheerio docs

  2. Bootstrap grid system


Compatible URLs