
This application scrapes all the links on a given website by leveraging Go's concurrency model.


ifrah-ashraf/multithreaded-scrapper

Multithreaded Scraper in Go

This project is a multithreaded web scraper built with Go that collects all the links on a target website (e.g., wikipedia.org). It uses goroutines and channels to process pages concurrently, with synchronization to coordinate the goroutines.

Features

  • Concurrent Scraping: Uses goroutines to scrape multiple URLs simultaneously.
  • Channel Communication: Uses channels to coordinate goroutines and manage URL scraping tasks.
  • Customizable URL List: Change which sites are scraped by editing the urls slice.

Prerequisites

Ensure that Go is installed on your machine. If needed, download it from https://go.dev/dl/.

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/ifrah-ashraf/multithreaded-scrapper.git
    cd multithreaded-scrapper
    
  2. Open main.go and edit the urls slice to list the sites you want to scrape:

    var urls = []string{"https://news.ycombinator.com/"}
    
  3. Save the file and run it:

    go run main.go
    

Made with ❤️ by ifrah ashraf
