This project is a multithreaded web scraper written in Go that collects all links found on a given website (e.g., wikipedia.org). It uses goroutines and channels to process pages concurrently and to synchronize that work.
- Concurrent Scraping: Uses goroutines to scrape multiple URLs simultaneously.
- Channel Communication: Uses channels to synchronize goroutines and manage URL scraping tasks.
- Customizable URL List: Change which URLs are scraped by editing the urls slice.
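The goroutine-and-channel pattern behind these features can be sketched roughly as follows. This is a minimal illustration, not the project's actual main.go: the names `scrapeAll` and `result` and the `fetch` parameter are assumptions made for the example, and a stub fetcher stands in for a real HTTP request so the sketch runs offline.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a URL with the links found on it (hypothetical type).
type result struct {
	url   string
	links []string
	err   error
}

// scrapeAll starts one goroutine per URL, collects the results over a
// channel, and returns them keyed by URL. fetch is a hypothetical stand-in
// for whatever function downloads a page and extracts its links.
func scrapeAll(urls []string, fetch func(string) ([]string, error)) map[string]result {
	results := make(chan result)
	var wg sync.WaitGroup

	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			links, err := fetch(u)
			results <- result{url: u, links: links, err: err}
		}(u)
	}

	// Close the channel once every goroutine has sent its result,
	// so the range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]result, len(urls))
	for r := range results {
		out[r.url] = r
	}
	return out
}

func main() {
	var urls = []string{"https://news.ycombinator.com/"}

	// Stub fetcher: returns a fixed link instead of making a network request.
	stub := func(u string) ([]string, error) {
		return []string{u + "newest"}, nil
	}

	for u, r := range scrapeAll(urls, stub) {
		fmt.Println(u, "->", r.links, r.err)
	}
}
```

Passing the fetcher in as a parameter keeps the concurrency logic separate from the networking code, which also makes it easy to exercise without touching a live site.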
Ensure that Go is installed on your machine. If it is not, download it from https://go.dev/dl/.
- Clone the repository and change into its directory:
  git clone https://github.com/ifrah-ashraf/multithreaded-scrapper.git
  cd multithreaded-scrapper
- Open main.go and modify the URL list in the urls slice:
  var urls = []string{"https://news.ycombinator.com/"}
- Save the file and run it:
  go run main.go
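For each entry in urls, the scraper has to pull the links out of the fetched page. How the project does this is not described here. As a rough sketch only, assuming a regular expression is good enough for simple pages, a hypothetical helper `extractLinks` could look like the following. A more robust scraper would likely use a real HTML parser instead, since this pattern also picks up href attributes on non-anchor tags and misses unquoted values.

```go
package main

import (
	"fmt"
	"regexp"
)

// hrefPattern matches href="..." or href='...' attributes, case-insensitively.
// A regular expression is a simplification and does not cover every valid HTML form.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// extractLinks returns the href values found in body, in order of first
// appearance, with duplicates removed. (Hypothetical helper for illustration.)
func extractLinks(body string) []string {
	seen := make(map[string]bool)
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		if link := m[1]; !seen[link] {
			seen[link] = true
			links = append(links, link)
		}
	}
	return links
}

func main() {
	// Sample markup used in place of a downloaded page.
	page := `<a href="https://example.com/a">A</a> <a href='/b'>B</a> <a href="/b">B again</a>`
	for _, link := range extractLinks(page) {
		fmt.Println(link)
	}
}
```

Relative links such as /b are returned as written; resolving them against the page URL would be a separate step.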
Made with ❤️ by ifrah ashraf