Author: Tatjana Chernenko, 2024
This project aims to automate the process of extracting job postings from LinkedIn, storing them in a local MongoDB database, and notifying the user via email about relevant job opportunities. It offers two modes of operation: one-time scraping and continuous scraping.
- Web scraping of LinkedIn job postings.
- Saving job postings to a MongoDB database.
- Two modes of operation: one-time scraping or continuous scraping.
- Saving job postings to a CSV file.
- Email notification for job postings matching predefined keywords.
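The keyword-matching and CSV-export steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names and the posting fields (`title`, `company`, `link`) are assumptions made for the example.

```python
import csv

def matches_keywords(posting, keywords):
    """Return True if any keyword appears in the posting's title (case-insensitive)."""
    title = posting.get("title", "").lower()
    return any(kw.lower() in title for kw in keywords)

def save_to_csv(postings, path):
    """Write the postings to a CSV file with a header row."""
    fields = ["title", "company", "link"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(postings)

# Example: keep only postings whose title matches a predefined keyword.
postings = [
    {"title": "Senior Python Developer", "company": "Acme", "link": "https://example.com/1"},
    {"title": "Sales Manager", "company": "Acme", "link": "https://example.com/2"},
]
relevant = [p for p in postings if matches_keywords(p, ["python", "data"])]
save_to_csv(relevant, "jobs.csv")
print(len(relevant))  # → 1
```

In the real project, the matching postings would additionally be inserted into MongoDB and trigger the email notification.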
- Clone the repository: `git clone https://github.com/TatjanaChernenko/mongodb_webscrapping`
- Navigate to the project directory: `cd mongodb_webscrapping`
- Install the required dependencies using pip: `pip install -r requirements.txt`
- Set up a Google Cloud Platform project and obtain credentials for the Gmail API. Refer to the Gmail API documentation for detailed instructions.
- Rename the downloaded client secrets file to `client_secret.json` and place it in the project's root directory.
- Configure the parameters in the `config.ini` file, including your email addresses, LinkedIn search parameters, and other settings.
- Run the `main.py` file.
You can customize the project's behavior by editing the `config.ini` file. Here are the available parameters:

- `email_sender`: Your email address for sending notifications.
- `email_recipient`: Recipient email address for receiving notifications.
- `linkedin_pages`: Number of LinkedIn pages to scrape.
- `keywords`: Keywords to match for suitable job postings.
- `page_number`: Initial page number for scraping.
- `sleep_time`: Time interval between scraping requests.
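For orientation, a filled-in `config.ini` might look like the fragment below. The section name `[settings]` and the value formats are assumptions for illustration; check the sample file shipped with the repository for the actual layout.

```ini
[settings]
email_sender = you@example.com
email_recipient = you@example.com
linkedin_pages = 5
keywords = python, data engineer
page_number = 0
sleep_time = 60
```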
This project is licensed under the terms of the MIT License.