A Python script that generates email alerts when specific keywords are mentioned in Facebook group posts. Uses Selenium, Gmail's API and the Firefox browser.
The script requires a number of items in order to run:
- Firefox browser installed on your local machine.
- The `geckodriver` executable downloaded into this folder. It enables Selenium to drive Firefox.
- A Gmail account that will be used to send the automated alerts.
- A `credentials.json` file in this folder, which contains the credentials that enable the Gmail API on your Gmail account. You can download this file from Google's Gmail API quickstart page by clicking the **Enable the Gmail API** button.
- An `input.json` file that contains the Facebook group URLs, the keywords used for alerts, and the sender and receiver email addresses. The file also needs to hold the path to the Firefox profile you wish to use with the browser. You can find it by opening `about:support` in Firefox and looking at the path in the **Profile Directory** row. This is how the JSON file should be structured:
```json
{
  "alerts": [
    {
      "url": "https://www.facebook.com/groups/SaaSgrowthhacking/",
      "keywords": ["marketing", "sales"]
    },
    {
      "url": "https://www.facebook.com/groups/DeepNetGroup/",
      "keywords": ["pytorch", "nvidia"]
    }
  ],
  "firefox_profile_path": "/home/jon/.mozilla/firefox/yy6ndmx3.default-release",
  "gmail": {
    "sender": "mihail.automated.alerts@gmail.com",
    "receiver": "mihailmarian12@gmail.com"
  }
}
```
Click here for a video demonstration.
Once all the requirements are fulfilled, you can launch the script from `main.py`. The script takes 5-10 minutes to complete, which is why I've added several `print` statements indicating the different stages it's going through.
The script starts by reading the JSON object from the `input.json` file. The `alerts` and `firefox_profile_path` keys inside this object are then used to create an instance of the `Alerts` class. The resulting object is initialized with the `results` key holding all of the relevant data scraped from Facebook.
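As a rough sketch, this startup step might look like the following; the `Alerts` constructor signature is an assumption, and the class body is a stub standing in for the real scraping logic:

```python
import json

# Hypothetical stand-in for the project's Alerts class -- the real one
# drives Firefox through Selenium and fills `results` with scraped posts.
class Alerts:
    def __init__(self, alerts, firefox_profile_path):
        self.alerts = alerts
        self.firefox_profile_path = firefox_profile_path
        self.results = []  # populated later by the scraping step

# The JSON object from input.json, inlined here so the sketch runs standalone.
config = json.loads("""
{
  "alerts": [
    {"url": "https://www.facebook.com/groups/SaaSgrowthhacking/",
     "keywords": ["marketing", "sales"]}
  ],
  "firefox_profile_path": "/home/jon/.mozilla/firefox/yy6ndmx3.default-release"
}
""")

alerts = Alerts(config["alerts"], config["firefox_profile_path"])
```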
I leverage Selenium's Python bindings in order to fetch this data. The bindings are wrapped in the custom `SeleniumBrowser` class, together with additional methods that allow me to parse Facebook's website.
There were two main challenges I faced while looking for the right data on Facebook's web pages.
The HTML structure is incredibly complex, with a very large number of layers that have indistinguishable attributes. For example, here's how far I had to go in order to find a reliable selector for the post publisher's name:
I eventually managed to distinguish a pattern:
```html
<!-- wrapper for the entire post -->
<div role="article" aria-posinset>
  <h2><a>Publisher Name</a></h2>
  <div dir="auto">Post content</div>
</div>
```
I catch any exceptions while parsing a post because, even with this pattern, I still get several incorrect results.
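To illustrate the pattern offline, here's a minimal standard-library sketch that pulls the publisher and content out of markup shaped like the snippet above. The real script does the equivalent through Selenium selectors against the live page; this parser is a simplification for demonstration only:

```python
from html.parser import HTMLParser

# Capture the publisher name (the <a> inside <h2>) and the post content
# (the <div dir="auto">) from each <div role="article" aria-posinset> wrapper.
class PostParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.posts = []
        self.capturing = None  # "publisher" or "content" while inside a target tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("role") == "article" and "aria-posinset" in attrs:
            self.posts.append({"publisher": "", "content": ""})
        elif self.posts and tag == "a":
            self.capturing = "publisher"
        elif self.posts and tag == "div" and attrs.get("dir") == "auto":
            self.capturing = "content"

    def handle_endtag(self, tag):
        if tag in ("a", "div"):
            self.capturing = None

    def handle_data(self, data):
        if self.capturing:
            self.posts[-1][self.capturing] += data

parser = PostParser()
parser.feed('<div role="article" aria-posinset="1">'
            '<h2><a>Publisher Name</a></h2>'
            '<div dir="auto">Post content</div></div>')
```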
On page load, Facebook only generates a small number of posts, for obvious reasons. I was able to resolve this by asking Selenium to press the `END` key 7 times (an arbitrary number) once the page has loaded.
When loading a lot of posts, you have to be careful because Facebook actually hides elements that are too far away from your viewport. This is important because Selenium can't interact with hidden elements. In my case, I had to make sure I click all the "See more" links in the posts' content before I hit the `END` key to load more content:
```python
def expand_results(self):
    # Requires: import time; from selenium.webdriver.common.keys import Keys
    self.click_see_more_buttons()
    body = self.find_element_by_tag_name("body")
    for _ in range(7):
        print("--scrolling down for more results...")
        body.send_keys(Keys.END)
        time.sleep(10)
        self.click_see_more_buttons()
```
Once the results are gathered, I use the `Alerts` object's `email` method to send the email to the address specified in `input.json`. The method calls upon another custom class, `GmailAlert`, which enables us to leverage Gmail's API.
When you initialize a `GmailAlert` object for the first time, your default browser will open a page where Google asks you to confirm whether you'd like the script to access your Gmail account.
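For reference, the message-building half of sending through the Gmail API can be sketched with the standard library alone. The function name below is hypothetical, not the project's actual `GmailAlert` code, and the commented lines show where the authorized API client (from `google-api-python-client`) would come in:

```python
import base64
from email.mime.text import MIMEText

# Build the payload the Gmail API expects: a base64url-encoded RFC 2822
# message under the "raw" key.
def build_message(sender, receiver, subject, body):
    msg = MIMEText(body)
    msg["to"] = receiver
    msg["from"] = sender
    msg["subject"] = subject
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode()}

# With authorized credentials, sending would look roughly like:
#   service = build("gmail", "v1", credentials=creds)
#   service.users().messages().send(userId="me", body=message).execute()
message = build_message(
    "mihail.automated.alerts@gmail.com",
    "mihailmarian12@gmail.com",
    "Facebook group alert",
    "Keyword 'marketing' was mentioned in a monitored group.",
)
```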
Some ideas for future improvements:
- Parse web pages with BeautifulSoup after loading them and expanding the feed with Selenium. That way, you don't have to click the "See more" buttons, and you avoid all the issues that come with them. You might even be able to fetch more posts from the Facebook group.
- I currently set fixed waiting times for when Selenium should move to the next action. The script would execute faster if it dynamically waited for the relevant content to load.
- A true alerting system should exclude previous results. I could implement that by storing previous results in a file and referring to it every time I run the script.
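The last idea could be sketched like this; `seen.json` and the function name are hypothetical, and I key on post content here purely for illustration:

```python
import json
import tempfile
from pathlib import Path

# Keep a file of previously alerted post contents and only return posts
# that haven't been seen before.
def filter_new(results, store):
    seen = set(json.loads(store.read_text())) if store.exists() else set()
    new = [post for post in results if post["content"] not in seen]
    store.write_text(json.dumps(sorted(seen | {post["content"] for post in results})))
    return new

# Demo against a throwaway store file.
store = Path(tempfile.mkdtemp()) / "seen.json"
first_run = filter_new([{"content": "post A"}, {"content": "post B"}], store)
second_run = filter_new([{"content": "post A"}, {"content": "post C"}], store)
```

Running the script twice over the same feed would then produce alerts only for posts added in between.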