A Python script that generates email alerts when specific keywords are mentioned in Facebook group posts. Uses Selenium, Gmail's API and the Firefox browser.
The script requires a number of items in order to run:
- Firefox browser installed on your local machine.
- The `geckodriver` executable downloaded into this folder. It enables Selenium to drive Firefox.
- A Gmail account that will be used to send the automated alerts.
- A `credentials.json` file in this folder, which contains the credentials that enable the Gmail API on your Gmail account. You can download this file from Google's Gmail API quickstart page by clicking the **Enable the Gmail API** button.
- An `input.json` file that contains the Facebook group URLs, the keywords used for alerts, and the sender and receiver email addresses. The file also needs to hold the path to the Firefox profile you wish to use with the browser. You can find it by opening `about:support` in Firefox and looking at the path in the **Profile Directory** row. This is how the JSON file should be structured:
```json
{
  "alerts": [
    {
      "url": "https://www.facebook.com/groups/SaaSgrowthhacking/",
      "keywords": ["marketing", "sales"]
    },
    {
      "url": "https://www.facebook.com/groups/DeepNetGroup/",
      "keywords": ["pytorch", "nvidia"]
    }
  ],
  "firefox_profile_path": "/home/jon/.mozilla/firefox/yy6ndmx3.default-release",
  "gmail": {
    "sender": "mihail.automated.alerts@gmail.com",
    "receiver": "mihailmarian12@gmail.com"
  }
}
```
Click here for a video demonstration.
Once all the requirements are fulfilled, you can launch the script from `main.py`. The script takes 5-10 minutes to complete, which is why I've added several `print` statements indicating the different stages it's going through.
The script starts by reading the JSON object from the `input.json` file. The `alerts` and `firefox_profile_path` keys inside this object are then used to create an instance of the `Alerts` class. The resulting object is initialized with the `results` key holding all of the relevant data scraped from Facebook.
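As a rough sketch, this startup step might look like the following; the `Alerts` constructor signature is an assumption, and the class body is a stub standing in for the real scraping logic:

```python
import json

# Hypothetical stand-in for the project's Alerts class -- the real one
# drives Firefox through Selenium and fills `results` with scraped posts.
class Alerts:
    def __init__(self, alerts, firefox_profile_path):
        self.alerts = alerts
        self.firefox_profile_path = firefox_profile_path
        self.results = []  # populated later by the scraping step

# The JSON object from input.json, inlined here so the sketch runs standalone.
config = json.loads("""
{
  "alerts": [
    {"url": "https://www.facebook.com/groups/SaaSgrowthhacking/",
     "keywords": ["marketing", "sales"]}
  ],
  "firefox_profile_path": "/home/jon/.mozilla/firefox/yy6ndmx3.default-release"
}
""")

alerts = Alerts(config["alerts"], config["firefox_profile_path"])
```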
I leverage Selenium's Python bindings in order to fetch this data. The bindings are wrapped in the custom `SeleniumBrowser` class, together with additional methods that allow me to parse Facebook's website.
There were two main challenges I faced while looking for the right data on Facebook's web pages.
The HTML structure is incredibly complex, with a very large number of layers that have indistinguishable attributes. For example, here's how far I had to go in order to find a reliable selector for the post publisher's name:
I eventually managed to distinguish a pattern:
```html
<!-- wrapper for the entire post -->
<div role="article" aria-posinset>
  <h2><a>Publisher Name</a></h2>
  <div dir="auto">Post content</div>
</div>
```
I catch any exceptions while parsing a post because, even with this pattern, I still get several incorrect results.
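To illustrate the pattern offline, here's a minimal standard-library sketch that pulls the publisher and content out of markup shaped like the snippet above. The real script does the equivalent through Selenium selectors against the live page; this parser is a simplification for demonstration only:

```python
from html.parser import HTMLParser

# Capture the publisher name (the <a> inside <h2>) and the post content
# (the <div dir="auto">) from each <div role="article" aria-posinset> wrapper.
class PostParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.posts = []
        self.capturing = None  # "publisher" or "content" while inside a target tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("role") == "article" and "aria-posinset" in attrs:
            self.posts.append({"publisher": "", "content": ""})
        elif self.posts and tag == "a":
            self.capturing = "publisher"
        elif self.posts and tag == "div" and attrs.get("dir") == "auto":
            self.capturing = "content"

    def handle_endtag(self, tag):
        if tag in ("a", "div"):
            self.capturing = None

    def handle_data(self, data):
        if self.capturing:
            self.posts[-1][self.capturing] += data

parser = PostParser()
parser.feed('<div role="article" aria-posinset="1">'
            '<h2><a>Publisher Name</a></h2>'
            '<div dir="auto">Post content</div></div>')
```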
On page load, Facebook only generates a small number of posts, for obvious reasons. I was able to resolve this by asking Selenium to press the `END` key 7 times (an arbitrary number) once the page has loaded.
When loading a lot of posts, you have to be careful because Facebook actually hides elements that are too far away from your viewport. This is important because Selenium can't interact with hidden elements. In my case, I had to make sure I click all the "See more" links in the posts' content before I hit the `END` key to load more content:
```python
def expand_results(self):
    # Requires: import time; from selenium.webdriver.common.keys import Keys
    self.click_see_more_buttons()
    body = self.find_element_by_tag_name("body")
    for _ in range(7):
        print("--scrolling down for more results...")
        body.send_keys(Keys.END)
        time.sleep(10)
        self.click_see_more_buttons()
```
Once the results are gathered, I use the `Alerts` object's `email` method to send the email to the address specified in `input.json`. The method calls upon another custom class, `GmailAlert`, which enables us to leverage Gmail's API.
When you initialize a `GmailAlert` object for the first time, your default browser will open a page where Google asks you to confirm whether you'd like the script to access your Gmail account.
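For reference, the message-building half of sending through the Gmail API can be sketched with the standard library alone. The function name below is hypothetical, not the project's actual `GmailAlert` code, and the commented lines show where the authorized API client (from `google-api-python-client`) would come in:

```python
import base64
from email.mime.text import MIMEText

# Build the payload the Gmail API expects: a base64url-encoded RFC 2822
# message under the "raw" key.
def build_message(sender, receiver, subject, body):
    msg = MIMEText(body)
    msg["to"] = receiver
    msg["from"] = sender
    msg["subject"] = subject
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode()}

# With authorized credentials, sending would look roughly like:
#   service = build("gmail", "v1", credentials=creds)
#   service.users().messages().send(userId="me", body=message).execute()
message = build_message(
    "mihail.automated.alerts@gmail.com",
    "mihailmarian12@gmail.com",
    "Facebook group alert",
    "Keyword 'marketing' was mentioned in a monitored group.",
)
```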
Some ideas for future improvements:
- Parse web pages with BeautifulSoup after loading them and expanding the feed with Selenium. That way, you don't have to click the "See more" buttons, and you avoid all the issues that come with them. You might even be able to fetch more posts from the Facebook group.
- I currently set fixed waiting times for when Selenium should move to the next action. The script would execute faster if it dynamically waited for the relevant content to load.
- A true alerting system should exclude previous results. I could implement that by storing previous results in a file and referring to it every time I run the script.
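The last idea could be sketched like this; `seen.json` and the function name are hypothetical, and I key on post content here purely for illustration:

```python
import json
import tempfile
from pathlib import Path

# Keep a file of previously alerted post contents and only return posts
# that haven't been seen before.
def filter_new(results, store):
    seen = set(json.loads(store.read_text())) if store.exists() else set()
    new = [post for post in results if post["content"] not in seen]
    store.write_text(json.dumps(sorted(seen | {post["content"] for post in results})))
    return new

# Demo against a throwaway store file.
store = Path(tempfile.mkdtemp()) / "seen.json"
first_run = filter_new([{"content": "post A"}, {"content": "post B"}], store)
second_run = filter_new([{"content": "post A"}, {"content": "post C"}], store)
```

Running the script twice over the same feed would then produce alerts only for posts added in between.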