
[Aborted] Glassdoor Interview Questions Scraper

This project has been aborted. Several issues prevented scraping; some of them were resolved, but others could not be:

  1. The ChromeDriver used by Selenium runs Chrome as an automated test browser, and Google doesn't allow a proper user sign-in from it. I tried launching a regular Chrome instance on a separate remote-debugging port and connecting to that port from the code, but that didn't work either (ChromeDriver wasn't picking up the port, or Selenium wasn't using it: a port mismatch).

$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

$ chromedriver_mac_arm64/chromedriver --remote-debugging-port=9515

$ /opt/homebrew/bin/chromedriver --remote-debugging-port=9515
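A minimal sketch of that attach step (not the code used in this project, and untested against Glassdoor): Selenium's `ChromeOptions` accepts a `debuggerAddress` experimental option that makes ChromeDriver attach to a Chrome already started with `--remote-debugging-port`, instead of launching its own browser. Before attaching, the sketch queries Chrome's DevTools `/json/version` endpoint to confirm something is actually listening on that port. The helper names `devtools_version` and `attach_to_running_chrome` are hypothetical.

```python
import http.client
import json
import urllib.request
from typing import Optional


def devtools_version(host: str = "127.0.0.1", port: int = 9222, timeout: float = 2.0) -> Optional[dict]:
    """Return the JSON from Chrome's DevTools /json/version endpoint, or None if nothing answers."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Refused connection, timeout, HTTP error or a non-JSON reply:
        # no usable DevTools endpoint on this port.
        return None


def attach_to_running_chrome(host: str = "127.0.0.1", port: int = 9222):
    """Attach Selenium to a Chrome started with --remote-debugging-port=<port> (hypothetical helper)."""
    # Imported lazily so the port check above runs without Selenium installed.
    from selenium import webdriver

    if devtools_version(host, port) is None:
        raise RuntimeError(f"No Chrome DevTools endpoint on {host}:{port}")
    options = webdriver.ChromeOptions()
    # ChromeDriver connects to this existing browser instead of launching a new one.
    options.add_experimental_option("debuggerAddress", f"{host}:{port}")
    return webdriver.Chrome(options=options)
```

In this setup the two ports are separate: ChromeDriver listens on its own port (Selenium normally picks a free one), while `debuggerAddress` points at Chrome's debugging port (9222 in the first command above).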

  2. Another approach was to stop Google from detecting the automated Chrome browser by using the undetected_chromedriver package. This allowed me to sign in to my Google account, but Google's 2-Step Verification prompt on every run made testing difficult.

python3 -m pip install undetected_chromedriver

  3. To work around this, I turned to cookies: saving them after a successful sign-in and loading them back preserves the session for about 30 minutes.

def loadCookies(self):
    ...

def saveCookies(self):
    ...
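The method bodies aren't included here. A minimal sketch of the same idea, assuming a Selenium WebDriver (`get_cookies()` and `add_cookie()` are standard WebDriver methods) and a JSON file on disk; the function names, the `path` parameter and the file format are hypothetical, not this project's code:

```python
import json
import time
from pathlib import Path
from typing import Optional


def save_cookies(driver, path: str) -> int:
    """Write the browser's current cookies to a JSON file; return how many were saved."""
    cookies = driver.get_cookies()  # list of dicts: name, value, domain, path, expiry, ...
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")
    return len(cookies)


def load_cookies(driver, path: str, now: Optional[float] = None) -> int:
    """Re-add unexpired cookies from a JSON file; return how many were added.

    The driver must already be on a page of the cookies' domain, because
    WebDriver only accepts cookies for the current document's domain.
    """
    file = Path(path)
    if not file.exists():
        return 0
    now = time.time() if now is None else now
    added = 0
    for cookie in json.loads(file.read_text(encoding="utf-8")):
        # Cookies without "expiry" are session cookies; skip ones that have expired.
        if "expiry" in cookie and cookie["expiry"] <= now:
            continue
        driver.add_cookie(cookie)
        added += 1
    return added
```

Usage: open the site first (for example `driver.get("https://www.glassdoor.com")`), call `load_cookies`, then refresh the page so the restored session takes effect.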

  4. That worked, but the next issue was the HTML itself: Glassdoor never loads the entire page, and for some reason Selenium could not locate the elements.
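One possible cause, not verified for Glassdoor: the page may lazy-load content as you scroll, so elements further down don't exist yet when Selenium searches for them. A minimal sketch under that assumption, which scrolls until the document height stops growing using WebDriver's `execute_script`; the helper name `scroll_until_stable` and its defaults are hypothetical:

```python
import time


def scroll_until_stable(driver, pause: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll to the bottom repeatedly until the document height stops growing.

    Returns the number of scroll rounds performed (at most max_rounds).
    Assumes the missing content is lazy-loaded on scroll.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch and render more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```

After it returns, Selenium's `WebDriverWait` with `expected_conditions` can wait for specific elements to appear rather than searching for them immediately.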

Given these issues, I've aborted the project. An alternative would be to write a Chrome extension using jQuery, but that is out of scope for this project at the moment (15/04/2024).