
Uses Apify to scrape Google search engine results pages (SERPs) for all American films released between 1950 and 2020, then parses the structured data in each knowledge panel.

NoahFinberg/google_kg_movie_scraper

Movie Google Knowledge Panel Dataset: 1950--2020

We iterate through the Wikipedia list of movies from 1950--2020 and scrape the Google Knowledge Panel for each movie using Apify's Google Search Results Scraper. The final dataset, which includes the raw HTML of the SERP pages as well as the parsed Knowledge Panels, is posted on Harvard Dataverse. Feel free to explore it directly on Kaggle too.
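As a rough illustration of the first step, the sketch below turns (title, year) pairs into search queries and splits them into batches for a scraper run. The function names, the `"<title> (<year>) film"` query format, the batch size, and the sample titles are assumptions made for illustration; they are not taken from the repository's scripts.

```python
def build_queries(films, first_year=1950, last_year=2020):
    """Return one search query per film released in [first_year, last_year].

    `films` is an iterable of (title, year) pairs, e.g. rows taken from
    the Wikipedia film lists. The query format is an assumption.
    """
    return [
        f"{title} ({year}) film"
        for title, year in films
        if first_year <= year <= last_year
    ]


def batch(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    sample = [("Casablanca", 1942), ("Vertigo", 1958), ("Jaws", 1975)]
    queries = build_queries(sample)
    for chunk in batch(queries, 2):
        # Each chunk would be submitted to the scraper as one run.
        print(chunk)
```

Batching keeps each scraper run small, so a failed run only needs a limited set of queries to be retried.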

We use the dataset to estimate how strongly average review scores correlate across different platforms.
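A minimal sketch of such an estimate, assuming the Pearson correlation coefficient is the measure of interest (the source does not say which estimator is used). The field names `platform_a` and `platform_b` and the toy scores are made up for illustration and are not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Pairs where either value is None (a film with no score on one
    platform) are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy rows; real rows would come from the parsed Knowledge Panels.
    rows = [
        {"platform_a": 7.5, "platform_b": 80},
        {"platform_a": 6.0, "platform_b": None},  # missing score, skipped
        {"platform_a": 8.2, "platform_b": 91},
        {"platform_a": 5.1, "platform_b": 48},
    ]
    r = pearson([row["platform_a"] for row in rows],
                [row["platform_b"] for row in rows])
    print(f"Pearson r = {r:.3f}")
```

Because platforms use different scales (for example, out of 10 versus a percentage), a scale-invariant measure such as Pearson's r avoids having to normalize the scores first.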

Scripts

Future

  • It would be nice (and not very difficult) to generalize this code so that it automatically parses all of the structured data in any Google Knowledge Panel, not just panels for movies.
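One possible direction for that generalization is sketched below. It assumes, hypothetically, that each panel field is wrapped in an element carrying a `data-attrid` attribute and collects the text of every such element without knowing the field names in advance. Real SERP markup may differ and changes over time, the sample HTML is invented, and `PanelFieldParser` and `parse_panel` are illustrative names, not part of this repository.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PanelFieldParser(HTMLParser):
    """Collect text under every element that has a `data-attrid` attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # attrid -> list of text fragments
        self._stack = []   # one entry per open tag: its attrid or None

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append(dict(attrs).get("data-attrid"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost enclosing labelled element.
        for attrid in reversed(self._stack):
            if attrid:
                self.fields.setdefault(attrid, []).append(text)
                break


def parse_panel(html):
    """Return {attrid: joined text} for a well-formed HTML fragment."""
    parser = PanelFieldParser()
    parser.feed(html)
    parser.close()
    return {key: " ".join(parts) for key, parts in parser.fields.items()}


if __name__ == "__main__":
    sample = ('<div data-attrid="kc:/film/film:director">'
              '<span>Director: </span><span>Jane Doe</span></div>'
              '<div>unrelated text</div>')
    print(parse_panel(sample))
```

The tag stack assumes well-formed HTML; raw SERP pages with unclosed tags would need a more tolerant parser.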

Author

Noah Finberg
