Movie Google Knowledge Panel Dataset: 1950--2020

We iterate through the Wikipedia list of movies released from 1950 to 2020 and scrape the Google Knowledge Panel for each movie using Apify's Google Search Results Scraper. The final dataset, which includes the raw HTML of the search engine results pages (SERPs) as well as the parsed Knowledge Panels, is posted on Harvard Dataverse. Feel free to explore it directly on Kaggle too.

We use the dataset to estimate the correlation between average review scores across different platforms.
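
As a rough illustration of that analysis, here is a minimal sketch of a pairwise Pearson correlation between platform scores. It is not the code used for this project. The field names (`imdb`, `rotten_tomatoes`, `metacritic`), the helper names, and the sample rows are all hypothetical, and the actual column names in the parsed dataset may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def platform_correlations(rows, platforms):
    """Correlate every pair of platforms, using only movies rated on both."""
    result = {}
    for a, b in combinations(platforms, 2):
        pairs = [
            (row[a], row[b])
            for row in rows
            if row.get(a) is not None and row.get(b) is not None
        ]
        if len(pairs) < 2:
            continue  # not enough overlapping ratings for this pair
        xs, ys = zip(*pairs)
        result[(a, b)] = pearson(xs, ys)
    return result


# Made-up rows for illustration only; these are not values from the dataset.
sample_rows = [
    {"title": "A", "imdb": 8.0, "rotten_tomatoes": 90, "metacritic": 85},
    {"title": "B", "imdb": 6.0, "rotten_tomatoes": 55, "metacritic": None},
    {"title": "C", "imdb": 7.0, "rotten_tomatoes": 75, "metacritic": 70},
    {"title": "D", "imdb": 5.0, "rotten_tomatoes": 40, "metacritic": 45},
]

if __name__ == "__main__":
    platforms = ["imdb", "rotten_tomatoes", "metacritic"]
    for pair, r in platform_correlations(sample_rows, platforms).items():
        print(pair, round(r, 3))
```

Movies missing a score on either platform are dropped per pair rather than globally, so each correlation uses as many movies as possible. The platforms use different scales, which does not matter here because Pearson correlation is invariant to linear rescaling.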

Scripts

Get Movie List
Google Knowledge Panel Scraper

Future

It'd be nice (and not super difficult) to generalize this code so that it can automatically parse all of the structured data in any Google Knowledge Panel, not just those for movies.

Author

Noah Finberg
