metacritic-movies

This project uses Python and regular expressions to build a web scraper that collects movie titles, release dates, descriptions, metascores, and images from Metacritic. It takes the Metacritic URL, builds a list of movies for a given year and page, and writes the list to a CSV file. It then reads the file back and analyzes the data.

The project is built with Python and regular expressions in Jupyter Notebook.
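
As a rough illustration of the regex-then-CSV flow described above, the sketch below runs a pattern over a small HTML string and serializes the matches. The markup in `SAMPLE_HTML`, the pattern, and the helper names `extract_movies` and `movies_to_csv` are all hypothetical; Metacritic's real HTML differs, and the project's actual patterns may look quite different.

```python
import csv
import io
import re

# Hypothetical markup for illustration only; Metacritic's real HTML is different.
SAMPLE_HTML = """
<div class="movie"><h3>First Movie</h3><span class="date">January 5, 2020</span><span class="score">91</span></div>
<div class="movie"><h3>Second Movie</h3><span class="date">March 12, 2020</span><span class="score">78</span></div>
"""

# Named groups keep the extracted fields readable.
MOVIE_PATTERN = re.compile(
    r'<h3>(?P<title>.*?)</h3>'
    r'<span class="date">(?P<date>.*?)</span>'
    r'<span class="score">(?P<score>\d+)</span>'
)


def extract_movies(html):
    """Return one dict per movie matched in the HTML string."""
    return [
        {"title": m["title"], "release_date": m["date"], "metascore": int(m["score"])}
        for m in MOVIE_PATTERN.finditer(html)
    ]


def movies_to_csv(movies):
    """Serialize the movie dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "release_date", "metascore"])
    writer.writeheader()
    writer.writerows(movies)
    return buffer.getvalue()


movies = extract_movies(SAMPLE_HTML)
csv_text = movies_to_csv(movies)
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; swapping `io.StringIO()` for an open file handle would write the same rows to disk.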

Built With

  • Visual Studio Code
  • Jupyter Notebook
  • Python
  • Pandas
  • Matplotlib
  • MongoDB

Getting Started

Imports used to run this program:

  • re
  • urllib3
  • certifi
  • json
  • pymongo
  • time
  • pandas
  • matplotlib (pyplot and FormatStrFormatter)
  • seaborn

To install in terminal:

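  1. Open a terminal in the project directory
  2. Run pip3 install {package to install} for each third-party package (re, json, and time ship with Python and need no installation)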
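
To see which packages still need installing, a small check like the one below may help. It is only a sketch: the helper name `find_missing` is hypothetical, and the module names are the import names from the list above.

```python
import importlib.util

# Import names taken from the list of imports above.
REQUIRED_MODULES = [
    "re", "urllib3", "certifi", "json", "pymongo",
    "time", "pandas", "matplotlib", "seaborn",
]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Install with: pip3 install " + " ".join(missing))
else:
    print("All required packages are available.")
```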

How To Use

This project uses two files, one for the scraper and another for the analysis.

metacritic-scraper

Connect to MongoDB

import json

# Load the MongoDB connection string from a local JSON credentials file
with open("/fileLocation/credentialsFileName.json") as f:
  data = json.load(f)
  mongo_connection_string = data['mongodb']
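
The snippet above implies a credentials file with a top-level "mongodb" key. A minimal sketch of that shape and of a loader with a clearer error message follows; the helper `load_connection_string` and the placeholder connection string are assumptions for illustration, not part of the project.

```python
import json
import os
import tempfile


def load_connection_string(path, key="mongodb"):
    """Read the connection string stored under `key` in a JSON credentials file."""
    with open(path) as f:
        data = json.load(f)
    if key not in data:
        raise KeyError(f"'{key}' not found in {path}")
    return data[key]


# Demonstration with a temporary file and a placeholder (not real) connection string.
example = {"mongodb": "mongodb+srv://user:password@cluster.example.net/"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(example, tmp)

connection_string = load_connection_string(tmp.name)
os.remove(tmp.name)  # Clean up the temporary credentials file.
print(connection_string)
```

Keeping the credentials file outside the repository avoids committing the connection string by accident.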

Retrieve the data in your MongoDB collection

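import certifi
import pymongo

# Connect with certifi's CA bundle, then select the database and collection
client = pymongo.MongoClient(mongo_connection_string, tlsCAFile=certifi.where())
db1_database = client['databaseName']
metacritic_data = db1_database['collectionName']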

Build the Metacritic URL, replacing (year) and (page) with the year and page number to scrape

url = "https://www.metacritic.com/browse/movies/score/metascore/year/filtered?year_selected=(year)&sort=desc&view=detailed&page=(page)"
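
One way to fill in the year and page placeholders is a format template, sketched below. The helper name `build_url` is hypothetical, and the page range is only illustrative, since the source does not state whether page numbering starts at 0 or 1.

```python
# Same URL as above, with the placeholders written as format fields.
URL_TEMPLATE = (
    "https://www.metacritic.com/browse/movies/score/metascore/year/filtered"
    "?year_selected={year}&sort=desc&view=detailed&page={page}"
)


def build_url(year, page):
    """Return the listing URL for one year and one results page."""
    return URL_TEMPLATE.format(year=year, page=page)


# Illustrative only: the first three pages for 2020, assuming pages start at 0.
urls = [build_url(2020, page) for page in range(3)]
for url in urls:
    print(url)
```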

metacritic-analysis

Retrieve the credentials from the JSON credentials file stored on your local computer and fetch the MongoDB collection

import json
import certifi
import pandas as pd
import pymongo

# Retrieve credentials
with open("/fileLocation/credentialsFileName.json") as f:
  data = json.load(f)
  mongo_connection_string = data['mongodb']

# Connect to MongoDB and load the collection into a DataFrame
client = pymongo.MongoClient(mongo_connection_string, tlsCAFile=certifi.where())
db1_database = client['databaseName']
metacritic_data = db1_database['collectionName']
metacritic = pd.DataFrame(metacritic_data.find())

Add year and month columns to the DataFrame

metacritic['year'] = metacritic.release_date.dt.year
metacritic['month'] = metacritic.release_date.dt.month
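
The `.dt` accessor only works on datetime columns. If release_date comes back from MongoDB as text rather than as dates, it would need converting first. The sketch below assumes a "January 5, 2020" style string, which is a guess at the format, and uses made-up rows in place of the real collection.

```python
import pandas as pd

# Illustrative rows; the real data comes from the MongoDB collection.
metacritic = pd.DataFrame({
    "title": ["First Movie", "Second Movie"],
    "release_date": ["January 5, 2020", "March 12, 2020"],
})

# Convert the strings to datetime64 so the .dt accessor is available.
# The format string is an assumption about how the dates were stored.
metacritic["release_date"] = pd.to_datetime(
    metacritic["release_date"], format="%B %d, %Y"
)

metacritic["year"] = metacritic["release_date"].dt.year
metacritic["month"] = metacritic["release_date"].dt.month
print(metacritic)
```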

License

Distributed under the MIT license. See LICENSE.txt for more information.