
Investigating the TMDb Movie Dataset to find the trend in popularity of movies and the correlation of popularity, budget, vote count with the revenue.


imukoki/Investigate-a-Dataset


Project 1: Investigate-a-Dataset



Introduction

This project investigates the TMDb Movie Dataset, which contains over 10,000 movies released between 1960 and 2015.

The project investigates two questions:

  1. How has the popularity of movies changed over time, from 1960 to 2015?
  2. How do popularity, budget, and vote count correlate with revenue?

Data-Wrangling

The dataset was loaded into a Jupyter Notebook and checked for missing values; several variables turned out to have them. All missing values were dropped so that every variable has the same number of data points. Columns that were initially judged not useful for the investigation were also dropped. The release date was converted to datetime so that it could be used to plot time series showing how some of the variables change over time. Finally, the dataframe was sorted in ascending order by release date, which made the time series graphs easier to plot.
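The wrangling steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper `wrangle`, the column names (`release_date`, `popularity`, `homepage`), and the tiny sample frame are all assumptions made for the example. Unused columns are dropped before missing rows here so that a sparsely filled column does not cause otherwise complete rows to be discarded; the notebook may use a different order.

```python
import pandas as pd


def wrangle(df, unused_columns):
    """Drop unused columns and incomplete rows, parse dates, sort by date.

    Hypothetical helper; column names are assumptions, not taken from the notebook.
    """
    # Drop columns that are not needed (names that are absent are ignored).
    df = df.drop(columns=unused_columns, errors="ignore")
    # Drop every row that still has at least one missing value.
    df = df.dropna()
    # Convert the release date strings to datetime values.
    df = df.assign(release_date=pd.to_datetime(df["release_date"]))
    # Sort in ascending order by release date for time series plotting.
    return df.sort_values("release_date").reset_index(drop=True)


# Made-up sample standing in for the real dataset.
sample = pd.DataFrame({
    "release_date": ["2015-06-09", "1960-08-14", "1999-03-31"],
    "popularity": [32.9, 2.6, None],
    "homepage": [None, None, None],
})

clean = wrangle(sample, unused_columns=["homepage"])
print(clean)
```

With the real data, the sample frame would be replaced by the dataframe loaded from the dataset file.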

Exploratory-Data-Analysis

Statistics were computed and visualizations were created to address the research questions from the Introduction section, specifically time series plots and scatter plots.
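As a rough sketch of the kind of statistics that could sit behind those plots, the example below computes the mean popularity per release year and the Pearson correlation of each variable with revenue. The function names, the column names (`release_date`, `popularity`, `budget`, `vote_count`, `revenue`), and the sample values are assumptions for illustration; the notebook may compute these differently.

```python
import pandas as pd


def popularity_by_year(df):
    """Mean popularity per release year, indexed by year in ascending order."""
    return df.groupby(df["release_date"].dt.year)["popularity"].mean()


def revenue_correlations(df, columns=("popularity", "budget", "vote_count")):
    """Pearson correlation of each listed column with revenue."""
    return df[list(columns)].corrwith(df["revenue"])


# Made-up sample standing in for the cleaned dataset.
sample = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["1960-01-01", "1960-06-01", "2015-01-01", "2015-06-01"]
    ),
    "popularity": [1.0, 3.0, 10.0, 20.0],
    "budget": [1e6, 2e6, 3e6, 4e6],
    "vote_count": [10, 25, 300, 900],
    "revenue": [2e6, 4e6, 6e6, 8e6],
})

yearly = popularity_by_year(sample)
correlations = revenue_correlations(sample)
print(yearly)
print(correlations)
```

Calling `yearly.plot()` would draw the yearly series as a line chart with Matplotlib, and `sample.plot.scatter(x="budget", y="revenue")` would draw one of the scatter plots.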

Installation

git clone https://github.com/imukoki/Investigate-a-Dataset.git
cd Investigate-a-Dataset
jupyter notebook

Requirements

  • pandas
  • NumPy
  • Matplotlib

Author

👤 Innocent Mukoki
