This project investigates the TMDb Movie Dataset, which contains over 10,000 movies released between 1960 and 2015.
The project investigates two questions:
- How has the popularity of movies changed over time, from 1960 to 2015?
- How do popularity, budget, and vote count correlate with revenue?
The dataset was loaded into a Jupyter Notebook and checked for missing values; the check showed that several variables had them. Rows containing missing values were dropped so that every variable has the same number of data points. Columns that were not expected to be useful for the investigation were also dropped. The release date was converted to datetime so that it could be used to plot time series showing how some of the variables change over time. Finally, the dataframe was sorted in ascending order by release date, which made the time series graphs easier to plot.
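The wrangling steps described above could look roughly like the following pandas sketch. It is an illustration under stated assumptions, not the notebook's actual code: the small in-memory sample stands in for the real CSV, and the column names (`homepage`, `popularity`, `release_date`, and so on) are assumed. The sketch also drops the unneeded column before dropping rows, so that a sparsely populated column does not cause most rows to be discarded; the notebook may order these steps differently.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset.
# Column names and values are assumptions for illustration only.
raw = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "homepage": [None, "http://example.com", None, None],
    "popularity": [1.2, 0.4, None, 2.5],
    "budget": [100, 50, 80, 200],
    "revenue": [300, 40, 90, 700],
    "vote_count": [150, 20, 35, 400],
    "release_date": ["2015-06-09", "1960-03-01", "1999-12-31", "1985-07-03"],
})

# Count the missing values in each column.
missing_counts = raw.isnull().sum()

# Drop a column judged not useful, then drop rows that still have gaps.
movies = raw.drop(columns=["homepage"]).dropna()

# Convert the release date to datetime and sort in ascending order.
movies = movies.assign(release_date=pd.to_datetime(movies["release_date"]))
movies = movies.sort_values("release_date").reset_index(drop=True)
```

With the real file, the sample dataframe would be replaced by a `pd.read_csv(...)` call on the dataset's CSV.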
Statistics were computed and visualizations were created to address the research questions in the Introduction section. Time series plots and scatter plots were used to answer them.
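As a rough sketch of this exploration step, the example below computes a yearly popularity trend and the correlation of each variable with revenue, then draws one time series plot and one scatter plot. The data is made up and the column names are assumed; averaging popularity per release year is only one possible way to build the trend and may differ from what the notebook does.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up, already-cleaned sample; column names are assumptions.
movies = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["1960-03-01", "1985-07-03", "1985-11-20", "2015-06-09"]
    ),
    "popularity": [0.4, 1.0, 2.0, 3.5],
    "budget": [50, 120, 200, 400],
    "vote_count": [20, 90, 150, 600],
    "revenue": [40, 300, 500, 1500],
})

# Question 1: mean popularity per release year.
release_year = movies["release_date"].dt.year
yearly_popularity = movies.groupby(release_year)["popularity"].mean()

# Question 2: Pearson correlation of each variable with revenue.
revenue_corr = movies[["popularity", "budget", "vote_count"]].corrwith(
    movies["revenue"]
)

# One time series plot and one scatter plot, side by side.
fig, (ax_trend, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_trend.plot(yearly_popularity.index, yearly_popularity.values, marker="o")
ax_trend.set_xlabel("Release year")
ax_trend.set_ylabel("Mean popularity")
ax_scatter.scatter(movies["budget"], movies["revenue"])
ax_scatter.set_xlabel("Budget")
ax_scatter.set_ylabel("Revenue")
fig.tight_layout()
```

In a notebook the figure renders after the cell runs; the same scatter plot can be repeated for `popularity` and `vote_count` against `revenue`.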
git clone https://github.com/imukoki/Investigate-a-Dataset.git
cd Investigate-a-Dataset
jupyter notebook
- Pandas
- NumPy
- Matplotlib
👤 Innocent Mukoki
- GitHub: Innocent Mukoki
- LinkedIn: Innocent Mukoki