furiosa

An end-to-end Data Science application for predicting revenue for films from the 2010s.

Getting Started

Ensure you have Python 3.5 or later installed. You can manage packages with either pip or Anaconda. You can download the latest version here.

1. Clone the repository

Navigate to the folder in which you want to store this repository. Then clone the repository and change directory to the repository:

git clone https://github.com/jklewis99/furiosa.git
cd furiosa

2. Create and activate a virtual environment:

With pip:

Windows

py -m venv [ENV_NAME]
.\[ENV_NAME]\Scripts\activate

Linux/Mac

python3 -m venv [ENV_NAME]
source [ENV_NAME]/bin/activate

With conda:

conda update conda
conda create -n [ENV_NAME]
conda activate [ENV_NAME]
conda install pip # install pip to allow easy requirements.txt install

3. Install the requirements:

pip install -r requirements.txt

Data

The data for this project was generated using a set of APIs and databases. All databases can be found here.

Movies

The initial dataset came from the MovieLens 25M Dataset. Only movies released in the 2010s (2010-2019) were kept. For each of these movies, a request was made to the TMDB API for new or updated values of budget, title, vote_count, vote_average, revenue, runtime, popularity, and overview. Alongside these requests, additional requests were made to get information on credits and crew, and on release_dates.
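As a rough illustration of the filtering step, the sketch below keeps only titles from 2010-2019 and builds a TMDB movie-details URL for one of them. It assumes the MovieLens convention of a trailing "(YYYY)" in each title. The helper names, the sample titles, the movie ID, and the API key are hypothetical, and combining the extra lookups through append_to_response is an assumption made here for brevity, not necessarily how this repository issues its requests.

```python
import re
from urllib.parse import urlencode

# MovieLens titles end with the release year in parentheses, e.g. "Toy Story 3 (2010)".
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")


def release_year(title):
    """Return the trailing (YYYY) year of a MovieLens title, or None if absent."""
    match = YEAR_PATTERN.search(title)
    return int(match.group(1)) if match else None


def is_2010s(title):
    """True when the title's release year falls in 2010-2019 inclusive."""
    year = release_year(title)
    return year is not None and 2010 <= year <= 2019


def tmdb_details_url(tmdb_id, api_key):
    """Build a TMDB v3 movie-details URL (hypothetical helper, no request is sent).

    Assumption: credits and release dates are fetched in the same call via
    append_to_response instead of through separate requests.
    """
    query = urlencode({
        "api_key": api_key,
        "append_to_response": "credits,release_dates",
    })
    return "https://api.themoviedb.org/3/movie/{}?{}".format(tmdb_id, query)


# Illustrative titles only; real input would come from the MovieLens movies file.
titles = ["Toy Story (1995)", "Toy Story 3 (2010)", "Parasite (2019)", "Tenet (2020)"]
kept = [title for title in titles if is_2010s(title)]
print(kept)
print(tmdb_details_url(12345, "YOUR_API_KEY"))  # placeholder ID and key
```

Titles without a parseable year are dropped instead of guessed at, which keeps the 2010s subset conservative.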

Trailers

To get data for trailers, the YouTube Data API was used. The YouTube API's search list and videos list methods were used to get data on trailers, specifically title, channel_title, channel_id, description, release_date, tags, view_count, like_count, dislike_count, and comment_count (features renamed to follow Python naming conventions). A similarity score, based on the similarity_score metric, was added to each result.
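The renaming mentioned above can be sketched as a small function that flattens one item of a videos list response into the snake_case features. This is a minimal sketch and not the code in youtubeAPIrequests.py: flatten_video is a hypothetical name, the sample item is made up, and mapping release_date to the API's publishedAt field is an assumption. The similarity score is left out because its definition is not given here.

```python
def flatten_video(item):
    """Flatten one YouTube videos.list item into snake_case trailer features.

    Hypothetical helper: field names on the left follow this README, field
    names on the right follow the YouTube Data API v3 video resource.
    """
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})

    def count(key):
        # The API returns counts as strings; a missing count stays None.
        value = stats.get(key)
        return int(value) if value is not None else None

    return {
        "title": snippet.get("title"),
        "channel_title": snippet.get("channelTitle"),
        "channel_id": snippet.get("channelId"),
        "description": snippet.get("description"),
        "release_date": snippet.get("publishedAt"),  # assumption: publish date
        "tags": snippet.get("tags", []),
        "view_count": count("viewCount"),
        "like_count": count("likeCount"),
        "dislike_count": count("dislikeCount"),
        "comment_count": count("commentCount"),
    }


# Made-up response item used only to show the shape of the output.
sample_item = {
    "snippet": {
        "title": "Example Trailer",
        "channelTitle": "Example Channel",
        "channelId": "UC_example",
        "description": "An example description.",
        "publishedAt": "2010-01-01T00:00:00Z",
        "tags": ["trailer"],
    },
    "statistics": {"viewCount": "1000", "likeCount": "10", "commentCount": "3"},
}
row = flatten_video(sample_item)
print(row["view_count"], row["dislike_count"])
```

Returning None for an absent count, instead of 0, keeps "not reported" distinguishable from "zero" when the features are analyzed later.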

Setting up YouTube API calls

The main API requests used in this project can be found in the youtubeAPIrequests.py file. For this file to work on your computer without access tokens and refresh tokens (and without signing in every time the file is executed), there are a few steps to follow:

1. Set up the YouTube Data API on your Google account

Follow directions at the YouTube Data API Overview page to get started.

2. Set up the Google Cloud API

Follow directions at the Getting Started with authentication page to get started. When you have the JSON file that identifies the credentials for your application, set the GOOGLE_APPLICATION_CREDENTIALS environment variable (System Properties -> Advanced -> Environment Variables -> User variables for {USER} -> New) to the path where your application's JSON file is saved. If you do not wish to set up this environment variable globally, you can set it at the beginning of each shell session instead. These instructions can be found here.
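Before making any API calls, it can help to confirm that the variable is visible to Python and points to a real file. The check below is a minimal sketch; credentials_path is a hypothetical helper and is not part of this repository.

```python
import os


def credentials_path(env=None):
    """Return the path in GOOGLE_APPLICATION_CREDENTIALS, or raise if unusable.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    check can be exercised with a plain dictionary.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS does not point to a file: {}".format(path)
        )
    return path


try:
    print("Using credentials file:", credentials_path())
except RuntimeError as error:
    # Report the problem instead of failing later inside an API call.
    print("Credentials check failed:", error)
```

If the check fails in a new shell, the variable was most likely set only for an earlier session and needs to be set again.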

3. Verify authentication

Run the following commands in the console at the root of the furiosa directory:

cd examples
python youtube_api_test.py

This should return the following:

Toy Story 3: Trailer - Walt Disney Studios

Data Overview

Much of the visualization and evaluation of the data itself can be found in the notebooks folder. Additionally, many graphs that show the significance of each feature, along with some other visual representations of the data, are available in the correlations folder inside figures.

Network Success

The following machine learning models were used in this repository:

  • Neural Networks (Regression)
  • Support Vector Machines (Regression)
  • Linear Regression
  • Decision Tree (Regression)
  • Random Forest (Regression)
