An end-to-end Data Science application for predicting revenue for films from the 2010s.
Ensure you have Python 3.5 or greater installed; you can download the latest version here. You can manage the environment with either pip or Anaconda.
Navigate to the folder in which you want to store this repository, then clone it and change into its directory:
git clone https://github.com/jklewis99/furiosa.git
cd furiosa
Windows
py -m venv [ENV_NAME]
.\[ENV_NAME]\Scripts\activate
Linux/Mac
python3 -m venv [ENV_NAME]
source [ENV_NAME]/bin/activate
Anaconda
conda update conda
conda create -n [ENV_NAME]
conda activate [ENV_NAME]
conda install pip # install pip to allow easy requirements.txt install
pip install -r requirements.txt
The data for this project was generated using a set of APIs and databases. All databases can be found here.
The initial dataset came from the MovieLens 25M Dataset. Only movies released in the 2010s decade (2010-2019) were kept. For each of these movies, a request was made to the TMDB API for updated or new features: `budget`, `title`, `vote_count`, `vote_average`, `revenue`, `runtime`, `popularity`, and `overview`. During and after these requests, additional requests were made to get information on `credits`, `crew`, and `release_dates`.
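As a rough, hedged sketch of what one of these TMDB requests can look like (the project's own collection scripts may differ), the example below assumes a TMDB API key stored in a `TMDB_API_KEY` environment variable and uses TMDB's documented `/movie/{id}`, `/movie/{id}/credits`, and `/movie/{id}/release_dates` endpoints:

```python
import os
import requests

TMDB_BASE = "https://api.themoviedb.org/3"
API_KEY = os.environ["TMDB_API_KEY"]  # assumed: your own TMDB API key

def fetch_movie_features(tmdb_id):
    """Fetch core details, credits, and release dates for a single movie."""
    params = {"api_key": API_KEY}

    # Core details: budget, title, vote_count, vote_average, revenue, runtime, ...
    details = requests.get(f"{TMDB_BASE}/movie/{tmdb_id}", params=params).json()
    # Cast and crew
    credits = requests.get(f"{TMDB_BASE}/movie/{tmdb_id}/credits", params=params).json()
    # Release dates by country
    releases = requests.get(f"{TMDB_BASE}/movie/{tmdb_id}/release_dates", params=params).json()

    keep = ["budget", "title", "vote_count", "vote_average",
            "revenue", "runtime", "popularity", "overview"]
    features = {key: details.get(key) for key in keep}
    features["cast"] = credits.get("cast", [])
    features["crew"] = credits.get("crew", [])
    features["release_dates"] = releases.get("results", [])
    return features

print(fetch_movie_features(10193)["title"])  # 10193: a valid TMDB movie id (Toy Story 3)
```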
To get data for trailers, the YouTube Data API was used. The API's `search.list` and `videos.list` methods were used to collect data on trailers, specifically `title`, `channel_title`, `channel_id`, `description`, `release_date`, `tags`, `view_count`, `like_count`, `dislike_count`, and `comment_count` (features renamed for Python syntax), along with a computed `similarity_score` metric for each trailer.
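The repository's full request logic lives in `youtubeAPIrequests.py` (described next); the following is only a minimal, hedged sketch of how the two methods can be chained with the `google-api-python-client` library. The `YOUTUBE_API_KEY` environment variable and the search query are illustrative assumptions, not names taken from the project.

```python
import os
from googleapiclient.discovery import build

# Assumed: an API key stored in the YOUTUBE_API_KEY environment variable.
youtube = build("youtube", "v3", developerKey=os.environ["YOUTUBE_API_KEY"])

# search.list: find candidate trailer videos for a movie title.
search_response = youtube.search().list(
    q="Toy Story 3 trailer",
    part="snippet",
    type="video",
    maxResults=5,
).execute()
video_ids = [item["id"]["videoId"] for item in search_response["items"]]

# videos.list: pull the snippet (title, channel, tags, publish date, ...) and
# statistics (view/like/comment counts) for those candidate videos.
videos_response = youtube.videos().list(
    part="snippet,statistics",
    id=",".join(video_ids),
).execute()

for video in videos_response["items"]:
    snippet, stats = video["snippet"], video["statistics"]
    print(snippet["title"], snippet["channelTitle"], stats.get("viewCount"))
```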
The main API requests used in this project can be found in the `youtubeAPIrequests.py` file. To make this file work on your computer without access tokens and refresh tokens (and without signing in at every execution), there are a few steps to follow:
Follow directions at the YouTube Data API Overview page to get started.
Follow directions at the Getting Started with authentication page to get started. When you have the JSON file that identifies your application's credentials, set up the `GOOGLE_APPLICATION_CREDENTIALS` environment variable (System Properties -> Advanced -> Environment Variables -> User variables for {USER} -> New) and point it to the path where your application's JSON file is saved. If you do not wish to set up this environment variable globally, you can set it at the beginning of each shell session instead. These instructions can be found here.
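As a small, hedged alternative (not code from this repository), you can also point the variable at the credentials file from inside Python itself, before any Google client is created; the path below is a placeholder:

```python
import os

# Placeholder path; replace with wherever you saved your application's JSON key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:\path\to\your-credentials.json"

# Any Google client built after this point picks up the credentials from the
# environment variable, just as if it had been set globally.
```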
Run the following commands in the console at the root of the furiosa directory:
cd examples
python youtube_api_test.py
This should return the following:
Toy Story 3: Trailer - Walt Disney Studios
Much of the visualization and evaluation of the data itself can be found in the `notebooks` folder. Additionally, many graphs that represent the significance of each feature, along with some other visual representations of the data, are available in the `correlations` folder inside `figures`.
The following machine learning models were used in this repository:
- Neural Networks (Regression)
- Support Vector Machines (Regression)
- Linear Regression
- Decision Tree (Regression)
- Random Forest (Regression)
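As a hedged illustration of how these regressors could be trained and compared with scikit-learn (the CSV path and column names below are placeholders, and the repository's neural network may be implemented with a different framework):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

# Placeholder file and column names for illustration only.
data = pd.read_csv("data/movies_with_trailer_features.csv")
X = data[["budget", "runtime", "popularity", "view_count", "like_count"]]
y = data["revenue"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Machine": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=42),
}

# Fit each model and report its R^2 score on the held-out test split.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: R^2 = {r2_score(y_test, model.predict(X_test)):.3f}")
```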