
Feature engineering notebooks and cloud functions

Cloud functions

Google Cloud Platform's (GCP) Cloud Functions are a convenient way of generating training data. They allow synchronous execution of several atomized, short chunks of code. The /cloud_functions folder includes the tools necessary to generate the training data.

GCP steps

1.- To make use of Cloud Functions, you must first have set up a Google Cloud account.

2.- Enable the Storage API.

3.- Upload your rendition video data to a bucket named livepeer-verifier-renditions and your original video data to another bucket named livepeer-verifier-originals.

4.- Enable the Datastore API. This is where the generated training data will be stored. A sketch of these setup steps using the gcloud and gsutil CLIs is shown below.
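The following is a minimal, illustrative sketch of the setup using the gcloud and gsutil command-line tools; the bucket names come from the steps above, while the service identifiers and local paths are assumptions to adapt to your own project:

# enable the required APIs (service names may vary with your gcloud version)
gcloud services enable storage-component.googleapis.com datastore.googleapis.com

# create the buckets expected by the cloud functions
gsutil mb gs://livepeer-verifier-originals
gsutil mb gs://livepeer-verifier-renditions

# upload the videos (replace the local paths with your own)
gsutil -m cp /path/to/originals/*.mp4 gs://livepeer-verifier-originals/
gsutil -m cp /path/to/renditions/*.mp4 gs://livepeer-verifier-renditions/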

Local steps

Once the above essential steps are done, navigate to the /cloud_functions folder and execute:

bash deploy.sh

The bash script will output something like:

Deploying function (may take a while - up to 2 minutes)...⠏
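For reference, the deployment performed by the script corresponds to a gcloud functions deploy invocation. The snippet below is only a hypothetical sketch (the function name, runtime and flags are illustrative assumptions; deploy.sh itself is the source of truth):

gcloud functions deploy generate_training_data \
    --runtime python37 \
    --trigger-http \
    --memory 2048MB \
    --timeout 540s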

Once the deployment has finished, and provided your video data is located in the verification-classifier/data folder, you can launch the script:

bash call_cloud_function.sh

This should iterate asset by asset through your verification-classifier/data folder and send a call to the cloud function for each one, generating the inputs needed for further training of the models.
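In essence, the script performs a loop similar to the hypothetical sketch below; the function URL, payload format and rendition folder are placeholder assumptions, and call_cloud_function.sh remains the source of truth:

FUNCTION_URL="https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME"  # placeholder endpoint

# iterate over the local assets and trigger the cloud function for each one
for asset in verification-classifier/data/1080p/*.mp4; do
    name=$(basename "$asset")
    curl -s -X POST "$FUNCTION_URL" \
         -H "Content-Type: application/json" \
         -d "{\"name\": \"$name\"}"
done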

Jupyter notebooks

The Jupyter notebooks employed in the experiments are stored here. To interact with them, it is recommended to build and launch the Docker image as explained below.

1.- Build the image

To build the image, run the following shell script, which lies in the same folder of the repo:

bash build_docker.sh

This will create an image based on jupyter/datascience-notebook, adding the needed Python dependencies. The image contains ffmpeg with VMAF and libav with MS-SSIM support.
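Under the hood this amounts to a docker build of the image used below, roughly equivalent to the following command (an illustrative sketch that assumes the Dockerfile lives in this folder; build_docker.sh itself is the source of truth):

docker build -t epicjupiter:v1 .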

2.- Run the image

To run the image, type the following FROM THE ROOT FOLDER OF THE REPO:

docker run -p 8888:8888 --volume="$(pwd)":/home/jovyan/work/ epicjupiter:v1

This will run the image on port 8888 and mount the contents of this repo as a volume at /home/jovyan/work/.

Copy the URL provided in the console (amending it by removing any spurious information), paste it into your browser, and navigate to the work folder to access the notebooks. Alternatively, navigate to http://127.0.0.1:8888 and paste the token provided in the console into the Password or token input box to log in.

If you are using symbolic links to point from the data folder to videos stored in another folder, you need to mount that other folder as well so it is visible inside the container.

For example, if the symbolic links in the data folder point to the /videos/ folder, we need an additional volume as follows:

docker run -d -p 8888:8888 --volume="$(pwd)":/home/jovyan/work/ --volume=/videos/:/videos/ epicjupiter:v1

It is also important to have read and write permissions on the output folder in order to store the results.
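For instance, on the host you can make the output folder writable (the relative path below assumes it sits at the root of the repo) with:

chmod -R a+rw output/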

3.- Notebooks

The notebooks used in the experiments are inside the folder work/notebooks.

3.1.- Tools.ipynb

Before running the experiments, some data needs to be prepared.

This notebook contains different sections that generate the datasets and auxiliary files needed to run the other notebooks.

The notebook can be found here

3.2.- Compare_videos.ipynb

We have taken a number of assets from YouTube's YT8M dataset and encoded a few renditions from them. Specifically, we have taken about 140 videos from this dataset, established the 1080p rendition as the original, and encoded 10 seconds of each to 720p, 480p, 360p and 240p. For the sake of simplicity, we have reduced the respective bitrates to be equal to those used by YouTube for each rendition (you can find a more detailed article on how this can be done here).

We have also invited a few more full-reference metrics to the party, namely the cosine, Euclidean and Hamming distances, to add more diversity to the analysis.

Once we have gathered our renditions, we have iterated video by video (4 renditions x 140 videos = 560 specimens) and extracted their mean PSNR, SSIM, MS-SSIM, VMAF, and cosine, Hamming and Euclidean hash distances with respect to the original 1080p rendition.
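As a rough illustration of the kind of comparison performed (the notebook drives this programmatically), the ffmpeg bundled in the image can compute, for example, the PSNR and SSIM of a 720p rendition against its 1080p original; the file paths below assume the folder layout described next:

# upscale the rendition to the original resolution and compute PSNR
ffmpeg -i data/720p/01.mp4 -i data/1080p/01.mp4 \
       -lavfi "[0:v]scale=1920:1080[scaled];[scaled][1:v]psnr" -f null -

# same comparison, but computing SSIM
ffmpeg -i data/720p/01.mp4 -i data/1080p/01.mp4 \
       -lavfi "[0:v]scale=1920:1080[scaled];[scaled][1:v]ssim" -f null -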

This notebook expects the videos to reside in a data folder with the following structure. The structure needs to be created in the data folder of the repo, which is empty by default.

data
├── 1080p
│   └── 01.mp4
├── 720p
│   └── 01.mp4
├── 480p
│   └── 01.mp4
├── 360p
│   └── 01.mp4
├── 240p
│   └── 01.mp4
└── 144p
    └── 01.mp4    
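The empty rendition folders can be created in one go from the root of the repo, for example:

mkdir -p data/{1080p,720p,480p,360p,240p,144p}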

The results will be stored in the /output folder.

The notebook can be found here

3.3.- Metric analysis.ipynb

This notebook expects a file metrics.csv as input. For convenience, a sample can be found in the output folder, as generated by the Compare_videos.ipynb notebook.

The notebook can be found here