Blocks Description
A summary of the videos used in this project is available in this notebook.
The image below shows a macro view of the whole data generation process. On the left is the dataset folder, which stores all the result files from the process, and on the right are the source codes, which process each part of the data generation. Let's go step by step for a better understanding:
- The M1.1 and M1.2 modules receive the S0 (a source video from a local file and a list of links to YouTube videos, located in the video source folder at in_DD-Local and in_REF-Gold respectively) and save the basic information to the F0 (VD_INFO.CSV) file. More detailed explanation here.
- After that, the M2 module receives both the S0 (the same one from the previous step) and the F0, the file with the video information. This file holds the video name or link, which is needed to find the video whether it is local or on an online source such as YouTube. This module extracts the landmarks from each frame of the video, resulting in the F1 (VD_FEATURES_L1.CSV) file. More detailed explanation here.
- Next comes the video adjusting step. The M3.1 module receives the F1 file and interpolates its discontinuities. A discontinuity is a run of frames in which the landmarks could not be captured: if it spans 5 frames or fewer, it is interpolated; if it spans more than 5 frames, the two sides are treated as different series and are not interpolated. The result is the F2 (VD_FEATURES_L2.CSV) file. More detailed explanation here.
- Continuing with the video adjusting step, the M3.2 module receives the F2 file (the video with its discontinuities interpolated) and applies spatial normalization to it, normalizing the Z axis and the roll rotation, resulting in the F3 (VD_FEATURES_L3.CSV) file. More detailed explanation here.
- With the video adjusting step concluded, it's time for the measure making step, M4. Each measure can be represented as a pair of landmarks; for example, the M3 measure is the pair 49-55 in Figure 7. This module receives the F3 file and computes the measures from it, resulting in the F4 (VD_MEASURES_L0.CSV) file. More detailed explanation here.
- Finally, it's time for the automatic labeling process, M6. This module receives both the F4 file (representing the entire video) and the MF4 file (representing the seed reference to be searched for inside the entire video) and searches for the MF4 pattern in the F4 files of all the videos, resulting in an F6 (VD_LABELED_L0.CSV) file for each video. More detailed explanation here.
Figure 1: Macro view of the blocks
The Indexer Module performs the indexing of videos from two sources, YouTube and Local, while also extracting the initial characteristics of the videos. This block is divided into two modules: the YouTube Video Indexer and the Local Indexer.
Figure 2: Example of video indexing
The YouTube Video Indexer lets users add links to the NEW_VIDEO_LIST.LST file, which will then be indexed.
- Input: a file containing a list of links to YouTube videos.
- Output: in each video folder, a file named VD_INFO.CSV, which aggregates basic information throughout the video qualification process.
The Local Indexer lets users place videos with unique names in the video source folder, and it will automatically index all of them.
- Input: a video source folder containing locally downloaded videos in .mp4 format.
- Output: in each video folder, a file named VD_INFO.CSV, which aggregates basic information throughout the video qualification process.
The following basic information is saved to the VD_INFO.CSV file:
- video_id: Video ID
- origin_vid: Video origin, either local (D) or YouTube (y)
- process_status: Processing stage (I for indexed, L for labeled)
- link_video: Video link or name (if locally downloaded)
- height_vid: Video height in pixels
- width_vid: Video width in pixels
- duration_vid: Video duration in seconds
- fps_vid: Video FPS
- total_frames: Total number of frames in the video
- time_step_fr: Frame update frequency
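As a rough illustration of the record described above, the sketch below builds one VD_INFO.CSV row and serializes it with Python's standard `csv` module. The column names come from the list above; the helper names `make_info_row` and `write_info_csv` are hypothetical, deriving `duration_vid` from `total_frames / fps_vid` is an assumption, and `time_step_fr` is simply passed through because its exact definition is not given here.

```python
import csv
import io

# Column names taken from the field list above.
VD_INFO_COLUMNS = [
    "video_id", "origin_vid", "process_status", "link_video",
    "height_vid", "width_vid", "duration_vid", "fps_vid",
    "total_frames", "time_step_fr",
]


def make_info_row(video_id, origin, link, height, width, fps, total_frames, time_step_fr):
    """Build one row for a freshly indexed video (hypothetical helper)."""
    return {
        "video_id": video_id,
        "origin_vid": origin,          # "D" for local, "y" for YouTube
        "process_status": "I",         # a new entry starts as indexed
        "link_video": link,
        "height_vid": height,
        "width_vid": width,
        # Assumption: duration in seconds is derived from frame count and FPS.
        "duration_vid": round(total_frames / fps, 3),
        "fps_vid": fps,
        "total_frames": total_frames,
        "time_step_fr": time_step_fr,  # passed through unchanged
    }


def write_info_csv(rows):
    """Serialize rows to CSV text with the VD_INFO.CSV columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=VD_INFO_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = make_info_row(1, "D", "sample.mp4", 720, 1280, 30.0, 900, 1)
csv_text = write_info_csv([row])
```

The real indexer may compute or store these fields differently; the notebook remains the reference.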
To see how the indexer works, open this notebook and run all the cells.
The Feature Extractor Module extracts the primary, raw landmarks for each frame of the videos indexed from the YouTube and Local sources.
Figure 3: Example of feature extraction
- Input: the file with each video's information, VD_INFO.CSV, and the video frames.
- Output: the frame-by-frame extraction file, VD_FEATURES_L1.CSV, with the 68 points describing the face, based on DLIB.
To see how the feature extraction process works, open this notebook and run all the cells.
The Video Adjuster Module analyzes temporal discontinuities between frames, characterizes them, and performs interpolations to correct them. By default, at most 5 consecutive frames are interpolated, but this limit can be changed in the source code. If a discontinuity is longer than the frame limit, the two sides are considered different series and are separated.
After performing the interpolations, this module also applies spatial normalization to the Z axis and the roll rotation.
Figure 4: Example of video frames interpolation
- Input: raw extracted landmark series, VD_FEATURES_L1.CSV
- Output: landmark series with discontinuities interpolated, VD_FEATURES_L2.CSV
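The gap rule described above can be sketched in a few lines. This is only an illustration for a single coordinate: the function name `interpolate_gaps` is hypothetical, missing frames are assumed to be represented by `None`, and linear interpolation is an assumption (the kind of interpolation is not stated here).

```python
def interpolate_gaps(series, max_gap=5):
    """Fill short gaps linearly and split the series at long gaps.

    `series` holds one value per frame, with None where no landmark
    was captured. Returns a list of gap-free segments.
    """
    segments, current = [], []
    i, n = 0, len(series)
    while i < n:
        if series[i] is not None:
            current.append(float(series[i]))
            i += 1
            continue
        # Find the end of the run of missing frames starting at i.
        j = i
        while j < n and series[j] is None:
            j += 1
        gap = j - i
        if current and j < n and gap <= max_gap:
            # Short gap with valid frames on both sides: interpolate.
            left, right = current[-1], float(series[j])
            for k in range(1, gap + 1):
                current.append(left + (right - left) * k / (gap + 1))
        elif current:
            # Long (or trailing) gap: close the current series.
            segments.append(current)
            current = []
        i = j
    if current:
        segments.append(current)
    return segments


short = interpolate_gaps([0.0, None, None, 3.0])
long_gap = interpolate_gaps([1.0] + [None] * 6 + [2.0])
```

Here `short` holds a single filled series, while `long_gap` holds two separate series because the gap exceeds the default limit of 5 frames.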
Below is an image of how the spatial normalization process works: on the left are the original landmarks, and on the right are the rotated ones.
Figure 5: Example of video frames spatial normalization
For the Z axis normalization, a default size is fixed for a chosen measure. The measure is the distance between points 1 and 17 in Figure 7 below, computed as the Euclidean distance over the x and y coordinates of those points, and the default distance chosen is 100. As the face moves backwards, the measured distance between points 1 and 17 decreases and the scale factor increases, so that the distance stays at 100. This scale factor multiplies every other point of the face, so the whole face is normalized using only the default distance between points 1 and 17.
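A minimal sketch of this scale normalization, assuming landmarks are stored as a list of (x, y) tuples in which points 1 and 17 sit at indices 0 and 16. The function name `normalize_scale` is hypothetical, and scaling is done about the coordinate origin for simplicity, which may differ from the project's code.

```python
import math


def normalize_scale(landmarks, ref_a=0, ref_b=16, target=100.0):
    """Scale all (x, y) landmarks so the ref_a-ref_b distance equals target."""
    (xa, ya), (xb, yb) = landmarks[ref_a], landmarks[ref_b]
    dist = math.hypot(xb - xa, yb - ya)  # Euclidean distance in x and y
    if dist == 0:
        raise ValueError("reference landmarks coincide")
    factor = target / dist  # grows as the face moves away from the camera
    return [(x * factor, y * factor) for x, y in landmarks]


# Hypothetical frame: only the points needed for the example are set.
points = [(0.0, 0.0)] * 17
points[8] = (10.0, 20.0)
points[16] = (50.0, 0.0)
scaled = normalize_scale(points)
```

In this example the reference distance is 50, so the scale factor is 2 and every point is doubled.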
For the roll rotation normalization (illustrated in Figure 5), the rotation angle of the head must be calculated. This uses the line connecting the centers of the eyes: when the eyes are horizontally aligned the angle is 0, but as the head rolls, this line forms an angle with the line that connected the eye centers in the starting position (see this angle in Figure 7). This angle is then applied to all the landmark points of the face to bring them back to their default position. The calculation is shown in Figure 6 below.
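A minimal sketch of the roll correction described above, under stated assumptions: landmarks are (x, y) tuples in the usual 68-point ordering, with the eyes at 0-based indices 36-41 and 42-47, and the rotation is performed about the midpoint between the eye centers. The function names are hypothetical, and the exact equations used by the project are the ones in Figure 6.

```python
import math


def eye_center(landmarks, indices):
    """Mean position of the landmarks that outline one eye."""
    xs = [landmarks[i][0] for i in indices]
    ys = [landmarks[i][1] for i in indices]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def normalize_roll(landmarks, eye_a=range(36, 42), eye_b=range(42, 48)):
    """Rotate landmarks so the line between the eye centers is horizontal."""
    ax, ay = eye_center(landmarks, eye_a)
    bx, by = eye_center(landmarks, eye_b)
    angle = math.atan2(by - ay, bx - ax)       # roll angle, 0 when aligned
    cx, cy = (ax + bx) / 2, (ay + by) / 2      # assumed rotation center
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = []
    for x, y in landmarks:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return rotated, angle


# Hypothetical frame with the eyes tilted by 45 degrees.
face = [(0.0, 0.0)] * 68
for i in range(42, 48):
    face[i] = (10.0, 10.0)
leveled, roll = normalize_roll(face)
```

After the call, both eye centers in `leveled` share the same y coordinate, and `roll` holds the measured angle in radians.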
Figure 6: Equations used for Roll normalization.
Figure 7: Image of the face landmarks points
Figure 8: Roll rotation illustration
- Input: interpolated landmarks, VD_FEATURES_L2.CSV
- Output: landmarks normalized on the Z axis and roll rotation, VD_FEATURES_L3.CSV
To see how the video adjuster process works, open this notebook and run all the cells.
The Measure Maker Module standardizes measurements based on FACS fundamentals and extracts 22 measures transformed into time series.
Figure 9: Example of measures making
- Input: landmark time series with discontinuities interpolated and frames spatially normalized, VD_FEATURES_L3.CSV
- Output: time series with 22 measures each, VD_MEASURE_L0.CSV
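To make the idea of a measure concrete, the sketch below turns per-frame landmarks into one time series per measure. It assumes a measure is the Euclidean distance between its pair of landmarks, which is an interpretation rather than something stated here. Only the M3 pair (49-55, 1-based) comes from this document; the `MEASURES` table and the function name `make_measures` are hypothetical.

```python
import math

# Hypothetical measure table: name -> pair of 1-based landmark numbers.
# Only the M3 pair is taken from the text; the project defines 22 measures.
MEASURES = {"M3": (49, 55)}


def make_measures(frames, measures=MEASURES):
    """Return {measure name: [distance per frame]} for a list of frames.

    Each frame is a list of (x, y) landmarks in 68-point order.
    """
    series = {name: [] for name in measures}
    for landmarks in frames:
        for name, (a, b) in measures.items():
            (xa, ya), (xb, yb) = landmarks[a - 1], landmarks[b - 1]
            series[name].append(math.hypot(xb - xa, yb - ya))
    return series


# Two hypothetical frames in which the landmark 49-55 distance widens.
frame_1 = [(0.0, 0.0)] * 68
frame_1[54] = (3.0, 4.0)
frame_2 = [(0.0, 0.0)] * 68
frame_2[54] = (6.0, 8.0)
measure_series = make_measures([frame_1, frame_2])
```

Each entry of `measure_series` is one column of the resulting time series, with one value per frame.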
To see how the measure making process works, open this notebook and run all the cells.
The manual labeler follows a slightly different process. It handles the manual labeling of each seed video, and these seeds are later searched for inside the original videos. Once a seed video has been labeled, for example with the happy emotion, the automatic pattern search looks for that pattern in every video in the dataset and, when it finds it, labels those frames with the emotion originally assigned to the seed video. This module has the following steps:
- First, the M5.1 module performs the indexing: it receives the source video S0 and gets the basic information of every video inside the REF-Gold-Label file, resulting in an MF0 (VD_INFO.CSV) file for each seed video.
- The M5.2 module then receives the S0 and the MF0 file and extracts the features from them, resulting in the MF1 file with the extracted features.
- The M5.3 module performs the spatial normalization of the previously extracted features, normalizing both the Z axis and the roll rotation, resulting in the MF2 (VD_FEATURES_L2.CSV) file.
- The M5.4 module performs the measure making process: it receives the MF2 file and produces the MF3 file with the measures.
- Finally, the M5.5 module is the most important one, where the labeling itself happens. It receives the MF3 file with the measures and produces the MF4 file, which is labeled with an emotion such as happy, sad or surprise.
Figure 10: Manual labeling blocks summary
Below is a brief example of how the M5.5 step works. Number 1 represents the frames of the video that will be labeled, number 2 is the time series corresponding to those frames (MF3 in Figure 10), and number 3 is that time series marked with a certain emotion (MF4 in Figure 10), in this case anger. This is a manual process in which you choose the initial and final frames to be labeled and the emotion to assign to that interval. The labeled interval is later used to search for this time series pattern in the time series of all the videos in the dataset folder; when the automatic pattern search finds a similar pattern, it labels that interval of frames with the emotion manually assigned in the seed file.
Figure 11: Manual labeling M5.5 step example
- Input: seeds S0, called reference videos, containing only the expression (a smile, for example).
- Output: MF4 file, a multivariate time series corresponding to the labeled video.
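The manual step above boils down to attaching an emotion to a chosen frame interval. The sketch below shows that bookkeeping only; the function name `label_interval`, the inclusive interval, and the use of `None` for unlabeled frames are all assumptions made for illustration.

```python
def label_interval(num_frames, start, end, emotion, labels=None):
    """Return per-frame labels with frames start..end (inclusive) set to emotion."""
    if not 0 <= start <= end < num_frames:
        raise ValueError("interval out of range")
    if labels is None:
        labels = [None] * num_frames   # None marks an unlabeled frame
    elif len(labels) != num_frames:
        raise ValueError("labels must have one entry per frame")
    labels = list(labels)              # do not modify the caller's list
    for i in range(start, end + 1):
        labels[i] = emotion
    return labels


# Label frames 2 to 4 of a six-frame seed as anger.
seed_labels = label_interval(6, 2, 4, "anger")
```

Keeping the labels as one entry per frame lets them sit alongside the measure time series as an extra column.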
Starting from the reference patterns previously cataloged and transformed into time series (seeds), the automatic labeler searches for similar patterns in the raw data mass created in Flow A and labels them, creating the labeled files of the dataset proposed in this research for future training. The matching is done using the Euclidean distance and the stumpy matching algorithm.
Figure 12: Labeling process, 1 representing the seeds, 2 representing the measurement time series and 3 representing the labeled series.
- Input: file with FACS measures extracted from VD_MEASURE_L0.CSV, represented by 2 in Figure 12 above.
- Input: cataloged references from Flow B (seeds), represented by 1 in Figure 12 above.
- Output: labeled series stored in VD_LABELED_L0.CSV, represented by 3 in Figure 12 above.
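The search-and-label idea can be illustrated with a plain sliding-window Euclidean distance. This sketch is a simplified stand-in: it does not call the stumpy library, it handles a single measure rather than the multivariate series used in the project, and the names `distance_profile` and `label_matches`, as well as the `max_distance` threshold, are hypothetical.

```python
import math


def distance_profile(query, series):
    """Euclidean distance between the query and every window of the series."""
    m = len(query)
    return [
        math.sqrt(sum((q - s) ** 2 for q, s in zip(query, series[i:i + m])))
        for i in range(len(series) - m + 1)
    ]


def label_matches(query, series, emotion, max_distance):
    """Label every frame covered by a window close enough to the query."""
    labels = [None] * len(series)
    m = len(query)
    for start, dist in enumerate(distance_profile(query, series)):
        if dist <= max_distance:
            for i in range(start, start + m):
                labels[i] = emotion
    return labels


# Hypothetical seed pattern and video measure series.
seed = [1.0, 2.0, 1.0]
video = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
auto_labels = label_matches(seed, video, "happy", max_distance=0.1)
```

Only the window starting at frame 2 matches the seed here, so frames 2 to 4 receive the seed's emotion and the rest stay unlabeled.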
To see how the automatic labeler process works, open this notebook and run all the cells.
To see the results of the Euclidean distance algorithm for each sequence, open this notebook.
The videos labeled by the automatic labeler can be used to train a neural network with TensorFlow that predicts the emotion in any other video given as input for inference. For more information about the neural network, click here.