Models
- Basel Face Model (BFM'09): click here to obtain the Basel Face Model from the University of Basel
- Expression Model: Download the expression model (the Exp_Pca.bin) file from this link
Software
- CUDA (tested with v12.2)
- OpenCV (tested with v4.8.1; for fast landmark detection, we strongly advise to compile OpenCV with CUDA)
- Python 3
The following python packages are also needed, but these can be installed by following the instructions in Section 2 of Installation below.
- cvxpy (for temporal smoothing via post-processing)
- scikit-learn
- matplotlib
- opencv-python
Download the code by running the commands below
wget https://github.com/Computational-Psychiatry/3DI/archive/refs/tags/v0.2.0.zip
unzip v0.2.0.zip
cd 3DI-0.2.0
and compile the CUDA code as below
cd build
chmod +x builder.sh
./builder.sh
Install the necessary packages via pip. It is advised to use a virtual environment by running and update pip
cd build
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
The necessary packages can simply be installed by running.
pip install -r requirements.txt
Make sure that you downloaded the Basel Face Model (01_MorphableModel.mat
) and the Expression Model (Exp_Pca.bin
) as highlighted in the Requirements section above. Then, copy these model files into the build/models/raw
directory. Specifically, these files should be in the following locations:
build/models/raw/01_MorphableModel.mat
build/models/raw/Exp_Pca.bin
Then run the following python script to adapt these models to the 3DI code:
cd build/models
python3 prepare_BFM.py
Also, you need to unpack the landmark models etc. in the same directory
tar -xvzf lmodels.tar.gz
Go to the build
directory and, if you used a virtual environment, activate it by running source env/bin/activate
. Then, the following is an example command (will produce visuals as well)
python process_video.py ./output testdata/elaine.mp4
The produced files are in the subdirectory created under the output
directory. The file with expression parameters has the extension .expressions_smooth
and the pose parameters have the extension .poses_smooth
. The script above takes approximately 6 minutes to run, and this includes the production of the visualization parameters as well as temporal smoothing. Some tips to reduce total processing time:
- Visualization (i.e., production of rendering videos) can be disabled by passing the parameter
--produce_3Drec_videos=False
. - Temporal smoothing can be disabled by passing the parameter
--smooth_pose_exp=False
- The reconstruction can be sped up by passing the parameter
--cfgid=2
to use the 2nd configuration file, which is faster although results will include more jitter
If you want to compute local expression basis coefficients (experimental), you can run:
python process_video.py ./output testdata/elaine.mp4 --compute_local_exp_coeffs=True
More parameters can be seen by running
python process_video.py --help
The process_video.py
script does a series of pre- and post-processing for reconstruction (details are in the section below). It is important to know that we first estimate the identity parameters of the subject in the video, by using a small subset of the video frames, and then we compute pose and expression coefficients at every frame. Thus, the identity parameters are held common throughout the video.
The process_video.py
script does a series of processes on the video. Specifically, it does the following steps in this order:
- Face detection on the entire video
- Facial landmark detection on the entire video
- 3D (neutral) identity parameter estimation via 3DI (using a subset of frames from the videos)
- Frame-by-frame 3D reconstruction via 3DI (identity parameters are fixed here to the ones produced in the previous step)
- (Optional) Temporal smoothing
- (Optional) Production of 3D reconstruction videos
- (Optional) Production of video with 2D landmarks estimated by 3DI
The first four steps are visualized below; the blue text indicates the extension of the corresponding files
The 2D landmarks estimated by 3DI are also produced optionally based on the files produced above.
Below are the extensions some of the output files provided by 3DI video analysis software:
.expressions_smooth
: A text file that contains all the expression coefficients of the video. That is, the file contains aTx79
matrix, where thet
th row contains the 79 expression coefficients of the expression (PCA) model.poses_smooth
: A text file that contains all the poses coefficients of the video. The file contains aTx9
matrix, where the first 3 columns contain the the 3D translation(tx, ty, tz)
for all theT
frames of the video and the last 3 columns of the matrix contain the rotation (yaw, pitch, roll
) for all theT
frames..local_exp_coeffs.*
(requires--compute_local_exp_coeffs=True
): Localized expression coefficients for the video; a text file that containsTxK
entries, whereK
is the number of coefficients of the basis that is used to compute the expressions. See video here for an illustration of to what each coefficient corresponds to (e.g., the 25th coefficient in this text file indicates activation in the 25th basis component in the video)..2Dlmks
: A text file with the 51 landmarks corresponding to the inner face (see below), as predicted by 3DI. The file contains a matrix of sizeTx102
, where each row is of the format:x0 y0 x1 y1 ... x51 y50
..canonicalized_lmks
: A text file with the 51 landmarks corresponding to the inner face (see below), after removing identity and pose variation. The file contains a matrix of sizeTx153
, where each row is of the format:x0 y0 z0 x1 y1 z1 ... x50 y50 z50
.
The following are the relative IDs of the landmarks corresponding to each facial feature:
{'lb': [0, 1, 2, 3, 4], # left brow
'rb': [5, 6, 7, 8, 9], # right brow
'no': [10, 11, 12, 13, 14, 15, 16, 17, 18], # nose
'le': [19, 20, 21, 22, 23, 24], # left eye
're': [25, 26, 27, 28, 29, 30], # right eye
'ul': [31, 32, 33, 34, 35, 36, 43, 44, 45, 46], # upper lip
'll': [37, 38, 39, 40, 41, 42, 47, 48, 49, 50]} # lower lip
The processing of a video has a number of steps (see Running the code), and the table below lists the computation time for each of these. We provide computation time for two different configuration files (see cfgid
above). The default configuration file (cfgid=1
) leads to significantly less jitter in the results but to also longer processing times, whereas the second one (cfgid=2
) works faster but yields more jittery videos.
Note that the main script for videos (process_video.py
) includes a number of optional steps like post smoothing and visualization. These can be turned off using the parameters outlined at the section Running the code.
Config. 1 | Config. 2 | Time unit | |
---|---|---|---|
Face detection† | 18.83 | ms per frame | |
Landmark detection† | 137.92 | ms per frame | |
Identity learning† | 95542 | 62043 | ms per video |
3D reconstruction† | 331.72 | 71.27 | ms per frame |
Smoothing | 82.84 | ms per frame | |
Production of 3D reconstruction videos | 199.35 | ms per frame | |
Production of 2D landmark videos | 32.70 | ms per frame |
†Required step
The performance of 3DI is expected to improve if one incorporates the matrix of the camera that was used to record the data into the reconstruction process. We provide a calibration procedure, outlined below. (The procedure is for a camera of a MacBook Pro 21, you may replace the strings macbookpro21
with your camera model wherever applicable.)
- Print the checkerboard pattern in the following link to an letter- or A4-sized paper: https://raw.githubusercontent.com/opencv/opencv/4.x/doc/pattern.png (the pattern is also in this repository at
./build/models/cameras/pattern.png
) - Tape the printed checkerboard pattern to a sturdy surface, like a clipboard. Make sure that the paper is taped tightly and neatly.
- Record a video of yourself while holding the chechkerboard pattern at various angles and distances. An example video can be found in the following link: https://www.youtube.com/watch?v=z0nQGeVJS3s
- Make sure that you move slowly to minimize motion blur
- Try to keep the checkerboard pattern within the frame at all times
- Try to have a video of 2-3 minutes --- not much shorter or longer
- Create the directory
build/calibration/videos
:mkdir build/calibration/videos
- Copy your video inside the folder
build/calibration/videos
. In the rest of this tutorial, we assume that the video file is at the following location:build/calibration/videos/macbookpro21.mp4
- Go to the
build
directory:cd build
- Create directories through the following comments:
mkdir calibration/videos/frames
mkdir calibration/videos/frames/macbookpro21
- Create frames from the video by running the following
ffmpeg
command:ffmpeg -i calibration/videos/macbookpro21.mp4 -r 1 calibration/videos/frames/macbookpro21/frame%04d.png
- Manually inspect the images in the directory
calibration/videos/frames/macbookpro21/
. Remove any images where there is too much motion blur or the checkerboard is not fully in the frame. - Now we will do the calibration.
- Make sure that the you successfully completed the installation by following the steps in the Installation section
- Run the following command from the
build
directory:./calibrate_camera "calibration/videos/frames/macbookpro21/frame*.png" "models/cameras/macbookpro21.txt"
- The code above can take a few minutes
- The code may need a GUI to run, but this can easily be fixed (you can comment out the
imread
's within thecamera.h
file that cause the need for GUI and re-compile the code)
- If the code runs successfully, your calibration file must be located at
build/models/cameras/macbookpro21.txt
.
Once you obtained the calibration matrix, it can be incorporated into the reconstruction process by using two additional command line arguments, namely camera_param
and undistort
. The former argument be set to the camera calibration file and the latter to 1
, as shown below:
python process_video.py ./output testdata/elaine.mp4 --camera_param=./models/cameras/macbookpro21.txt --undistort=1
(This command is for illustration only, since the elaine.mp4
video was clearly not recorded with a MacBook Pro.)