
P4P-Speech2Gesture

Introduction to P4P

Unique improvements

TBD

Dataset

PATS and Repo

Set up environment

Python version

3.8 - 3.11 (e.g. 3.11.9)

pycasper

Windows

  1. Open cmd as administrator
  2. Navigate to the P4P-Speech2Gesture repository root
  3. mkdir ..\pycasper
  4. git clone https://github.com/chahuja/pycasper ..\pycasper
  5. cd src
  6. Delete the existing pycasper folder in src (if present)
  7. mklink /D pycasper ..\..\pycasper\pycasper
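If you prefer scripting step 7, here is a minimal Python sketch using os.symlink (it assumes the layout created by steps 1-6 and that you run it from src; on Windows, creating symlinks requires admin rights or Developer Mode):

import os
from pathlib import Path

src = Path.cwd()                                   # run from P4P-Speech2Gesture/src
target = src.parents[1] / "pycasper" / "pycasper"  # the package cloned in step 4
link = src / "pycasper"
if link.is_symlink() or link.exists():
    raise SystemExit("remove the existing src/pycasper first (step 6)")
os.symlink(target, link, target_is_directory=True)
print(f"linked {link} -> {target}")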

Dependencies

  • pip install -r requirements.txt
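To sanity-check the install, a small hypothetical helper (not part of the repo) that verifies two packages this README pins or mentions under Dependencies further below:

import importlib

for pkg in ("scipy", "webrtcvad"):  # both named under Dependencies below
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "version unknown"))
    except ImportError:
        print(pkg, "missing - run: pip install -r requirements.txt")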

ffmpeg

  1. Download ffmpeg
  2. Unzip the package
  3. Add the directory where the .exe files are stored to the system PATH
  4. Check the installation: ffmpeg -version
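A minimal Python sketch to confirm ffmpeg is reachable from the PATH (a hypothetical helper, equivalent to step 4):

import shutil
import subprocess

exe = shutil.which("ffmpeg")
if exe is None:
    raise SystemExit("ffmpeg not found on PATH - check step 3")
subprocess.run([exe, "-version"], check=True)  # prints the version banner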

Virtual environment

Windows

  1. Create new virtual environment: python -m venv .venv
  2. Activate virtual environment: .venv\Scripts\activate
  3. Exit virtual environment: deactivate

Linux

  1. Create new virtual environment: python3 -m venv .venv
  2. Activate virtual environment: source .venv/bin/activate
  3. Exit virtual environment: deactivate

Training

Windows:

On Windows, list-valued ("array") arguments such as -modalities, -input_modalities, and -speaker must be set manually in argsUtils.py, since they cannot be passed reliably on the command line (a sketch of that edit follows the command below).

python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -loss L1Loss -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -stop_thresh 3 -tb 1 -window_hop 5
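A hedged sketch of what that manual edit in argsUtils.py might look like, assuming the file builds an argparse parser (argument names and values mirror the Linux command below; the file's actual structure may differ):

import argparse

parser = argparse.ArgumentParser()
# hard-code the list-valued defaults that the Windows shell cannot pass
# reliably as quoted JSON arrays
parser.add_argument('-modalities', nargs='*', type=str,
                    default=['pose/data', 'audio/log_mel_400'])
parser.add_argument('-input_modalities', nargs='*', type=str,
                    default=['audio/log_mel_400'])
parser.add_argument('-speaker', nargs='*', type=str,
                    default=['oliver'])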

Linux:

python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -input_modalities '["audio/log_mel_400"]' -loss L1Loss -modalities '["pose/data", "audio/log_mel_400"]' -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -speaker '["oliver"]' -stop_thresh 3 -tb 1 -window_hop 5

Quantitative evaluation (Optional)

Produces an evaluation with respect to the quantitative metrics outlined below (this can also be done during the training process).

cd src
python sample.py -load "<path_to_weight>" -path2data "<path_to_dataset>"

Evaluation metrics

  • L1 Loss
  • PCK (percentage of correct keypoints)
  • F1 score
  • FID (Fréchet inception distance)
  • W1 (Wasserstein-1 distance: W1_vel on velocities & W1_acc on accelerations)
  • IS (inception score)
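For intuition, a minimal numpy sketch of PCK, the fraction of predicted keypoints falling within a threshold of the ground truth (illustrative only; the exact normalization in sample.py may differ):

import numpy as np

def pck(pred, gt, alpha=0.2):
    # pred, gt: (T, J, 2) arrays of 2D keypoints over T frames and J joints
    dist = np.linalg.norm(pred - gt, axis=-1)       # (T, J) joint errors
    span = gt.max(axis=1) - gt.min(axis=1)          # (T, 2) per-frame bbox size
    thresh = alpha * np.linalg.norm(span, axis=-1)  # (T,) per-frame threshold
    return float((dist <= thresh[:, None]).mean())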

Rendering

Generate pose animation

cd src
python render.py -render 20 -load "<path_to_weight>" -render_text 0 -path2data "<path_to_dataset>"

Codebase introduction

gan.py

  • Implements the GAN framework: the training loop, loss calculations, and the integration of the generator and discriminator during training
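Schematically, the adversarial step looks like the following PyTorch sketch (a paraphrase of the standard setup, not the repo's actual code): the generator minimizes an L1 pose loss plus a GAN loss, while the discriminator classifies real versus generated pose sequences.

import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, audio, real_pose, lambda_l1=1.0):
    fake_pose = G(audio)

    # discriminator update: real poses -> 1, generated poses -> 0
    opt_d.zero_grad()
    real_logits = D(real_pose)
    fake_logits = D(fake_pose.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
              F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    opt_d.step()

    # generator update: fool the discriminator while staying close to the
    # ground-truth poses (the -loss L1Loss term from the training command)
    opt_g.zero_grad()
    fake_logits = D(fake_pose)
    g_loss = (F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits)) +
              lambda_l1 * F.l1_loss(fake_pose, real_pose))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()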

speech2gesture.py

  • Defines the specific architectures of the generator and discriminator models used within the GAN framework
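As a toy illustration of the kind of 1D-convolutional generator/discriminator pair such a file defines (illustrative only, not the repo's actual architecture; the feature sizes are made up):

import torch.nn as nn

class ToyGenerator(nn.Module):
    # maps an audio feature sequence (B, audio_feats, T) to poses (B, pose_feats, T)
    def __init__(self, audio_feats=64, pose_feats=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(audio_feats, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Conv1d(256, pose_feats, kernel_size=1),
        )

    def forward(self, audio):
        return self.net(audio)

class ToyDiscriminator(nn.Module):
    # scores pose sequences with per-window real/fake logits
    def __init__(self, pose_feats=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(pose_feats, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 1, kernel_size=1),
        )

    def forward(self, pose):
        return self.net(pose)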

Dependencies

scipy:

  • pip install scipy==1.10.1

webrtcvad:

Bug report

style_classifier.py - line 19

  • Change the default number of speakers from the input value to 25 when the pretrained model (trainer.py - line 406) is activated

dataUtils.py - line 138

  • Updated a pandas data-structure operation for compatibility with current pandas versions

dataUtils.py - line 277

  • Convert byte strings in missing_intervals to regular strings
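A one-line sketch of that fix (assuming missing_intervals is an iterable of strings and byte strings):

missing_intervals = {s.decode('utf-8') if isinstance(s, bytes) else s
                     for s in missing_intervals}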

trainer.py - line 62

  • The speaker argument has to be replaced manually if the pretrained model (trainer.py - line 406) is activated; otherwise the speaker defaults to shelly

trainer.py - line 406

  • Deactivate the pretrained model, since we do not have a pretrained speech2gesture model

speech2gesture.py - line 30

  • Added the missing kwargs parameter

argsUtils.py - Dataset Parameters (not a bug as such)

  • Updated some parameters to resolve compatibility issues with Windows, dataset storage, and data types (the defaults were originally designed for Linux)

Lab workstation setup

Instructions

Reference
