
P4P-Speech2Gesture

Introduction to P4P

Unique improvements

TBD

Dataset

PATS and Repo

Set up environment

Python version

3.8 - 3.11 (e.g. 3.11.9)

pycasper

Windows

  1. Open cmd as administrator
  2. Navigate to the P4P-Speech2Gesture repository root
  3. mkdir ..\pycasper
  4. git clone https://github.com/chahuja/pycasper ..\pycasper
  5. cd src
  6. Delete the existing pycasper folder in src (if present)
  7. mklink /D pycasper ..\..\pycasper\pycasper
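If you prefer scripting step 7, here is a minimal Python sketch using os.symlink (it assumes the layout created by steps 1-6 and that you run it from src; on Windows, creating symlinks requires admin rights or Developer Mode):

import os
from pathlib import Path

src = Path.cwd()                                   # run from P4P-Speech2Gesture/src
target = src.parents[1] / "pycasper" / "pycasper"  # the package cloned in step 4
link = src / "pycasper"
if link.is_symlink() or link.exists():
    raise SystemExit("remove the existing src/pycasper first (step 6)")
os.symlink(target, link, target_is_directory=True)
print(f"linked {link} -> {target}")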

Dependencies

  • pip install -r requirements.txt
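To sanity-check the install, a small hypothetical helper (not part of the repo) that verifies two packages this README pins or mentions under Dependencies further below:

import importlib

for pkg in ("scipy", "webrtcvad"):  # both named under Dependencies below
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "version unknown"))
    except ImportError:
        print(pkg, "missing - run: pip install -r requirements.txt")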

ffmpeg

  1. Download ffmpeg
  2. Unzip the package
  3. Add the directory where the .exe files are stored to the system PATH
  4. Check the installation: ffmpeg -version
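A minimal Python sketch to confirm ffmpeg is reachable from the PATH (a hypothetical helper, equivalent to step 4):

import shutil
import subprocess

exe = shutil.which("ffmpeg")
if exe is None:
    raise SystemExit("ffmpeg not found on PATH - check step 3")
subprocess.run([exe, "-version"], check=True)  # prints the version banner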

Virtual environment

Windows

  1. Create new virtual environment: python -m venv .venv
  2. Activate virtual environment: .venv\Scripts\activate
  3. Exit virtual environment: deactivate

Linux

  1. Create new virtual environment: python3 -m venv .venv
  2. Activate virtual environment: source .venv/bin/activate
  3. Exit virtual environment: deactivate

Training

Windows:

On Windows, list-valued ("array") arguments such as -modalities, -input_modalities, and -speaker must be set manually in argsUtils.py, since they cannot be passed reliably on the command line (a sketch of that edit follows the command below).

python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -loss L1Loss -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -stop_thresh 3 -tb 1 -window_hop 5
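A hedged sketch of what that manual edit in argsUtils.py might look like, assuming the file builds an argparse parser (argument names and values mirror the Linux command below; the file's actual structure may differ):

import argparse

parser = argparse.ArgumentParser()
# hard-code the list-valued defaults that the Windows shell cannot pass
# reliably as quoted JSON arrays
parser.add_argument('-modalities', nargs='*', type=str,
                    default=['pose/data', 'audio/log_mel_400'])
parser.add_argument('-input_modalities', nargs='*', type=str,
                    default=['audio/log_mel_400'])
parser.add_argument('-speaker', nargs='*', type=str,
                    default=['oliver'])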

Linux:

python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -input_modalities '["audio/log_mel_400"]' -loss L1Loss -modalities '["pose/data", "audio/log_mel_400"]' -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -speaker '["oliver"]' -stop_thresh 3 -tb 1 -window_hop 5

Quantitative evaluation (Optional)

Produces an evaluation with respect to the quantitative metrics outlined below (this can also be done during the training process).

cd src
python sample.py -load "<path_to_weight>" -path2data "<path_to_dataset>"

Evaluation metrics

  • L1 Loss
  • PCK (percentage of correct keypoints)
  • F1 score
  • FID (Fréchet inception distance)
  • W1 (Wasserstein-1 distance: W1_vel on velocities & W1_acc on accelerations)
  • IS (inception score)
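For intuition, a minimal numpy sketch of PCK, the fraction of predicted keypoints falling within a threshold of the ground truth (illustrative only; the exact normalization in sample.py may differ):

import numpy as np

def pck(pred, gt, alpha=0.2):
    # pred, gt: (T, J, 2) arrays of 2D keypoints over T frames and J joints
    dist = np.linalg.norm(pred - gt, axis=-1)       # (T, J) joint errors
    span = gt.max(axis=1) - gt.min(axis=1)          # (T, 2) per-frame bbox size
    thresh = alpha * np.linalg.norm(span, axis=-1)  # (T,) per-frame threshold
    return float((dist <= thresh[:, None]).mean())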

Rendering

Generate pose animation

cd src
python render.py -render 20 -load "<path_to_weight>" -render_text 0 -path2data "<path_to_dataset>"

Codebase introduction

gan.py

  • Implements the GAN framework: the training loop, loss calculations, and the integration of the generator and discriminator during training
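Schematically, the adversarial step looks like the following PyTorch sketch (a paraphrase of the standard setup, not the repo's actual code): the generator minimizes an L1 pose loss plus a GAN loss, while the discriminator classifies real versus generated pose sequences.

import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, audio, real_pose, lambda_l1=1.0):
    fake_pose = G(audio)

    # discriminator update: real poses -> 1, generated poses -> 0
    opt_d.zero_grad()
    real_logits = D(real_pose)
    fake_logits = D(fake_pose.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
              F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_loss.backward()
    opt_d.step()

    # generator update: fool the discriminator while staying close to the
    # ground-truth poses (the -loss L1Loss term from the training command)
    opt_g.zero_grad()
    fake_logits = D(fake_pose)
    g_loss = (F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits)) +
              lambda_l1 * F.l1_loss(fake_pose, real_pose))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()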

speech2gesture.py

  • Defines the specific architectures of the generator and discriminator models used within the GAN framework
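As a toy illustration of the kind of 1D-convolutional generator/discriminator pair such a file defines (illustrative only, not the repo's actual architecture; the feature sizes are made up):

import torch.nn as nn

class ToyGenerator(nn.Module):
    # maps an audio feature sequence (B, audio_feats, T) to poses (B, pose_feats, T)
    def __init__(self, audio_feats=64, pose_feats=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(audio_feats, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
            nn.Conv1d(256, pose_feats, kernel_size=1),
        )

    def forward(self, audio):
        return self.net(audio)

class ToyDiscriminator(nn.Module):
    # scores pose sequences with per-window real/fake logits
    def __init__(self, pose_feats=96):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(pose_feats, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, 1, kernel_size=1),
        )

    def forward(self, pose):
        return self.net(pose)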

Dependencies

scipy:

  • pip install scipy==1.10.1

webrtcvad:

Bug report

style_classifier.py - line 19

  • Change the default number of speakers from the input value to 25 when the pretrained model (trainer.py - line 406) is activated

dataUtils.py - line 138

  • Updated a pandas data-structure operation for compatibility with current pandas versions

dataUtils.py - line 277

  • Convert byte strings in missing_intervals to regular strings
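A one-line sketch of that fix (assuming missing_intervals is an iterable of strings and byte strings):

missing_intervals = {s.decode('utf-8') if isinstance(s, bytes) else s
                     for s in missing_intervals}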

trainer.py - line 62

  • The speaker argument has to be replaced manually if the pretrained model (trainer.py - line 406) is activated; otherwise the speaker defaults to shelly

trainer.py - line 406

  • Deactivate the pretrained model, since we do not have a pretrained speech2gesture model

speech2gesture.py - line 30

  • Added the missing kwargs parameter

argsUtils.py - Dataset Parameters (not a bug as such)

  • Updated some parameters to resolve compatibility issues with Windows, dataset storage, and data types (the defaults were originally designed for Linux)

Lab workstation setup

Instructions

Reference
