Introduction to P4P
TBD
Python 3.8 - 3.11 (e.g. 3.11.9)
Windows
- Open cmd as admin
- Navigate to the P4P-Speech2Gesture repository folder
mkdir ..\pycasper
git clone https://github.com/chahuja/pycasper ..\pycasper
cd src
- Delete the existing pycasper folder (if needed), then create a symbolic link to it:
mklink /D pycasper ..\..\pycasper\pycasper
pip install -r requirements.txt
- Download ffmpeg
- Unzip the package
- Add the directory where the .exe files are stored to the system PATH
- Check the ffmpeg installation:
ffmpeg -version
- Create new virtual environment:
python -m venv .venv
- Activate virtual environment:
.venv\Scripts\activate
- Exit virtual environment:
deactivate
Linux
- Create new virtual environment:
python3 -m venv .venv
- Activate virtual environment:
source .venv/bin/activate
- Exit virtual environment:
deactivate
Windows:
Array input arguments (e.g. -input_modalities, -modalities, -speaker) need to be specified manually in argsUtils.py rather than on the command line; a sketch of this edit follows the Linux command below.
python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -loss L1Loss -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -stop_thresh 3 -tb 1 -window_hop 5
Linux:
python src/train.py -path2data '<path_to_dataset>' -path2outdata '<path_to_dataset>' -batch_size 32 -cpk speech2gesture -early_stopping 0 -exp 1 -fs_new '[15, 15]' -gan 1 -input_modalities '["audio/log_mel_400"]' -loss L1Loss -modalities '["pose/data", "audio/log_mel_400"]' -model Speech2Gesture_G -note speech2gesture -num_epochs 100 -overfit 0 -render 0 -save_dir save/speech2gesture/oliver -speaker '["oliver"]' -stop_thresh 3 -tb 1 -window_hop 5
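A rough sketch of the manual Windows edit is shown below. It assumes argsUtils.py builds an argparse parser; the real file may be organised differently, and the defaults here simply mirror the values passed in the Linux command above.

```python
import argparse

# Minimal sketch, assuming argsUtils.py uses argparse. On Windows the shell
# tends to mangle the JSON-style list arguments, so the list defaults are
# edited in the file instead of being passed on the command line.
parser = argparse.ArgumentParser()
parser.add_argument('-modalities', type=list,
                    default=["pose/data", "audio/log_mel_400"])
parser.add_argument('-input_modalities', type=list,
                    default=["audio/log_mel_400"])
parser.add_argument('-speaker', type=list,
                    default=["oliver"])
```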
Produce an evaluation with respect to the quantitative metrics outlined below (this can also be done during the training process):
cd src
python sample.py -load "<path_to_weight>" -path2data "<path_to_dataset>"
- L1 Loss
- PCK (Percentage of Correct Keypoints)
- F1
- FID (Fréchet Inception Distance)
- W1 (Wasserstein-1 distance; W1_vel & W1_acc)
- IS (Inception Score)
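For reference, a minimal generic sketch of how PCK can be computed is given below; this is only an illustration of the metric itself, and the repository's own implementation and threshold convention may differ.

```python
import numpy as np

def pck(pred, gt, alpha=0.2):
    """Percentage of Correct Keypoints.

    pred, gt: arrays of shape (T, J, 2) -- T frames, J joints, 2D coordinates.
    A predicted joint counts as correct if it lies within alpha times the
    larger side of the ground-truth bounding box of that frame.
    """
    box = gt.max(axis=1) - gt.min(axis=1)            # per-frame bbox size, (T, 2)
    thresh = alpha * box.max(axis=1, keepdims=True)  # per-frame threshold, (T, 1)
    dist = np.linalg.norm(pred - gt, axis=-1)        # joint-wise error, (T, J)
    return (dist <= thresh).mean()                   # fraction of correct keypoints
```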
Generate pose animation
cd src
python render.py -render 20 -load "<path_to_weight>" -render_text 0 -path2data "<path_to_dataset>"
- Focuses on the GAN framework, including the training loop, loss calculations, and integration of generator and discriminator for the training process
- Defines the specific architectures of the generator and discriminator models used within the GAN framework
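To make this division of labour concrete, here is a heavily simplified, hypothetical sketch of one adversarial training step that combines an L1 reconstruction loss with a GAN loss. The actual trainer in this repository is considerably more involved; the function and variable names below are illustrative only.

```python
import torch
import torch.nn as nn

def train_step(G, D, opt_g, opt_d, audio, real_pose, lambda_l1=1.0):
    """One simplified GAN step: G maps audio to pose, D judges pose sequences."""
    bce = nn.BCEWithLogitsLoss()

    # --- Discriminator update: real poses vs. generated poses ---
    fake_pose = G(audio).detach()
    real_logits, fake_logits = D(real_pose), D(fake_pose)
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Generator update: fool D while staying close to ground truth (L1) ---
    fake_pose = G(audio)
    fake_logits = D(fake_pose)
    g_loss = bce(fake_logits, torch.ones_like(fake_logits)) + \
             lambda_l1 * nn.functional.l1_loss(fake_pose, real_pose)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()
```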
pip install scipy==1.10.1
- Install the Microsoft C++ Build Tools on Windows
style_classifier.py - line 19
- Change the default number of speakers from the input value to 25 if the pretrained model (trainer.py - line 406) is activated
dataUtils.py - line 138
- Update the pandas data structure operation (an illustrative sketch of this kind of change follows below)
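As a hypothetical illustration only (the exact operation at dataUtils.py line 138 may differ), a common update of this kind is replacing the removed DataFrame.append with pd.concat:

```python
import pandas as pd

df = pd.DataFrame({'interval_id': ['a'], 'speaker': ['oliver']})
row = pd.DataFrame({'interval_id': ['b'], 'speaker': ['oliver']})

# Old style (DataFrame.append was deprecated and then removed in pandas 2.0):
# df = df.append(row, ignore_index=True)

# Updated equivalent:
df = pd.concat([df, row], ignore_index=True)
```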
dataUtils.py - line 277
- Convert byte strings in missing_intervals to regular strings (sketched below)
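A minimal sketch of the conversion, assuming missing_intervals is an iterable that may contain both bytes and str values (the actual data structure and values in dataUtils.py may differ; the interval IDs below are placeholders):

```python
# Decode any byte strings to regular (unicode) strings, leaving str values as-is.
missing_intervals = [b'interval_0001', 'interval_0002']
missing_intervals = [
    s.decode('utf-8') if isinstance(s, bytes) else s
    for s in missing_intervals
]
```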
trainer.py - line 62
- The speaker argument has to be replaced manually if the pretrained model (trainer.py - line 406) is activated; otherwise it defaults to 'shelly'
trainer.py - line 406
- Deactivate the pretrained model, since we do not have a pretrained Speech2Gesture model
speech2gesture.py - line 30
- Added the missing kwargs field (a generic illustration follows below)
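A generic illustration of the fix (not the actual class from speech2gesture.py): accepting **kwargs lets a constructor ignore extra keyword arguments passed by the caller instead of raising a TypeError.

```python
import torch.nn as nn

class Generator(nn.Module):
    # **kwargs is the missing piece: extra keyword arguments are accepted and ignored.
    def __init__(self, in_feats=256, out_feats=104, **kwargs):
        super().__init__()
        self.net = nn.Linear(in_feats, out_feats)

    def forward(self, x):
        return self.net(x)
```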
argsUtils.py - Dataset Parameters (not actually a bug)
- Updated some parameters to ensure compatibility with Windows, the dataset storage location, and data types (the defaults were originally written for Linux)