mrapplexz/visualstories

VisualStories generation

Overview on Medium

Part 0

Preparation

After you add the CUDA binaries to PATH and LD_LIBRARY_PATH, you can install this project:

git clone git@github.com:mrapplexz/aiijc-visualstories.git
cd aiijc-visualstories
./install.sh
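Before running the installer, it may help to confirm that the CUDA paths are actually visible in the environment. The following is a minimal sketch, not part of the project: the helper name `cuda_env_report` is hypothetical, and it assumes that "CUDA binaries" means `nvcc` is reachable on PATH and that a `libcudart.so*` file sits in one of the LD_LIBRARY_PATH directories.

```python
import os
import shutil
from pathlib import Path


def cuda_env_report(environ=None):
    """Report where nvcc and libcudart are found (hypothetical helper).

    Assumption: a working CUDA setup exposes `nvcc` on PATH and a
    `libcudart.so*` file in some LD_LIBRARY_PATH directory.
    """
    environ = os.environ if environ is None else environ

    # shutil.which returns None when nvcc is not on the given search path.
    nvcc = shutil.which("nvcc", path=environ.get("PATH", ""))

    # Keep only the library directories that exist and contain libcudart.
    lib_dirs = [d for d in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    cudart_dirs = [
        d for d in lib_dirs
        if Path(d).is_dir() and any(Path(d).glob("libcudart.so*"))
    ]
    return {"nvcc": nvcc, "cudart_dirs": cudart_dirs}
```

If `nvcc` comes back as `None` or `cudart_dirs` is empty, the environment variables probably need fixing before `./install.sh` is run.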

Part 1

Text generation

If your host doesn't have access to the Hugging Face Hub, download pytorch_model.bin, tokenizer_config.json, vocab.json, config.json, special_tokens_map.json and merges.txt here, then run the script with --local_model MODEL_PATH

python3_text generate_text.py --device cuda:0                             \
                              --output_filename ./output/texts/text.txt   \
                              --temperature 0.1                           \
                              --top_k 10000                               \
                              --top_p 0.95                                \
                              --repetition_penalty 5.0                    \
                              --max_length 1000                           \
                              --seed 42                                   \
                              --start "The kingdom and a princess"        \
                              --genre fairy_tale
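For the offline setup, a quick check that every required file is present in MODEL_PATH can save a failed run. This is a minimal sketch: the file names come from the list above, while the helper name `missing_model_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names listed in this README for the offline text model.
REQUIRED_MODEL_FILES = (
    "pytorch_model.bin",
    "tokenizer_config.json",
    "vocab.json",
    "config.json",
    "special_tokens_map.json",
    "merges.txt",
)


def missing_model_files(model_dir):
    """Return the required files that are absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]
```

An empty list means the directory can be passed to --local_model; otherwise the returned names show which downloads are still missing.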

Part 2

All sub-parts of Part 2 can be executed in any order or in parallel.

Part 2.1

TTS generation

Before you start, download the pretrained model LibriTTS_800000.tar here and save it as ./tts_generation/FastSpeech2/output/ckpt/LibriTTS/800000.pth.tar

If your host doesn't have access to the NLTK data server, install the cmudict and averaged_perceptron_tagger packages manually by following this instruction

python3_tts generate_tts.py --input_filename ./output/texts/text.txt       \
                            --temp_dir ./tmp                               \
                            --speaker_id 205                               \
                            --output_dir ./output/tts                      

Part 2.2

Image generation

If your host doesn't have access to the OpenAI hub, download RN50.pt, ViT-B-16.pt and ViT-B-32.pt and put them in ~/.cache/clip

python3_image generate_images.py --input_filename ./output/texts/text.txt       \
                                 --devices cuda:0,cuda:1                        \
                                 --main_dir ./output/frames                     

Part 2.3

Music generation

If your host doesn't have access to the OpenAI hub, download vqvae.pth.tar, prior_level_0.pth.tar and prior_level_1.pth.tar into ~/.cache/jukebox/models/5b, and prior_level_2.pth.tar into ~/.cache/jukebox/models/5b_lyrics

python3_music generate_music.py --music_genre country               \
                                --artist john_denver                \
                                --save_path ./output/music          \
                                --sample_len 30
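Because the three sub-parts of Part 2 are independent, they could be launched together from one script. The sketch below is one possible way to do that with the standard library. The helper name `run_in_parallel` is hypothetical, and whether all three jobs fit on the available GPUs at the same time is an assumption this README does not confirm.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Commands copied from Parts 2.1 to 2.3 of this README.
PART2_COMMANDS = {
    "tts": ["python3_tts", "generate_tts.py",
            "--input_filename", "./output/texts/text.txt",
            "--temp_dir", "./tmp",
            "--speaker_id", "205",
            "--output_dir", "./output/tts"],
    "images": ["python3_image", "generate_images.py",
               "--input_filename", "./output/texts/text.txt",
               "--devices", "cuda:0,cuda:1",
               "--main_dir", "./output/frames"],
    "music": ["python3_music", "generate_music.py",
              "--music_genre", "country",
              "--artist", "john_denver",
              "--save_path", "./output/music",
              "--sample_len", "30"],
}


def run_in_parallel(commands):
    """Run each argv list concurrently and return {name: exit code} (hypothetical helper)."""
    def run(item):
        name, argv = item
        # Each worker thread blocks on its own child process.
        return name, subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return dict(pool.map(run, commands.items()))
```

Calling `run_in_parallel(PART2_COMMANDS)` from the repository root would start all three jobs; any non-zero exit code in the result marks a sub-part that needs to be rerun before Part 3.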

Part 3

Video generation

Music generation produces three different tracks: ./output/music/item_0.wav, ./output/music/item_1.wav and ./output/music/item_2.wav. Choose one and pass it to --music_filename

python3_video generate_video.py --frame_dir ./output/frames                       \
                                --tts_dir ./output/tts                            \
                                --music_filename ./output/music/item_0.wav        \
                                --temp_dir ./tmp                                  \
                                --video_name ./output/video/video.avi             \
                                --quality 6                                       \
                                --music_corrector -3
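Choosing the track can also be scripted. The sketch below lists the generated candidates and builds the video command for the selected one. Both helper names are hypothetical, and the flag values are simply the ones shown in the command above.

```python
from pathlib import Path


def list_music_candidates(music_dir="./output/music"):
    """Return the generated item_*.wav files, sorted by name (hypothetical helper)."""
    return sorted(Path(music_dir).glob("item_*.wav"))


def build_video_command(music_filename):
    """Build the generate_video.py argv for the chosen track (hypothetical helper)."""
    return [
        "python3_video", "generate_video.py",
        "--frame_dir", "./output/frames",
        "--tts_dir", "./output/tts",
        "--music_filename", str(music_filename),
        "--temp_dir", "./tmp",
        "--video_name", "./output/video/video.avi",
        "--quality", "6",
        "--music_corrector", "-3",
    ]
```

After listening to the candidates, pass the preferred path to `build_video_command` and run the resulting list with `subprocess.run`.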
