Demo video: DirecT2V.mp4
In the paradigm of AI-generated content (AIGC), there has been growing interest in extending pre-trained text-to-image (T2I) models to text-to-video (T2V) generation. Despite their effectiveness, these frameworks struggle to maintain a consistent narrative and to handle rapid shifts in scene composition or object placement from a single user prompt. This paper introduces a new framework, dubbed DirecT2V, which leverages instruction-tuned large language models (LLMs) to generate frame-by-frame descriptions from a single abstract user prompt. DirecT2V uses an LLM director to divide the user input into separate prompts for each frame, enabling time-varying content and facilitating consistent video generation. To maintain temporal consistency and prevent object collapse, we propose a novel value mapping method and dual-softmax filtering. Extensive experimental results validate the effectiveness of DirecT2V in producing visually coherent and consistent videos from abstract user prompts, addressing the challenges of zero-shot video generation. The code and demo will be publicly available.
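As a rough illustration of the attention-side changes named above, the sketch below applies dual-softmax filtering in the generic feature-matching sense (softmax over the query-key similarity in both directions, with low-confidence matches masked out) when mapping anchor-frame values to the current frame. The function name, the threshold, and where this sits inside the diffusion U-Net are assumptions for illustration; the paper's value mapping and filtering may differ in detail.

```python
# Hedged sketch: dual-softmax filtering for cross-frame value mapping.
# q: queries of the current frame, (N, d); k, v: keys/values of an anchor
# frame, (M, d). Illustrative only, not the repository's exact code.
import torch


def dual_softmax_filtered_attention(q, k, v, tau: float = 0.2):
    """Map anchor-frame values to the current frame, keeping only
    correspondences that are confident in both matching directions."""
    sim = q @ k.t() / q.shape[-1] ** 0.5      # (N, M) scaled similarity
    p_qk = sim.softmax(dim=-1)                # current frame -> anchor frame
    p_kq = sim.softmax(dim=-2)                # anchor frame -> current frame
    conf = p_qk * p_kq                        # dual-softmax confidence
    attn = p_qk * (conf > tau).float()        # filter unreliable matches
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
    return attn @ v                           # mapped values for this frame


# Example with random features standing in for U-Net attention tokens.
q, k, v = torch.randn(64, 40), torch.randn(64, 40), torch.randn(64, 40)
out = dual_softmax_filtered_attention(q, k, v)
```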
Overall pipeline of DirecT2V. Our framework consists of two parts: directing an abstract user prompt with an LLM director (GPT-4) and video generation with a modified T2I diffusion model (Stable Diffusion).
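At a high level, the two stages can be sketched in a few lines. The script below is an illustration, not the repository's implementation: it assumes GPT-4 is queried through the OpenAI chat API with a hypothetical system prompt, and it renders each frame-level prompt with a stock Stable Diffusion pipeline from a shared initial latent, without the cross-frame attention changes (value mapping and dual-softmax filtering) sketched above.

```python
# Illustrative sketch of the DirecT2V pipeline (not the repository code).
# Assumes OPENAI_API_KEY is set; model names and prompts are hypothetical.
import torch
from diffusers import StableDiffusionPipeline
from openai import OpenAI


def direct_frames(user_prompt: str, num_frames: int = 8) -> list[str]:
    """Ask the LLM director for one frame-level description per frame."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a video director. Expand the user's prompt into "
                    f"{num_frames} frame-by-frame descriptions, one per line, "
                    f"keeping the subject and style consistent across frames."
                ),
            },
            {"role": "user", "content": user_prompt},
        ],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    return [line.strip() for line in lines if line.strip()][:num_frames]


def render_frames(frame_prompts: list[str], seed: int = 0):
    """Render each frame prompt with Stable Diffusion from a shared latent.

    The actual method additionally rewires self-attention across frames;
    here every frame is generated independently, so this is only a rough
    approximation of the generation stage.
    """
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    latents = torch.randn(
        (1, pipe.unet.config.in_channels, 64, 64),
        generator=generator, device="cuda", dtype=torch.float16,
    )
    return [pipe(p, latents=latents).images[0] for p in frame_prompts]


frames = render_frames(direct_frames("an astronaut riding a horse on the moon"))
```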
The running code can be found in run_direct2v.py. We used PyTorch 1.13.0 and Diffusers 1.19.3.
python run_direct2v.py
- Upload code
- Implement a demo using the ChatGPT API
- Improve efficiency
@article{hong2023large,
title={Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation},
author={Hong, Susung and Seo, Junyoung and Hong, Sunghwan and Shin, Heeseong and Kim, Seungryong},
journal={arXiv preprint arXiv:2305.14330},
year={2023}
}