generate a longer video #36

Closed
PMPBinZhang opened this issue Mar 13, 2024 · 6 comments

Comments

@PMPBinZhang

Thanks for your wonderful work, but I have a question. I modified inference_1024_v1.0.yaml, changing the video_length parameter from 16 to 32, and when I ran gradio_app.py I got this error: "size mismatch for image_proj_model.latents: copying a param with shape torch.Size([1, 256, 1024]) from checkpoint, the shape in current model is torch.Size([1, 512, 1024])". Can you show me how to generate a longer video?

@Doubiiu
Owner

Doubiiu commented Mar 14, 2024

Hi. You can set temporal_length=32 in inference_1024_v1.0.yaml and keep the other parameters unchanged.
However, this will make the model behave differently between the training and inference stages due to the model configuration and design (fixed-length image context queries), resulting in degraded motion. I show some examples here (inference time: 150 s, peak memory: 24 GB on a single A100 GPU):

time-lapse_of_a_blooming_flower_with_lea.mp4
a_robot_is_walking_through_a_destroyed_c.mp4
pouring_beer_into_a_glass_of_ice_and_bee.mp4
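For reference, a minimal sketch of that edit, assuming the config layout matches the released inference_1024_v1.0.yaml (the unet_config.params path is the one referenced later in this thread; surrounding keys are omitted and stay untouched):

```yaml
# inference_1024_v1.0.yaml (sketch; other keys omitted)
model:
  params:
    unet_config:
      params:
        temporal_length: 32   # was 16; extends the UNet's temporal modules
```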

@PMPBinZhang
Author

Very grateful for your reply, I'll try it!

@Caixy1113

Hi Doubiiu, I followed your instructions and changed unet_config.params.temporal_length to 32 in inference_512_v1.0.yaml, but it doesn't seem to work.

@Doubiiu
Owner

Doubiiu commented Jun 19, 2024

> Hi Doubiiu, I followed your instructions and changed unet_config.params.temporal_length to 32 in inference_512_v1.0.yaml, but it doesn't seem to work.

Hi, sorry for the late reply. That instruction is for the gradio demo. Have you solved the problem?

@machengcheng2016

I think we should change video_length from 16 to 32 rather than temporal_length. As far as I can tell, temporal_length relates to max_relative_position, while video_length determines n_frames and hence noise_shape.
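To illustrate the distinction, a hypothetical Python sketch of how such parameters are typically consumed at inference time in latent video diffusion code (the resolution, channel count, and downsampling factor below are common defaults assumed for illustration, not taken from this repo):

```python
# Hypothetical sketch: how video_length would drive the sampled noise shape.
# temporal_length, by contrast, would configure the UNet's temporal attention
# (e.g. its max_relative_position), not the number of frames sampled.
h, w = 576, 1024              # assumed output resolution of the 1024 model
channels, down_factor = 4, 8  # assumed latent channels / VAE downsampling factor

video_length = 32             # frames to sample
noise_shape = [1, channels, video_length, h // down_factor, w // down_factor]
print(noise_shape)            # [1, 4, 32, 72, 128]; only the frame axis doubles
```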

@binbinlan

> Hi. You can set temporal_length=32 in inference_1024_v1.0.yaml and keep the other parameters unchanged. However, this will make the model behave differently between the training and inference stages due to the model configuration and design (fixed-length image context queries), resulting in degraded motion. I show some examples here (inference time: 150 s, peak memory: 24 GB on a single A100 GPU):
>
> time-lapse_of_a_blooming_flower_with_lea.mp4
> a_robot_is_walking_through_a_destroyed_c.mp4
> pouring_beer_into_a_glass_of_ice_and_bee.mp4

Hello Doubiiu, really nice work. I'm sure I set temporal_length=32 in inference_1024_v1.0.yaml for the gradio demo, but the output still stays at 2 seconds instead of 4 seconds. I also tried changing video_length=32, but that produced the error "size mismatch for image_proj_model.latents: copying a param with shape torch.Size([1, 256, 1024]) from checkpoint, the shape in current model is torch.Size([1, 512, 1024])". So how can I generate longer videos like you did? Please.
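A possible reading of that size mismatch, inferred purely from the shapes in the error message (256 = 16 × 16, 512 = 16 × 32) and not confirmed by the maintainer: the image projection model appears to allocate a fixed number of context-query latents per frame, so raising video_length resizes a learned parameter that the 16-frame checkpoint cannot fill:

```python
# Assumed arithmetic behind the reported mismatch (hypothetical).
queries_per_frame = 16   # inferred: 256 queries / 16 frames
dim = 1024

checkpoint_shape = (1, queries_per_frame * 16, dim)  # (1, 256, 1024), trained at 16 frames
modified_shape   = (1, queries_per_frame * 32, dim)  # (1, 512, 1024), video_length=32
assert checkpoint_shape != modified_shape  # load fails: "size mismatch for image_proj_model.latents"
```

If that reading is right, it would explain why the maintainer recommends changing temporal_length (which leaves the learned query table alone) rather than video_length.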
