This repository contains a small implementation of the code used to reproduce the results of the paper First Order Motion Model for Image Animation (GitHub code). FOMM creates an animation by combining a so-called driving video with a still image. The idea caught my attention because it is not hard to imagine many potential applications across several areas of interest, such as e-shops.
With the intent of exploring the capabilities of the model and learning more about the architecture and the code behind it, I developed Vincent Van Gogh - Says Hello.
The idea is inspired by those extraordinary Harry Potter moving paintings that made us dream in our childhood and are finally possible 🧙!
The FOMM model is mainly composed of two parts: the motion estimation module and the image generation module.
The first module uses a set of keypoints learned in a self-supervised way to track motion in both the driving video and the source image. The locations of the keypoints are predicted separately for each frame by an encoder-decoder network.
In the second module, a dense motion network combines the local approximations into a resulting dense motion field and feeds it to the generation module, which renders an image of the source object moving as in the driving video.
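To make the data flow concrete, here is a minimal PyTorch sketch of that two-stage pipeline: a keypoint detector applied to both the source image and a driving frame, a dense motion network that turns the two keypoint sets into a per-pixel flow, and a generator that warps the source image with that flow. All module names, layer sizes, and the 64×64 resolution are placeholders chosen for brevity; this is an illustration of the idea, not the paper's actual architecture.

```python
# Illustrative sketch only: toy stand-ins for the FOMM pipeline stages.
# Layer sizes, module names, and the 64x64 resolution are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyKeypointDetector(nn.Module):
    """Encoder-style network predicting (x, y) for a fixed set of keypoints."""
    def __init__(self, num_kp=10):
        super().__init__()
        self.num_kp = num_kp
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_kp * 2)

    def forward(self, frame):                      # frame: (B, 3, H, W)
        feat = self.encoder(frame).flatten(1)      # (B, 32)
        return self.head(feat).view(-1, self.num_kp, 2)

class ToyDenseMotion(nn.Module):
    """Combines source and driving keypoints into a dense per-pixel flow field."""
    def __init__(self, num_kp=10, size=64):
        super().__init__()
        self.size = size
        self.net = nn.Linear(num_kp * 4, size * size * 2)

    def forward(self, kp_source, kp_driving):      # each: (B, K, 2)
        kp = torch.cat([kp_source, kp_driving], dim=-1).flatten(1)
        return self.net(kp).view(-1, self.size, self.size, 2)

class ToyGenerator(nn.Module):
    """Warps the source image with the predicted flow and lightly refines it."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, source, flow):               # source: (B, 3, H, W)
        b = source.size(0)
        identity = torch.eye(2, 3).unsqueeze(0).expand(b, -1, -1)
        grid = F.affine_grid(identity, source.shape, align_corners=True)
        warped = F.grid_sample(source, grid + flow, align_corners=True)
        return self.refine(warped)

# One toy animation step: detect keypoints in both frames, build the flow,
# then render the source image moving like the driving frame.
kp_det, dense_motion, generator = ToyKeypointDetector(), ToyDenseMotion(), ToyGenerator()
source = torch.rand(1, 3, 64, 64)
driving_frame = torch.rand(1, 3, 64, 64)
flow = dense_motion(kp_det(source), kp_det(driving_frame))
frame_out = generator(source, flow)
print(frame_out.shape)  # torch.Size([1, 3, 64, 64])
```

In the real model this step is repeated for every frame of the driving video, producing one rendered frame per driving frame.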
The entire code is wrapped into a notebook that can be run inside Google Colab. The project relies on the original GitHub repo and the pre-trained model provided by the authors to run quick inference on the new application. No significant changes have been made to the code, so that it can be tested as described in the paper.
The original checkpoints of the model can be found at the following link: google-drive.
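For reference, inference in the notebook follows the usual pattern of the original repository's demo.py helpers (load_checkpoints and make_animation). The file names below (source.png, driving.mp4, config/vox-256.yaml, vox-cpk.pth.tar) are assumptions used for illustration; substitute the image, video, config, and checkpoint you actually download from the link above.

```python
# Hedged sketch of inference with the original first-order-model demo helpers.
# File, config, and checkpoint names are assumptions; replace with your own.
import imageio
import numpy as np
from skimage.transform import resize
from demo import load_checkpoints, make_animation  # from the original repo

# Load and resize the source image and the driving video to the model resolution.
source_image = resize(imageio.imread('source.png'), (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3]
                 for frame in imageio.mimread('driving.mp4', memtest=False)]

# Restore the pre-trained generator and keypoint detector from the checkpoint.
generator, kp_detector = load_checkpoints(config_path='config/vox-256.yaml',
                                          checkpoint_path='vox-cpk.pth.tar')

# Animate the source image with the motion of the driving video and save the result.
predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=True)
imageio.mimsave('result.mp4', [np.uint8(255 * frame) for frame in predictions])
```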
- Reuse the model to perform a full-body inference
- Retrain the model to extend to different categories
- Tune the model for a specific application