postech-ami/SoundBrush

SoundBrush (AAAI 2025)

This repository contains a PyTorch implementation of the AAAI 2025 paper, SoundBrush: Sound as a Brush for Visual Scene Editing. SoundBrush can manipulate scenes to reflect the mood of the input audio or to insert sounding objects while preserving the original content.

[Figure: teaser image]

Getting started

This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.7 and PyTorch 2.0.1. Later versions should work, but have not been tested.

Installation

Create and activate a virtual environment to work in:

conda create -n soundbrush python=3.8
conda activate soundbrush

Install the requirements and PyTorch with pip. For CUDA 11.7, this would look like:

pip install -r requirements.txt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
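
After installation, it can help to confirm that the environment matches the tested versions. The following is a minimal sketch using only the standard library; `EXPECTED` and `check_versions` are hypothetical names for illustration and are not part of this repository.

```python
from importlib import metadata

# Versions taken from the pip install command above.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu117" are ignored in the comparison.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "differs from the tested version"
        print(f"{name}: {found or 'not installed'} ({status})")
```

A mismatch is not necessarily a problem, since later versions should work but have not been tested.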

Download models

To run SoundBrush, you need to download the pretrained model: Download pretrained model.

After downloading the model, place it in ./checkpoints:

./checkpoints/model.ckpt
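
Before running the demo, you may want to check that the checkpoint sits where the path above says it should. This is a small sketch under that assumption; `find_checkpoint` is a hypothetical helper, not a function provided by this repository.

```python
from pathlib import Path


def find_checkpoint(root="checkpoints", name="model.ckpt"):
    """Return the checkpoint path, or raise FileNotFoundError if it is missing."""
    path = Path(root) / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected the pretrained model at {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Found checkpoint: {find_checkpoint()}")
    except FileNotFoundError as err:
        print(err)
```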

Demo

Run the command below to run inference with the model. We provide sample images and audio files in ./source_images and ./source_wavs, respectively. The edited images will be saved in ./outputs.

python edit_inference.py --audio_dir <audio directory> --img_dir <image directory> --save_dir <output directory>

# or simply run

sh inference.sh
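
If you prefer to launch the demo from Python, for example to loop over several input folders, the command above can be assembled programmatically. The sketch below uses only the three flags shown in the command; `build_command` and `run_inference` are hypothetical wrappers, and nothing is assumed about what edit_inference.py does internally.

```python
import subprocess
import sys


def build_command(audio_dir, img_dir, save_dir, script="edit_inference.py"):
    """Build the argument list for the inference script (flags as in the command above)."""
    return [
        sys.executable, script,
        "--audio_dir", str(audio_dir),
        "--img_dir", str(img_dir),
        "--save_dir", str(save_dir),
    ]


def run_inference(audio_dir, img_dir, save_dir):
    """Run the script and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_command(audio_dir, img_dir, save_dir), check=True)


# Build (but do not run) the command for the bundled sample folders.
cmd = build_command("./source_wavs", "./source_images", "./outputs")
print(" ".join(cmd[1:]))
```

Calling `run_inference("./source_wavs", "./source_images", "./outputs")` from the repository root would then correspond to the sample run described above, assuming the environment and checkpoint are set up.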

Agreement

  • The SoundBrush dataset is provided for non-commercial research purposes only.
  • All audio files and images of the SoundBrush dataset are sourced from the Internet and do not belong to our institutions. Our institutions do not take responsibility for the content or the meaning of these videos.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the videos and any portion of derived data for commercial purposes.
  • You agree not to further copy, publish, or distribute any portion of the SoundBrush dataset. However, you may make copies of the dataset for internal use at a single site within the same organization.

SoundBrush Dataset

To be uploaded.

Training and testing

To be uploaded.

Citation

@inproceedings{soundbrush,
  title     = {SoundBrush: Sound as a Brush for Visual Scene Editing},
  author    = {Sung-Bin, Kim and Jun-Seong, Kim and Ko, Junseok and Kim, Yewon and Oh, Tae-Hyun},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025},
}

Acknowledgement

We borrow heavily from the code of InstructPix2Pix and ImageBind, the dataset from VGGSound, and the agreement statement from CelebV-HQ. We sincerely thank the authors of these works.
