Project Page | Paper | Dataset
This repository contains a PyTorch implementation for the AAAI 2025 paper, SoundBrush: Sound as a Brush for Visual Scene Editing. SoundBrush can manipulate scenes to reflect the mood of the input audio or to insert sounding objects while preserving the original content.
This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.7 and PyTorch 2.0.1. Later versions should work, but have not been tested.
Create and activate a virtual environment to work in:
conda create -n soundbrush python=3.8
conda activate soundbrush
Install the requirements and PyTorch with pip. For CUDA 11.7, this looks like:
pip install -r requirements.txt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
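After installing, you can quickly confirm that the expected PyTorch and CUDA versions are in place and that a GPU is visible:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"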
To run SoundBrush, you need to download the pretrained model. After downloading, place it in ./checkpoints:

./checkpoints/model.ckpt
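To sanity-check the download, you can try loading the checkpoint with PyTorch. This is a minimal sketch; the "state_dict" key is an assumption about how the checkpoint is organized and may differ for this release:

import torch

# Load the checkpoint on CPU just to verify the file is intact.
ckpt = torch.load("./checkpoints/model.ckpt", map_location="cpu")

# Lightning-style checkpoints usually keep weights under "state_dict";
# this key is an assumption and may differ for this checkpoint.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Loaded checkpoint with {len(state)} entries")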
Run the command below to perform inference with the model. We provide sample images and audio files in ./source_images and ./source_wavs, respectively. The edited images will be saved in ./outputs.
python edit_inference.py --audio_dir <audio dir> --img_dir <image dir> --save_dir <output dir>
# or simply run
sh inference.sh
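Once inference finishes, you can compare a source image with its edited counterpart side by side. This is a minimal sketch using Pillow; the filenames below are hypothetical examples, and the actual output naming of edit_inference.py may differ:

from PIL import Image

# Example paths; adjust to the files produced in your run.
src = Image.open("./source_images/sample.png")    # hypothetical input name
out = Image.open("./outputs/sample_edited.png")   # hypothetical output name

# Paste both images onto one canvas for a quick visual comparison.
canvas = Image.new("RGB", (src.width + out.width, max(src.height, out.height)))
canvas.paste(src, (0, 0))
canvas.paste(out, (src.width, 0))
canvas.save("./outputs/comparison.png")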
- The SoundBrush dataset is provided for non-commercial research purposes only.
- All wav files and images in the SoundBrush dataset are sourced from the Internet and do not belong to our institutions. Our institutions are not responsible for the content or the meaning of this data.
- You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the data, or any portion of derived data, for commercial purposes.
- You agree not to further copy, publish, or distribute any portion of the SoundBrush dataset. However, making copies of the dataset for internal use at a single site within the same organization is permitted.
To be uploaded.
@inproceedings{soundbrush,
  title     = {SoundBrush: Sound as a Brush for Visual Scene Editing},
  author    = {Kim, Sung-Bin and Kim, Jun-Seong and Ko, Junseok and Kim, Yewon and Oh, Tae-Hyun},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025},
}
We heavily borrow code from InstructPix2Pix and ImageBind, the dataset from VGGSound, and the agreement statement from CelebV-HQ. We sincerely appreciate those authors.