Project Page | Paper | Dataset
This repository contains a PyTorch implementation for the AAAI 2025 paper, SoundBrush: Sound as a Brush for Visual Scene Editing. SoundBrush can manipulate scenes to reflect the mood of the input audio or to insert sounding objects while preserving the original content.
This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.7 and PyTorch 2.0.1. Later versions should work, but have not been tested.
Create and activate a virtual environment to work in:
conda create -n soundbrush python=3.8
conda activate soundbrush
Install the requirements and PyTorch with pip. For CUDA 11.7, this looks like:
pip install -r requirements.txt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
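After installing, you can quickly confirm that the expected PyTorch and CUDA versions are in place and that a GPU is visible:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"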
To run SoundBrush, you need to download the pretrained model. After downloading, place it in ./checkpoints:

./checkpoints/model.ckpt
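To sanity-check the download, you can try loading the checkpoint with PyTorch. This is a minimal sketch; the "state_dict" key is an assumption about how the checkpoint is organized and may differ for this release:

import torch

# Load the checkpoint on CPU just to verify the file is intact.
ckpt = torch.load("./checkpoints/model.ckpt", map_location="cpu")

# Lightning-style checkpoints usually keep weights under "state_dict";
# this key is an assumption and may differ for this checkpoint.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"Loaded checkpoint with {len(state)} entries")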
Run the command below to perform inference with the model. We provide sample images and audio files in ./source_images and ./source_wavs, respectively. The edited images will be saved in ./outputs.
python edit_inference.py --audio_dir <audio dir> --img_dir <image dir> --save_dir <output dir>
# or simply run
sh inference.sh
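Once inference finishes, you can compare a source image with its edited counterpart side by side. This is a minimal sketch using Pillow; the filenames below are hypothetical examples, and the actual output naming of edit_inference.py may differ:

from PIL import Image

# Example paths; adjust to the files produced in your run.
src = Image.open("./source_images/sample.png")    # hypothetical input name
out = Image.open("./outputs/sample_edited.png")   # hypothetical output name

# Paste both images onto one canvas for a quick visual comparison.
canvas = Image.new("RGB", (src.width + out.width, max(src.height, out.height)))
canvas.paste(src, (0, 0))
canvas.paste(out, (src.width, 0))
canvas.save("./outputs/comparison.png")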
- The SoundBrush dataset is provided for non-commercial research purposes only.
- All wav files and images in the SoundBrush dataset are sourced from the Internet and do not belong to our institutions. Our institutions are not responsible for the content or the meaning of this data.
- You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the data, or any portion of derived data, for commercial purposes.
- You agree not to further copy, publish, or distribute any portion of the SoundBrush dataset. However, making copies of the dataset for internal use at a single site within the same organization is permitted.
To be uploaded.
@inproceedings{soundbrush,
  title     = {SoundBrush: Sound as a Brush for Visual Scene Editing},
  author    = {Kim, Sung-Bin and Kim, Jun-Seong and Ko, Junseok and Kim, Yewon and Oh, Tae-Hyun},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2025},
}
We heavily borrow code from InstructPix2Pix and ImageBind, the dataset from VGGSound, and the agreement statement from CelebV-HQ. We sincerely appreciate those authors.