hechang25/MVSD

MVSD: Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

Audio style transfer under visual guidance has made significant progress with the emergence of cross-modal generation. Nevertheless, simultaneously recording large-scale audio pairs at both the source and receiving ends remains a formidable challenge. Worse, existing methods treat each task independently and overlook the inverse correlation between dual tasks, which hinders their ability to leverage massive unlabeled data. In this paper, we introduce MVSD, a diffusion model-based mutual learning mechanism. MVSD exploits the intrinsic reciprocity between visual acoustic matching (VAM) and dereverberation, enabling learning from symmetric tasks and overcoming the scarcity of data. More specifically, MVSD employs two converters: a reverberator for VAM and a dereverberator for dereverberation. The dereverberator judges whether the reverberant audio generated by the reverberator sounds as if it were recorded in the conditioning visual scene, and vice versa. By forming a closed loop, the two converters generate informative feedback signals that optimize the inverse tasks, even with easily acquired one-way unpaired data. Furthermore, we employ diffusion models as the foundational conditional generators to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Extensive experiments show that our framework improves the performance of each task and better matches the specified visual scenarios. In both tasks, MVSD surpasses competitors on two standard benchmarks. Remarkably, the performance of the models can be further enhanced by adding unpaired data.
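To make the closed-loop idea concrete, here is a minimal toy sketch in Python. It is not the MVSD implementation: the diffusion-based converters are replaced by a one-pole recursive filter and its exact inverse, and the visual scene is reduced to a single hypothetical scalar, `decay`. All function names (`toy_reverberator`, `toy_dereverberator`, `cycle_feedback`) are assumptions made for illustration. The sketch only shows how a round trip through both converters can yield a feedback signal from dry audio alone, without a paired reverberant recording.

```python
import numpy as np


def toy_reverberator(dry, decay):
    """Hypothetical stand-in for the reverberator.

    Adds a decaying tail with a one-pole filter: wet[t] = dry[t] + decay * wet[t-1].
    `decay` plays the role of the visual scene condition (assumption).
    """
    dry = np.asarray(dry, dtype=float)
    wet = np.zeros_like(dry)
    prev = 0.0
    for t, sample in enumerate(dry):
        prev = sample + decay * prev
        wet[t] = prev
    return wet


def toy_dereverberator(wet, decay_estimate):
    """Hypothetical stand-in for the dereverberator.

    Inverts the one-pole filter: dry[t] = wet[t] - decay_estimate * wet[t-1].
    """
    wet = np.asarray(wet, dtype=float)
    shifted = np.concatenate(([0.0], wet[:-1]))
    return wet - decay_estimate * shifted


def cycle_feedback(dry, decay, decay_estimate):
    """Round-trip error dry -> wet -> dry, usable as a feedback signal.

    Only unpaired dry audio is needed; no reverberant ground truth is used.
    """
    wet = toy_reverberator(dry, decay)
    recon = toy_dereverberator(wet, decay_estimate)
    return float(np.mean((recon - np.asarray(dry, dtype=float)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dry = rng.standard_normal(64)
    # Consistent converters give a near-zero round-trip error.
    print(cycle_feedback(dry, decay=0.6, decay_estimate=0.6))
    # Inconsistent converters give a larger error that training could reduce.
    print(cycle_feedback(dry, decay=0.6, decay_estimate=0.3))
```

In this toy setting the round-trip error vanishes only when the two converters agree on the scene parameter, which is the kind of signal a closed loop can exploit with one-way unpaired data. The actual feedback signals and training objective in MVSD may differ.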

Visual Acoustic Matching (VAM)

SoundSpaces


[Three demo examples, each with a scene image. Columns: Source, GT, Image2Reverb, Avatir, MVSD]

AVSpeech

[Two demo examples. Columns: Source, GT, Image2Reverb, Avatir, MVSD]

Dereverberation


[Three demo examples, each with a scene image. Columns: Source, GT, MetricGAN, VIDA, MVSD]

Citation

Please consider citing our paper if it helps your research.

@inproceedings{ma2024mutual,
  title={Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion},
  author={Ma, Jian and Wang, Wenguan and Yang, Yi and Zheng, Feng},
  booktitle={ECCV},
  year={2024}
}
