Suqi Song1†, Chenxu Zhang1†, Peng Zhang1, Pengkun Li2, Fenglong Song3, Lei Zhang1*
1Chongqing University, 2Huawei Technologies Co., Ltd., 3Huawei Noah's Ark Lab
Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods based on water-level sensors require high maintenance and can hardly achieve full coverage. Recent approaches apply deep learning to surveillance camera imagery for detection, yet they struggle with scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications. We propose a Large-Small Model co-adapter paradigm (LSM-adapter), which harnesses the substantial generic segmentation potential of a large model and the specific task-directed guidance of a small model. Specifically, a Triple-S Prompt Adapter module and a Dynamic Prompt Combiner are proposed to generate and then merge multiple prompts for mask decoder adaptation. Meanwhile, a Histogram Equalization Adapter module is designed to infuse image-specific information for image encoder adaptation. Results and analysis demonstrate the difficulty of the developed benchmark and the superiority of the proposed algorithm.
- We propose a large-small model co-adapter paradigm (LSM-adapter) that aims at a win-win regime for both models. To learn a robust prompter, we formulate a Triple-S prompt adapter (TSP-Adapt) with a dynamic prompt combiner, which enables successful adaptation. We pioneer the use of a vision foundation model, i.e., SAM, for urban waterlogging detection, providing new insights for future research.
The proposed Large-Small Model Co-adapter Paradigm includes a histogram equalization adapter, a Triple-S prompt adapter, and a dynamic prompt combiner. All components except the image encoder of SAM are trained for prompt generation, learning, and adaptation, toward waterlogging detection under adverse conditions.
- Details of the proposed HE-Adapt and Semantic Prompter
The proposed histogram equalization adapter module mainly consists of a histogram equalization operation, a high-frequency filter, and MLP blocks. Because the features of water are not pronounced in most challenging scenarios, we first apply histogram equalization to highlight the contrast and texture of the input image. The enhanced image is then passed through a high-frequency filter to extract high-frequency information beneficial for segmentation, and the result is converted into a frequency patch embedding. The image embedding of the large model contains rich semantic information. We therefore propose a prototype learning-based semantic prompter, which leverages useful foreground features from the large model to generate semantic prompts.
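The preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the FFT-based circular high-pass mask, the `cutoff_ratio` and `patch` parameters, and the use of masked average pooling to form a foreground prototype are all assumptions chosen for illustration. The learned parts (MLP blocks, patch projection, prompt generation) are omitted.

```python
import numpy as np


def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied bin
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return gray.copy()
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]                   # apply the lookup table per pixel


def high_pass_filter(gray: np.ndarray, cutoff_ratio: float = 0.1) -> np.ndarray:
    """Remove low frequencies with a circular mask in the shifted FFT domain.

    cutoff_ratio is a hypothetical parameter: frequencies whose distance from
    the spectrum center is at most cutoff_ratio * min(H, W) are zeroed.
    """
    h, w = gray.shape
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = radius > cutoff_ratio * min(h, w)
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real


def patchify(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split a 2-D map into flattened, non-overlapping patch x patch tiles."""
    h, w = img.shape
    h, w = h - h % patch, w - w % patch    # crop to a multiple of the patch size
    tiles = img[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def foreground_prototype(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Masked average pooling: features (C, H, W), mask (H, W) -> vector (C,)."""
    weights = mask.astype(np.float64)
    total = weights.sum()
    if total == 0:                     # no foreground pixels
        return np.zeros(features.shape[0])
    return (features * weights).sum(axis=(1, 2)) / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    enhanced = equalize_histogram(image)
    high_freq = high_pass_filter(enhanced)
    tokens = patchify(high_freq, patch=16)   # one row per 16x16 patch
    print(tokens.shape)
```

In a full model, each row of `tokens` would be passed through a learned projection to obtain the frequency patch embedding, and the prototype vector would be one input to a learned prompt generator.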
- One-stage and Two-stage training strategies
Two training strategies are proposed to explore suitable joint training of models with diverse architectures.
Training and testing examples in the developed UW-Bench. To objectively evaluate the capability of the model in real-world applications, the test set includes both general-sample and hard-sample cases.
- Please note that the training set (Baidu Drive | Google Drive) was collected and labeled by the LiVE group of Chongqing University, and the test set was provided by Huawei.
- Dataset Password: Sign the Dataset Access Agreement and send it to one of the following e-mail addresses for a password. (songsuqi@stu.cqu.edu.cn/zhangchenxu@cqu.edu.cn/leizhang@cqu.edu.cn)
- Users of this benchmark: Zhejiang University, Nanjing University
@inproceedings{
song2024lsmadapter,
title={Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter},
author={Suqi Song and Chenxu Zhang and Peng Zhang and Pengkun Li and Fenglong Song and Lei Zhang},
booktitle = {ECCV},
year = {2024}
}