High-definition (HD) map construction is essential for an autonomous vehicle to understand its surroundings accurately. In this paper, we propose a Tightly Coupled temporal fusion Map Network (TICMapNet). TICMapNet breaks the fusion process down into three sub-problems: PV feature alignment, BEV feature adjustment, and Query feature fusion. This lets us integrate temporal information at different stages through three plug-and-play modules that follow the proposed tightly coupled strategy. Our approach does not rely on camera extrinsic parameters, offering a new perspective on the visual fusion challenge in object detection. Experimental results show that TICMapNet improves substantially on the single-frame baseline and performs well across multiple datasets.
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | DQ | 24ep | 59.0 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_1 | R50 | GKT | VA | 10ep | 61.7 | config | model |
ours_2 | R50 | GKT | DQ | 10ep | 60.6 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2[1] | R50 | GKT | DQ | 24ep | 28.3 | config | model |
ours_2[2] | R50 | GKT | DQ | 24ep | 32.9 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | DQ | 24ep | 57.4 | config | model |
Method | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
ours_2 | R50 | GKT | VA | 10ep | 59.7 | config | model |
Historical frames | Backbone | PV2BEV | BEVDecoder | Lr Schd | mAP | Config | Download |
---|---|---|---|---|---|---|---|
1 | R50 | GKT | DQ | 24ep | 59.0 | config | model |
2 | R50 | GKT | DQ | 24ep | 60.1 | config | model |
3 | R50 | GKT | DQ | 24ep | 61.3 | config | model |
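
To make the trend in the historical-frames table easier to read, the short sketch below computes the mAP gain of each setting relative to the one-frame setting. The numbers are copied from the table above; the names `map_by_frames` and `gains_over_baseline` are illustrative only and are not part of the TICMapNet codebase.

```python
# mAP by number of historical frames, copied from the table above
# (R50 backbone, GKT, DQ decoder, 24 epochs).
map_by_frames = {1: 59.0, 2: 60.1, 3: 61.3}


def gains_over_baseline(results, baseline_frames=1):
    """Return the mAP difference of each setting relative to the baseline setting.

    Hypothetical helper for illustration; differences are rounded to one
    decimal place to match the precision reported in the table.
    """
    base = results[baseline_frames]
    return {n: round(m - base, 1) for n, m in sorted(results.items())}


gains = gains_over_baseline(map_by_frames)
for n, gain in gains.items():
    print(f"{n} historical frame(s): {map_by_frames[n]:.1f} mAP ({gain:+.1f})")
```

In this table, each additional historical frame adds roughly one mAP point over the previous setting.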
Notes:
ours_1 employs MapTR as a single-frame baseline, and ours_2 introduces Decoupled Query based on ours_1.
[1] A. Lilja, J. Fu, E. Stenborg, and L. Hammarstrand, "Localization is all you evaluate: Data leakage in online mapping datasets and how to fix it," in CVPR 2024, pp. 22150–22159.
[2] T. Yuan, Y. Liu, Y. Wang, Y. Wang, and H. Zhao, "StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction," in WACV 2024, pp. 7341–7350.
TICMapNet is based on MapTR. It is also greatly inspired by the following outstanding contributions to the open-source community: BEVFormer, StreamMapNet, BEVFusion, GKT, mmdetection3d.