Official Repo of the Project - FAC: 3D Pretraining via Foreground Aware Feature Contrast
Constructing informative contrast pairs matters in contrastive learning: Conventional contrast requires strict point-level correspondence. The proposed method FAC takes both foreground grouping and foreground-background distinction cues into account, thus forming better contrast pairs to learn more informative and discriminative 3D feature representations.
The framework of this work is illustrated above. We construct better contrast pairs for more informative contrastive feature learning, which improves downstream performance. In other words, FAC uses foreground grouping and foreground-background distinction for better contrastive feature learning.
Visualizations of projected point correlation maps over the indoor ScanNet (1st-4th rows) and the outdoor KITTI (5th-8th rows) with respect to the query points highlighted by yellow crosses. The View 1 and View 2 in each sample show the intra-view and cross-view correlations, respectively. We compare FAC with the state-of-the-art CSC on instance segmentation (rows 1-4) and ProposalContrast on detection (rows 5-8). FAC clearly captures better feature correlations within and across views (columns 3-4).
This work presents a general and simple framework to tackle 3D point cloud pre-training for the base model of the downstream detection and segmentation tasks. Contrastive learning has recently demonstrated great potential for unsupervised pre-training in 3D scene understanding tasks. However, most existing work randomly selects point features as anchors while building contrast, leading to a clear bias toward background points that often dominate in 3D scenes. Also, object awareness and foreground-to-background discrimination are neglected, making contrastive learning less effective. To tackle these issues, we propose a general foreground-aware feature contrast (FAC) framework to learn more effective point cloud representations in pre-training. FAC consists of two novel contrast designs to construct more effective and informative contrast pairs. The first is building positive pairs within the same foreground segment where points tend to have the same semantics. The second is that we prevent over-discrimination between 3D segments/objects and encourage foreground-to-background distinctions at the segment level with adaptive feature learning in a Siamese correspondence network, which adaptively learns feature correlations within and across point cloud views effectively. Visualization with point activation maps shows that our contrast pairs capture clear correspondences among foreground regions during pre-training. Quantitative experiments also show that FAC achieves superior knowledge transfer and data efficiency in various downstream 3D semantic segmentation and object detection tasks.
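The first contrast design described above (positive pairs drawn from within the same foreground segment, rather than anchors sampled at random over all points) can be illustrated with a minimal sketch. This is not the FAC implementation: the function name `build_foreground_pairs` and the inputs `segment_ids` and `is_foreground` are hypothetical, and how segments and foreground flags are obtained is outside the scope of the sketch.

```python
import random
from collections import defaultdict


def build_foreground_pairs(segment_ids, is_foreground, num_anchors, seed=0):
    """Sample (anchor, positive) index pairs from foreground segments only.

    segment_ids[i]   -- hypothetical segment label of point i
    is_foreground[i] -- hypothetical boolean foreground flag of point i
    Both points of a pair always belong to the same foreground segment.
    """
    rng = random.Random(seed)

    # Group the indices of foreground points by their segment label.
    groups = defaultdict(list)
    for idx, (seg, fg) in enumerate(zip(segment_ids, is_foreground)):
        if fg:
            groups[seg].append(idx)

    # A segment needs at least two points to yield a positive pair.
    usable = [points for points in groups.values() if len(points) >= 2]
    if not usable:
        return []

    pairs = []
    for _ in range(num_anchors):
        points = rng.choice(usable)               # pick a foreground segment
        anchor, positive = rng.sample(points, 2)  # two distinct points in it
        pairs.append((anchor, positive))
    return pairs


# Toy scene: segments 0 and 1 are foreground, segment 2 is background.
segment_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
is_foreground = [True] * 5 + [False] * 5
pairs = build_foreground_pairs(segment_ids, is_foreground, num_anchors=4)
```

In this toy scene half of the points are background, yet no background point is ever chosen as an anchor, which is the sampling bias the paragraph above sets out to avoid.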
For the task of 3D Object Detection, please refer to FAC_Det.
For the task of 3D Semantic Segmentation, please refer to FAC_Sem_Seg.
For the task of 3D Instance Segmentation, please refer to FAC_Ins_Seg.
For the pre-training setup in detection and segmentation, please refer to Downstreams.
Please refer to INSTALL.md for the installation of OpenPCDet.
Our codebase of 3D object detection is based on OpenPCDet.
OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection.
It is also the official code release of [PointRCNN], [Part-A^2 net], [PV-RCNN] and [Voxel R-CNN].
- Support both one-stage and two-stage 3D object detection frameworks
- Support distributed training & testing with multiple GPUs and multiple machines
- Support multiple heads on different scales to detect different classes
- Support stacked version set abstraction to encode various number of points in different scenes
- Support Adaptive Training Sample Selection (ATSS) for target assignment
- Support RoI-aware point cloud pooling & RoI-grid point cloud pooling
- Support GPU version 3D IoU calculation and rotated NMS
Selected supported methods are shown in the table below. Here we provide the pretrained models, which achieve state-of-the-art 3D detection performance on the val set of the KITTI dataset.
- All models are trained with 4 RTX 2080 Ti GPUs and are available for download.
- The training time is measured with 4 2080 Ti GPUs and PyTorch 1.5.
Data Efficient Learning with 3% labels
| | training time | Car@R11 | Pedestrian@R11 | Cyclist@R11 | download |
|---|---|---|---|---|---|
| PointPillar | ~2.66 hours | 66.57 | 46.28 | 52.89 | model_PointPillar |
| SECOND | ~2.75 hours | 69.92 | 43.86 | 56.98 | model_SECOND |
| SECOND-IoU | - | 68.76 | 45.74 | 57.88 | model_SECOND-IoU |
| PointRCNN | ~5.67 hours | 64.91 | 46.89 | 62.73 | model_PointRCNN |
| PointRCNN-IoU | ~6.12 hours | 67.81 | 47.28 | 60.65 | model_PointRCNN-IoU |
| Part-A^2-Free | ~5.98 hours | 66.56 | 58.26 | 63.87 | model_Part-A^2-Free |
| Part-A^2-Anchor | ~7.87 hours | 68.87 | 51.22 | 57.89 | model_Part-A^2-Anchor |
| PV-RCNN | ~8.78 hours | 75.12 | 47.32 | 59.98 | model_PV-RCNN |
| Voxel R-CNN (Car) | ~3.87 hours | 75.87 | - | - | model_Voxel_R-CNN |
| CaDDN | ~19.83 hours | 19.21 | 11.23 | 9.09 | model_CaDDN |
We provide the setting DATA_CONFIG.SAMPLED_INTERVAL on the Waymo Open Dataset (WOD) to subsample partial samples for training and evaluation, so you can also experiment with WOD by adjusting DATA_CONFIG.SAMPLED_INTERVAL even if you only have limited GPU resources.
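As a rough illustration of interval-based subsampling, the sketch below keeps every k-th frame record from a list. It assumes the interval acts as a stride over the ordered list of frames. The helper `subsample_infos` and the frame names are hypothetical and are not taken from this codebase.

```python
def subsample_infos(infos, sampled_interval):
    """Keep every `sampled_interval`-th entry of a list of frame records.

    Assumption: the interval is a stride, so a larger value keeps fewer frames.
    """
    if sampled_interval < 1:
        raise ValueError("sampled_interval must be >= 1")
    return infos[::sampled_interval]


# Hypothetical frame identifiers standing in for dataset records.
frames = [f"frame_{i:04d}" for i in range(100)]
subset = subsample_infos(frames, sampled_interval=5)
print(len(subset))  # 20
```

Under this assumption, an interval of 1 keeps the full set, and the fraction of data used is roughly one over the interval.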
By default, all models are trained with 3% data (~4.8k frames) of all the training samples on 4 2080 Ti GPUs, and the results of each cell here are mAP/mAPH calculated by the official Waymo evaluation metrics on the whole validation set (version 1.2).
| | Vec_L1 | Vec_L2 | Ped_L1 | Ped_L2 | Cyc_L1 | Cyc_L2 |
|---|---|---|---|---|---|---|
| SECOND | 57.15/66.38 | 49.99/48.79 | 51.65/42.65 | 41.64/36.64 | 47.12/45.22 | 42.65/42.64 |
| Part-A^2-Anchor | 60.33/59.12 | 54.77/53.18 | 53.65/43.87 | 44.43/39.65 | 55.86/54.65 | 51.54/51.77 |
| PV-RCNN | 59.06/55.38 | 55.67/63.38 | 54.23/43.76 | 44.89/38.28 | 53.15/50.94 | 49.87/48.69 |
We cannot provide the above pretrained models due to the Waymo Dataset License Agreement, but you could easily achieve similar performance by training with the default configs.
More datasets are on the way.
Please refer to INSTALL.md for the installation of OpenPCDet.
Please refer to DEMO.md for a quick demo to test with a pretrained model and visualize the predicted results on your custom data or the original KITTI data.
Please refer to GETTING_STARTED.md to learn more usage about this project.
FAC is released under the MIT license.
For questions regarding 3D point cloud semantic segmentation, 3D instance segmentation, 3D object detection, or the methodology of FAC (Foreground_Aware_Contrast), please contact us by email (kcliuntu@gmail.com or kcliu@gmail.com).
Please cite our work if you find our work useful:
@article{liu2023fac,
title={FAC: 3D Representation Learning via Foreground Aware Feature Contrast},
author={Liu, Kangcheng* and Xiao, Aoran and Zhang, Xiaoqin and Lu, Shijian and Shao, Ling},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)},
year={2023}
}
If you find our work helpful, please feel free to give a star to this repo.