# SMART: Self-Supervised Multi-Modal 3D Bounding Box Annotation Error Detection Framework

3D bounding box annotations, essential for training object detection models, can be acquired from professional data annotation engineers or through auto-labeling methods. However, annotated 3D boxes are not entirely reliable and may harbor various errors, and low-quality annotations degrade the performance of trained models. In this paper, we introduce SMART, a Self-supervised Multi-modal 3D bounding box Annotation eRror deTection framework. SMART generates pseudo-erroneous 3D boxes to create supervisory signals and then trains an error detector on them. The detector fuses multi-modal data and directly regresses error scores. A novel loss function is used to train the error detector, allowing certain 3D bounding boxes in the initial annotation to receive high error scores while ensuring that generated pseudo-erroneous 3D boxes do not receive low ones. Remarkably, SMART operates without reliance on prior knowledge and can be applied to 3D bounding boxes of arbitrary classes (i.e., it is open-vocabulary). Extensive experiments demonstrate the effectiveness of SMART in detecting errors within annotated 3D boxes, thereby helping users improve annotation quality.
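The exact loss is defined in the paper; purely as an illustration of the property described above, a margin-style ranking objective could look like the sketch below. The function name, margin value, and pairwise hinge formulation are our assumptions, not the paper's definition.

```python
import torch

def annotation_error_loss(scores_orig, scores_pseudo, margin=1.0):
    """Illustrative margin-style objective, NOT the paper's exact loss.

    scores_orig:   predicted error scores for boxes in the initial annotation
    scores_pseudo: predicted error scores for generated pseudo-erroneous boxes

    Pseudo-erroneous boxes are pushed to score at least `margin` above
    original boxes. Because the hinge is averaged rather than maximized,
    the constraint stays soft: some original boxes can still keep high
    error scores, since the initial annotation may genuinely contain errors.
    """
    # Pairwise hinge over all (original, pseudo-erroneous) score pairs.
    diff = scores_orig.unsqueeze(1) - scores_pseudo.unsqueeze(0) + margin
    return torch.clamp(diff, min=0).mean()

# Toy usage with random scores.
print(annotation_error_loss(torch.rand(8), torch.rand(4)))
```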

## Datasets

In the absence of an existing 3D bounding box error detection dataset, we inject erroneous boxes into the KITTI and nuScenes (mini) 3D object detection datasets for evaluation. Erroneous boxes are generated either with PointPillars, a 3D object detection approach, or by randomly changing the class of some boxes. Generating erroneous boxes with PointPillars involves three steps, sketched in the snippet below. First, we run the PointPillars model to produce candidate 3D bounding boxes. Next, for each object class, we remove candidates that intersect with pre-existing 3D bounding boxes in the dataset. Finally, we inject erroneous boxes by randomly sampling from the remaining candidates.
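The following is a minimal sketch of steps two and three under simplifying assumptions: boxes are `(x, y, z, l, w, h)` arrays, the overlap test uses an axis-aligned 3D IoU that ignores yaw, and the IoU threshold is hypothetical. The actual pipeline lives in the preprocessing scripts.

```python
import numpy as np

def aabb_iou_3d(a, b):
    """Axis-aligned 3D IoU (simplification: yaw is ignored).
    Boxes are (x, y, z, l, w, h) arrays with centers at (x, y, z)."""
    lo = np.maximum(a[:3] - a[3:] / 2, b[:3] - b[3:] / 2)
    hi = np.minimum(a[:3] + a[3:] / 2, b[:3] + b[3:] / 2)
    inter = np.prod(np.clip(hi - lo, 0, None))
    union = np.prod(a[3:]) + np.prod(b[3:]) - inter
    return inter / union if union > 0 else 0.0

def inject_erroneous_boxes(detections, gt_boxes, n_inject, iou_thresh=0.1, rng=None):
    """Keep detections that do not overlap any pre-existing ground-truth box,
    then randomly sample n_inject of them as injected erroneous boxes."""
    rng = rng or np.random.default_rng()
    free = [d for d in detections
            if all(aabb_iou_3d(d, g) < iou_thresh for g in gt_boxes)]
    idx = rng.choice(len(free), size=min(n_inject, len(free)), replace=False)
    return [free[i] for i in idx]

# Toy usage: one ground-truth box, two detections, inject at most one.
gt = [np.array([0, 0, 0, 4, 2, 1.5])]
dets = [np.array([0.5, 0, 0, 4, 2, 1.5]), np.array([20, 5, 0, 4, 2, 1.5])]
print(inject_erroneous_boxes(dets, gt, n_inject=1))
```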

To improve our experimental datasets' reliability, we manually correct any inaccurate annotations within the datasets. Additionally, we reassess the injected errors to ensure their accuracy, which resulted in reclassifying some injected boxes as correct. The basic statistics of these datasets are shown below.

### KITTI

| | Pedestrian | Cyclist | Car |
| --- | --- | --- | --- |
| Total Number of Boxes | 4502 | 1621 | 27198 |
| Number of Erroneous Boxes | 299 | 115 | 1125 |
| Error Ratio (%) | 6.64 | 7.09 | 4.14 |

The 3D bounding boxes of the KITTI dataset are stored in the `label_add_KITTI.zip` file within this project. Each file within `label_add_KITTI.zip` corresponds to a `sample_id`. The format for each box is as follows:

```
Car | 0.0 | 0 | -1.56 | 564.62 | 174.59 | 616.43 | 224.74 | 1.61 | 1.66 | 3.6 | -0.69 | 1.65 | 25.21 | -1.59 | 0.0
```

The values within the box represent the following:

- Car: box_class
- 0.0: Not applicable (the standard KITTI `truncated` field, unused here)
- 0: Not applicable (the standard KITTI `occluded` field, unused here)
- -1.56: Not applicable (the standard KITTI `alpha` field, unused here)
- 564.62: Not applicable (the standard KITTI 2D bbox `left`, unused here)
- 174.59: Not applicable (the standard KITTI 2D bbox `top`, unused here)
- 616.43: Not applicable (the standard KITTI 2D bbox `right`, unused here)
- 224.74: Not applicable (the standard KITTI 2D bbox `bottom`, unused here)
- 1.61: height
- 1.66: width
- 3.6: length
- -0.69: x coordinate
- 1.65: y coordinate
- 25.21: z coordinate
- -1.59: yaw
- 0.0: label_represent_if_erroneous (0.0 indicates correct, 1.0 indicates erroneous)

The x, y, and z coordinates are in the camera coordinate system.
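As an illustration, here is a minimal parser for one such line. It assumes the on-disk field separator is `|` exactly as displayed above, and keeps only the fields SMART uses.

```python
def parse_kitti_box(line):
    """Parse one line of a KITTI label_add file (format described above)."""
    f = line.split('|')
    return {
        'box_class': f[0].strip(),
        'height': float(f[8]),
        'width': float(f[9]),
        'length': float(f[10]),
        'xyz_camera': tuple(float(v) for v in f[11:14]),  # camera coordinates
        'yaw': float(f[14]),
        'is_erroneous': float(f[15]) == 1.0,
    }

box = parse_kitti_box(
    "Car | 0.0 | 0 | -1.56 | 564.62 | 174.59 | 616.43 | 224.74 "
    "| 1.61 | 1.66 | 3.6 | -0.69 | 1.65 | 25.21 | -1.59 | 0.0")
print(box['box_class'], box['is_erroneous'])  # Car False
```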

### nuScenes

| | Pedestrian | Bicycle | Car | Bus | Motorcycle | Trailer | Truck |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total Number of Boxes | 2593 | 121 | 4232 | 318 | 268 | 55 | 474 |
| Number of Erroneous Boxes | 134 | 6 | 212 | 16 | 14 | 2 | 24 |
| Error Ratio (%) | 5.17 | 4.96 | 5.01 | 5.03 | 5.22 | 3.64 | 5.06 |

The 3D bounding boxes of the nuScenes dataset are stored in the `label_add_nuScenes-mini.zip` file within this project. Each file within `label_add_nuScenes-mini.zip` corresponds to a `sample_id`. The format for each box is as follows:

```
car | 0bdebf547fc94ee19c8d28dc36f157b7 | 0.7 | -5.6 | -1.2 | 4.795 | 2.09 | 2.0 | -0.68 | -0.01 | 0.04 | -0.72 | 0.0
```

The values within the box represent the following:

- car: box_class
- 0bdebf547fc94ee19c8d28dc36f157b7: Not applicable (a nuScenes token, unused here)
- 0.7: center_x
- -5.6: center_y
- -1.2: center_z
- 4.795: length
- 2.09: width
- 2.0: height
- (-0.68, -0.01, 0.04, -0.72): bounding box orientation as a quaternion (w, x, y, z)
- 0.0: label_represent_if_erroneous (0.0 indicates correct, 1.0 indicates erroneous)

The x, y, and z coordinates are in the LiDAR coordinate system.
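Analogously, a minimal parser for one nuScenes line (again assuming `|` is the on-disk separator as displayed). Recovering the yaw angle from the quaternion uses the standard conversion and is included only as a convenience; it is not part of the stored format.

```python
import math

def parse_nuscenes_box(line):
    """Parse one line of a nuScenes label_add file (format described above)."""
    f = [v.strip() for v in line.split('|')]
    w, x, y, z = (float(v) for v in f[8:12])
    # Yaw about the vertical axis, recovered from the (w, x, y, z) quaternion.
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return {
        'box_class': f[0],
        'center_lidar': tuple(float(v) for v in f[2:5]),  # LiDAR coordinates
        'lwh': tuple(float(v) for v in f[5:8]),
        'yaw': yaw,
        'is_erroneous': float(f[12]) == 1.0,
    }

box = parse_nuscenes_box(
    "car | 0bdebf547fc94ee19c8d28dc36f157b7 | 0.7 | -5.6 | -1.2 "
    "| 4.795 | 2.09 | 2.0 | -0.68 | -0.01 | 0.04 | -0.72 | 0.0")
print(box['box_class'], box['yaw'])
```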

## Experiment results

We conduct each experiment five times and report the average and standard deviation of the results as mean ± std.
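For reference, the sketch below shows one way the F-Score and AP columns could be computed and aggregated over runs. The 0.5 decision threshold and the use of scikit-learn are our assumptions, not necessarily the repo's exact evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def evaluate_run(labels, scores, threshold=0.5):
    """labels: 1 = erroneous, 0 = correct; scores: predicted error scores.
    The decision threshold used for the F-Score is an assumption."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    ap = 100 * average_precision_score(labels, scores)
    f = 100 * f1_score(labels, (scores >= threshold).astype(int))
    return f, ap

# Toy aggregation over five runs, reported as mean ± std.
rng = np.random.default_rng(0)
results = np.array([evaluate_run(rng.integers(0, 2, 100), rng.random(100))
                    for _ in range(5)])
mean, std = results.mean(axis=0), results.std(axis=0)
print(f"F-Score {mean[0]:.1f}±{std[0]:.1f}  AP {mean[1]:.1f}±{std[1]:.1f}")
```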

Experiment results on the KITTI dataset.

| Method | All Classes<br>F-Score (%) | All Classes<br>AP (%) | Pedestrian<br>F-Score (%) | Pedestrian<br>AP (%) | Cyclist<br>F-Score (%) | Cyclist<br>AP (%) | Car<br>F-Score (%) | Car<br>AP (%) | mF-Score (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VAE | 8.8±0.0 | 4.7±0.0 | 44.6±1.5 | 44.4±1.5 | 27.5±1.2 | 21.1±0.5 | 19.6±0.3 | 15.5±0.2 | 30.6±0.9 | 27.0±0.6 |
| Deep-SVDD | 8.8±0.0 | 4.6±0.0 | 39.3±0.8 | 38.0±0.6 | 47.6±1.4 | 39.9±1.0 | 16.6±0.8 | 9.3±0.4 | 34.5±1.0 | 29.1±0.6 |
| iForest | 8.8±0.0 | 4.6±0.0 | 36.7±1.4 | 25.9±0.5 | 34.9±0.3 | 22.2±0.6 | 15.3±0.4 | 11.3±0.4 | 28.9±0.5 | 19.8±0.4 |
| OCSVM | 8.8±0.0 | 4.7±0.0 | 50.8±2.1 | 52.4±2.2 | 44.0±1.9 | 49.6±0.8 | 18.7±0.5 | 14.4±0.5 | 37.8±1.3 | 38.8±0.9 |
| ECOD | 8.8±0.0 | 4.6±0.0 | 31.4±0.9 | 19.9±0.4 | 34.5±1.0 | 23.6±0.4 | 14.0±0.3 | 8.8±0.5 | 26.6±0.6 | 17.4±0.3 |
| LUNAR | 8.8±0.0 | 4.6±0.0 | 40.8±1.2 | 36.2±1.3 | 46.8±1.1 | 33.6±0.9 | 31.8±0.9 | 22.4±0.6 | 39.8±0.9 | 30.8±0.7 |
| SMART | 77.8±0.9 | 84.7±1.4 | 78.7±2.9 | 86.5±2.5 | 78.8±1.4 | 86.2±2.3 | 79.5±0.6 | 86.0±0.6 | 79.0±1.4 | 86.3±1.1 |

Experiment results on the nuScenes (mini) dataset.

| Method | All Classes<br>F-Score (%) | All Classes<br>AP (%) | Pedestrian<br>F-Score (%) | Pedestrian<br>AP (%) | Bicycle<br>F-Score (%) | Bicycle<br>AP (%) | Car<br>F-Score (%) | Car<br>AP (%) | Truck<br>F-Score (%) | Truck<br>AP (%) | Trailer<br>F-Score (%) | Trailer<br>AP (%) | Bus<br>F-Score (%) | Bus<br>AP (%) | Motorcycle<br>F-Score (%) | Motorcycle<br>AP (%) | mF-Score (%) | mAP (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VAE | 9.6±0.0 | 5.1±0.0 | 13.5±0.1 | 7.1±0.1 | 11.1±0.0 | 5.8±0.1 | 11.8±0.1 | 5.9±0.0 | 9.9±0.0 | 4.9±0.0 | 7.1±0.0 | 2.8±0.0 | 9.7±0.0 | 3.9±0.0 | 21.4±0.9 | 8.1±0.4 | 12.1±0.1 | 5.5±0.0 |
| Deep-SVDD | 9.6±0.0 | 5.0±0.0 | 16.4±0.1 | 9.1±0.1 | 25.0±3.2 | 15.2±3.1 | 10.6±0.0 | 5.6±0.0 | 13.5±0.1 | 10.3±0.0 | 59.7±10.8 | 66.7±12.0 | 12.2±0.2 | 6.0±0.1 | 10.5±0.3 | 6.2±0.3 | 21.1±2.1 | 17.0±2.3 |
| iForest | 9.6±0.0 | 5.1±0.0 | 14.4±0.2 | 12.9±0.1 | 29.6±3.8 | 28.5±4.2 | 12.2±0.0 | 6.8±0.0 | 12.0±0.0 | 5.7±0.0 | 76.0±14.2 | 83.3±15.8 | 14.5±0.1 | 7.3±0.1 | 11.9±0.2 | 5.7±0.2 | 24.4±2.5 | 21.5±2.6 |
| OCSVM | 9.6±0.0 | 5.1±0.0 | 12.7±0.1 | 9.3±0.1 | 28.6±3.1 | 27.8±4.1 | 14.0±0.1 | 8.7±0.0 | 12.5±0.0 | 5.9±0.0 | 56.7±11.4 | 70.0±12.9 | 23.3±0.4 | 10.5±0.2 | 13.8±0.3 | 6.3±0.1 | 23.1±1.9 | 19.8±2.1 |
| ECOD | 9.6±0.0 | 5.1±0.0 | 13.3±0.2 | 10.2±0.1 | 27.3±2.4 | 15.9±1.1 | 10.7±0.1 | 6.4±0.0 | 11.1±0.0 | 6.1±0.0 | 42.4±10.0 | 41.7±9.4 | 11.1±0.0 | 5.9±0.0 | 15.2±0.2 | 6.1±0.3 | 18.7±1.6 | 13.2±1.2 |
| LUNAR | 9.6±0.0 | 5.1±0.0 | 15.4±0.1 | 11.7±0.1 | 31.6±2.1 | 28.5±2.1 | 16.7±0.1 | 10.0±0.1 | 17.3±0.1 | 9.1±0.1 | 79.8±15.3 | 81.1±12.8 | 22.2±0.2 | 13.2±0.2 | 12.1±0.2 | 6.1±0.2 | 27.9±3.1 | 22.8±2.5 |
| SMART* | 56.5±0.9 | 55.9±0.3 | 64.2±1.7 | 59.5±1.9 | 85.7±0.0 | 75.0±0.0 | 48.2±0.6 | 47.4±0.8 | 85.6±0.9 | 76.0±1.4 | 100.0±0.0 | 100.0±0.0 | 100.0±0.0 | 100.0±0.0 | 94.9±1.9 | 97.3±0.8 | 82.7±0.2 | 79.3±0.2 |

## Using Our Code to Reproduce the Results

### KITTI

1. Create the conda environment:

   ```
   conda install --yes --file requirements.txt  # You may need to downgrade torch via pip to match your CUDA version
   ```

2. Download the KITTI 3D object detection dataset.

3. Select a directory named YOUR_DATASET_DIR and extract the training subset of the data downloaded in step 2 into it. Also, unzip the `label_add_KITTI.zip` file located in this project folder into YOUR_DATASET_DIR:

   ```
   ├── YOUR_DATASET_DIR
   │ ├── calib      <- data in 'data_object_calib.zip/training/calib'
   │ ├── image      <- data in 'data_object_image_2.zip/training/image_2'
   │ ├── label_add  <- data in 'label_add_KITTI.zip'
   │ └── velodyne   <- data in 'data_object_velodyne.zip/training/velodyne'
   ```

4. Set `dataPath` in `cfg_kitti.yaml` to YOUR_DATASET_DIR (a sanity-check sketch follows this list).

5. Run the data preprocessing code to preprocess the data and generate pseudo-erroneous 3D bounding boxes:

   ```
   python data_preprocessing_kitti.py
   ```

6. Train the model and then use it to detect erroneous boxes in `label_add`:

   ```
   python streamline.py --cfg_file cfg_kitti.yaml
   ```

7. (Optional) To replicate our results, use the model trained for 30 epochs: download `model_30.pth`, save it to the `./checkpoints_kitti` directory, and then perform error detection:

   ```
   python detect.py --cfg_file cfg_kitti.yaml --bg 30
   ```
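The sanity check referenced in step 4: a small sketch that verifies the configured path has the layout from step 3. The only config key it assumes is `dataPath`, which the source names; everything else in `cfg_kitti.yaml` is left alone.

```python
import os
import yaml  # pip install pyyaml

with open('cfg_kitti.yaml') as fh:
    cfg = yaml.safe_load(fh)

# dataPath is the key set in step 4; the expected subdirectories
# follow the layout shown in step 3.
root = cfg['dataPath']
for sub in ('calib', 'image', 'label_add', 'velodyne'):
    path = os.path.join(root, sub)
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")
```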

### nuScenes

1. Create the conda environment:

   ```
   conda install --yes --file requirements.txt  # You may need to downgrade torch via pip to match your CUDA version
   ```

2. Download the nuScenes (mini) dataset.

3. Select a directory named YOUR_DATASET_DIR and unzip the data downloaded in step 2 into it. Also, unzip the `label_add_nuScenes-mini.zip` file located in this project folder into YOUR_DATASET_DIR (a loading sanity check follows this list):

   ```
   ├── YOUR_DATASET_DIR
   │ ├── label_add  <- data in 'label_add_nuScenes-mini.zip'
   │ ├── lidarseg
   │ ├── maps
   │ ├── panoptic
   │ ├── samples
   │ ├── sweeps
   │ └── v1.0-mini
   ```

4. Set `dataPath` in `cfg_nuscenes.yaml` to YOUR_DATASET_DIR.

5. Run the data preprocessing code to preprocess the data and generate pseudo-erroneous 3D bounding boxes:

   ```
   python nuscenes_related/data_preprocessing_nuscenes.py
   ```

6. Train the model and then use it to detect erroneous boxes in `label_add`:

   ```
   python nuscenes_related/streamline.py --cfg_file cfg_nuscenes.yaml
   ```

7. (Optional) To replicate our results, use the model trained for 35 epochs: download `model_35.pth`, save it to the `./checkpoints_nuscenes` directory, and then perform error detection:

   ```
   python nuscenes_related/detect.py --cfg_file cfg_nuscenes.yaml --bg 35
   ```
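The loading sanity check referenced in step 3: if the nuscenes-devkit is installed (an assumption; the preprocessing script likely depends on it), instantiating the mini split is a quick way to confirm YOUR_DATASET_DIR is laid out correctly before running step 5.

```python
from nuscenes.nuscenes import NuScenes  # pip install nuscenes-devkit

# Instantiating NuScenes parses the metadata tables under v1.0-mini and
# fails loudly if the directory layout is wrong.
nusc = NuScenes(version='v1.0-mini', dataroot='YOUR_DATASET_DIR', verbose=True)
print(f"{len(nusc.sample)} samples loaded")
```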