Seamless Scene Segmentation dataset format

Lorenzo Porzi edited this page Jul 24, 2019 · 2 revisions

This is our standardized dataset format for panoptic segmentation. Scripts to convert from specific datasets to the common format are located in the scripts/data_preparation folder.

Folder structure

dataset_root
|- img
   |- [image_id1].{jpg|png}
   |- [image_id2].{jpg|png}
   ...
|- msk
   |- [image_id1].png
   |- [image_id2].png
   ...
|- lst
   |- [split1].txt
   |- [split2].txt
   ...
|- coco
   |- [split1].json
   |- [split2].json
   ...
|- metadata.bin
  • img: original RGB images, stored either as jpg or png
  • msk: panoptic segmentation masks, stored as 16-bit grayscale png
  • lst: dataset splits, stored as txt files containing lists of image_ids (one per line)
  • coco: annotations in COCO format
  • metadata.bin: metadata file, described below
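
As an illustration of the layout above, the following minimal sketch reads a split list and resolves the image and mask paths for an image id. The helper names `list_split` and `resolve_paths` are hypothetical and not part of the repository; the sketch only assumes the folder names and file extensions listed above.

```python
from pathlib import Path


def list_split(dataset_root, split):
    """Return the image ids listed in lst/[split].txt, skipping blank lines."""
    lst_file = Path(dataset_root) / "lst" / f"{split}.txt"
    with open(lst_file, "r") as f:
        return [line.strip() for line in f if line.strip()]


def resolve_paths(dataset_root, image_id):
    """Return (image_path, mask_path) for one image id.

    Images may be stored as jpg or png, so both extensions are tried.
    Masks are always png.
    """
    root = Path(dataset_root)
    for ext in (".jpg", ".png"):
        img_path = root / "img" / (image_id + ext)
        if img_path.exists():
            return img_path, root / "msk" / (image_id + ".png")
    raise FileNotFoundError(f"no image found for id {image_id!r}")
```

Trying both extensions keeps the split files free of file suffixes, which matches the one-id-per-line convention of the lst folder.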

Metadata format

metadata.bin is a serialized dictionary, encoded using umsgpack, which contains metadata about the images and the dataset itself. Its structure is as follows:

{
  "images" : [
    {
      "id": "image_id",
      "size": (height, width),
      "cat": [255, cat_id_of_seg_id1, cat_id_of_seg_id2, ...],
      "iscrowd": [1, seg_id1_is_crowd, seg_id2_is_crowd, ...]
    },
    ...
  ],
  "meta": {
    "categories": ["cat1", "cat2", ...],
    "num_stuff": #stuff_categories,
    "num_thing": #thing_categories,
    "palette": [[r1, g1, b1], [r2, g2, b2], ...],
    "original_ids": [original_cat_id1, original_cat_id2, ...]
  }
}
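
A minimal sketch of reading and sanity-checking this structure is given below. It assumes the u-msgpack-python package (imported as `umsgpack`) is installed; `load_metadata` and `check_metadata` are hypothetical helper names, and the checks are limited to what the structure above implies. Note that MessagePack has no tuple type, so `size` is decoded as a two-element list.

```python
def load_metadata(path):
    """Decode metadata.bin into a Python dictionary."""
    # Third-party dependency, imported lazily so that check_metadata
    # can be used on an already-decoded dictionary without it.
    import umsgpack

    with open(path, "rb") as f:
        return umsgpack.unpack(f)


def check_metadata(metadata):
    """Raise ValueError if the dictionary violates the layout shown above."""
    meta = metadata["meta"]
    num_categories = meta["num_stuff"] + meta["num_thing"]
    if len(meta["categories"]) != num_categories:
        raise ValueError("categories must have num_stuff + num_thing entries")

    for entry in metadata["images"]:
        if len(entry["size"]) != 2:
            raise ValueError(f"{entry['id']}: size must be (height, width)")
        if len(entry["cat"]) != len(entry["iscrowd"]):
            raise ValueError(f"{entry['id']}: cat and iscrowd lengths differ")
        if entry["cat"][0] != 255:
            raise ValueError(f"{entry['id']}: cat[0] must be 255")
    return True
```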

Data encoding

The panoptic segmentation masks contain, for each pixel, the seg_id of the segment that pixel belongs to. These ids uniquely identify each segment within a particular image, be it an instance or a stuff area. The cat_id of the category a segment belongs to can be recovered from the metadata as metadata[image_id]["cat"][seg_id], where metadata[image_id] is shorthand for the entry of the images list whose id is image_id.

Segment ids are contiguous integers in the range [0, #segments_in_the_image], with 0 always denoting the void areas. Category ids take values in the set {0, 1, ..., #categories - 1, 255}: 255 denotes void, {0, ..., #stuff_categories - 1} are the "stuff" categories and {#stuff_categories, ..., #categories - 1} are the "thing" categories.

Finally, metadata[image_id]["iscrowd"][seg_id] is 1 for segments that correspond to "crowd" or "group" regions, i.e. regions belonging to a "thing" category where instances are not clearly separable, and 0 otherwise.
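
The encoding can be illustrated with a short sketch that works on an already-decoded metadata dictionary. The helper names (`category_kind`, `describe_segments`, `semantic_from_panoptic`) are hypothetical, and the mask is represented here as plain nested lists of seg_ids; in practice it would be read from the 16-bit png with an image library of your choice.

```python
VOID_CAT = 255  # category id reserved for void


def category_kind(cat_id, num_stuff):
    """Classify a category id as 'void', 'stuff' or 'thing'."""
    if cat_id == VOID_CAT:
        return "void"
    return "stuff" if cat_id < num_stuff else "thing"


def describe_segments(metadata, image_id):
    """List seg_id, cat_id, category name, kind and crowd flag per segment."""
    meta = metadata["meta"]
    # The images section is a list, so look the entry up by its id.
    entry = next(e for e in metadata["images"] if e["id"] == image_id)
    segments = []
    for seg_id, (cat_id, crowd) in enumerate(zip(entry["cat"], entry["iscrowd"])):
        kind = category_kind(cat_id, meta["num_stuff"])
        name = None if kind == "void" else meta["categories"][cat_id]
        segments.append({
            "seg_id": seg_id,
            "cat_id": cat_id,
            "name": name,
            "kind": kind,
            "iscrowd": bool(crowd),
        })
    return segments


def semantic_from_panoptic(mask_rows, cat):
    """Turn a 2D grid of seg_ids into a 2D grid of cat_ids.

    The per-image "cat" list acts as a lookup table indexed by seg_id.
    """
    return [[cat[seg_id] for seg_id in row] for row in mask_rows]
```

Because the "cat" list is indexed directly by seg_id, converting a panoptic mask into a semantic one is a single table lookup per pixel.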

The meta section of metadata.bin mainly contains information about the categories:

  • categories: original names of the categories
  • num_stuff, num_thing: number of "stuff" and "thing" categories, respectively
  • palette: default palette mapping from cat_id to RGB values
  • original_ids: original category ids before remapping to the seamseg format
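
As a rough sketch of how these fields might be used, the code below colors a grid of cat_ids with the palette and maps a cat_id back to its original id. Both helper names are hypothetical. Two assumptions are made that the format description does not state explicitly: that palette and original_ids are indexed by cat_id, and that ids not covered by the palette (such as void, if the palette has fewer than 256 entries) fall back to a caller-chosen color.

```python
def colorize(semantic_rows, palette, void_color=(0, 0, 0)):
    """Map a 2D grid of cat_ids to RGB triples.

    Assumption: palette[cat_id] is the color of category cat_id. Ids that
    fall outside the palette use void_color instead.
    """
    def color_of(cat_id):
        if 0 <= cat_id < len(palette):
            return tuple(palette[cat_id])
        return tuple(void_color)

    return [[color_of(cat_id) for cat_id in row] for row in semantic_rows]


def to_original_id(cat_id, original_ids):
    """Return the source dataset's id for cat_id, or None for void/unknown.

    Assumption: original_ids[cat_id] holds the id used before remapping.
    """
    if 0 <= cat_id < len(original_ids):
        return original_ids[cat_id]
    return None
```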