Seamless Scene Segmentation dataset format

This is our standardized dataset format for panoptic segmentation. Scripts to convert from specific datasets to the common format are located in the scripts/data_preparation folder.

Folder structure

dataset_root
|- img
   |- [image_id1].{jpg|png}
   |- [image_id2].{jpg|png}
   ...
|- msk
   |- [image_id1].png
   |- [image_id2].png
   ...
|- lst
   |- [split1].txt
   |- [split2].txt
   ...
|- coco
   |- [split1].json
   |- [split2].json
   ...
|- metadata.bin

img: original RGB images, stored either as jpg or png
msk: panoptic segmentation masks, stored as 16-bit grayscale png
lst: dataset splits, stored as txt files containing lists of image_ids (one per line)
coco: annotations in COCO format
metadata.bin: metadata file, described below
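The layout above is simple enough to navigate with plain path manipulation. The following is a minimal sketch, not seamseg's own loader. The helper names read_split, image_path and mask_path are hypothetical. It assumes lowercase .jpg/.png extensions and treats blank lines in a split file as padding to skip.

```python
from pathlib import Path
from typing import List


def read_split(root: Path, split: str) -> List[str]:
    """Return the image ids listed in lst/<split>.txt (one id per line)."""
    lines = (root / "lst" / f"{split}.txt").read_text().splitlines()
    # Ignore empty lines, e.g. a trailing newline at the end of the file.
    return [line.strip() for line in lines if line.strip()]


def image_path(root: Path, image_id: str) -> Path:
    """Locate img/<image_id>.jpg or img/<image_id>.png, whichever exists."""
    for ext in (".jpg", ".png"):  # assumption: lowercase extensions
        candidate = root / "img" / f"{image_id}{ext}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no jpg or png image found for {image_id!r}")


def mask_path(root: Path, image_id: str) -> Path:
    """Panoptic masks are always png files named after the image id."""
    return root / "msk" / f"{image_id}.png"
```

Iterating over read_split(root, "train") and calling image_path / mask_path on each id yields matching (image, mask) pairs for that split.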

Metadata format

metadata.bin is a dictionary serialized to binary MessagePack with umsgpack. It contains metadata about each individual image and about the dataset as a whole. Its structure is as follows:

{
  "images" : [
    {
      "id": "image_id",
      "size": (height, width),
      "cat": [255, cat_id_of_seg_id1, cat_id_of_seg_id2, ...],
      "iscrowd": [1, seg_id1_is_crowd, seg_id2_is_crowd, ...]
    },
    ...
  ],
  "meta": {
    "categories": ["cat1", "cat2", ...],
    "num_stuff": #stuff_categories,
    "num_thing": #thing_categories,
    "palette": [[r1, g1, b1], [r2, g2, b2], ...],
    "original_ids": [original_cat_id1, original_cat_id2, ...]
  }
}
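Reading the file back only requires the umsgpack module (the u-msgpack-python package). The sketch below is illustrative rather than seamseg's own code. It imports umsgpack lazily inside load_metadata, so the other helpers work without it. index_images, image_size and check_entry are hypothetical names. Because MessagePack has no tuple type, "size" typically comes back as a list, so image_size converts it.

```python
from typing import Any, Dict, Tuple

VOID = 255


def load_metadata(path: str) -> Dict[str, Any]:
    """Decode metadata.bin (requires the u-msgpack-python package)."""
    import umsgpack  # lazy import: only needed when actually reading the file

    with open(path, "rb") as fid:
        return umsgpack.unpack(fid)


def index_images(metadata: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each image id to its entry in metadata["images"]."""
    return {entry["id"]: entry for entry in metadata["images"]}


def image_size(entry: Dict[str, Any]) -> Tuple[int, int]:
    """Return (height, width); the decoded value may be a list."""
    height, width = entry["size"]
    return int(height), int(width)


def check_entry(entry: Dict[str, Any]) -> None:
    """Sanity-check one image entry against the layout described above."""
    cat, iscrowd = entry["cat"], entry["iscrowd"]
    # One category id and one crowd flag per segment.
    if len(cat) != len(iscrowd):
        raise ValueError(f"{entry['id']}: cat and iscrowd lengths differ")
    # seg_id 0 is reserved for void, whose category id is 255.
    if not cat or cat[0] != VOID:
        raise ValueError(f"{entry['id']}: seg_id 0 must map to category 255")
```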

Data encoding

The panoptic segmentation masks contain, for each pixel, the seg_id of the segment that pixel belongs to. These ids uniquely identify each segment within a particular image, whether it is an instance or a stuff area. The cat_id of the category a segment belongs to can be recovered from the metadata as metadata[image_id]["cat"][seg_id], where metadata[image_id] is shorthand for the entry of metadata["images"] whose "id" equals image_id. Segment ids are contiguous integers in the range [0, #segments_in_the_image], with 0 always denoting void areas.

Category ids take values in {0, 1, ..., #categories - 1} plus the special value 255: 255 denotes void, {0, ..., #stuff_categories - 1} are the "stuff" categories and {#stuff_categories, ..., #categories - 1} are the "thing" categories.

Finally, metadata[image_id]["iscrowd"][seg_id] = 1 for segments that correspond to "crowd" or "group" regions, i.e. regions belonging to a "thing" category whose instances are not clearly separable, and 0 otherwise (the void segment 0 is listed with 1, as in the structure above).
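Since each "cat" and "iscrowd" list is indexed by seg_id, a whole mask can be decoded with a single NumPy fancy-indexing lookup. This is a sketch under stated assumptions: the mask has already been loaded as a 2-D integer array (reading the 16-bit png is not covered here), and decode_panoptic and category_kind are hypothetical helper names.

```python
import numpy as np

VOID = 255


def decode_panoptic(msk: np.ndarray, entry: dict):
    """Turn a seg_id mask into per-pixel category ids and crowd flags.

    msk:   2-D integer array of seg_ids for one image.
    entry: the metadata["images"] item for that image.
    """
    cat = np.asarray(entry["cat"], dtype=np.int64)
    iscrowd = np.asarray(entry["iscrowd"], dtype=bool)
    # Each pixel's seg_id selects its segment's category id and crowd flag.
    return cat[msk], iscrowd[msk]


def category_kind(cat_id: int, num_stuff: int, num_thing: int) -> str:
    """Classify a category id as "void", "stuff" or "thing"."""
    if cat_id == VOID:
        return "void"
    if 0 <= cat_id < num_stuff:
        return "stuff"
    if num_stuff <= cat_id < num_stuff + num_thing:
        return "thing"
    raise ValueError(f"invalid category id {cat_id}")
```

For example, with num_stuff = 2 and num_thing = 3, ids 0 and 1 are stuff, ids 2 to 4 are things, and 255 is void.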

The meta section of metadata.bin mainly contains information about the categories:

categories: original names of the categories
num_stuff, num_thing: number of "stuff" and "thing" categories, respectively
palette: default palette mapping from cat_id to RGB values
original_ids: original category ids before remapping to the seamseg format
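The meta fields are all indexed by cat_id, so they can serve directly as lookup tables. The sketch below shows one way to render a category map with "palette" and to translate a cat_id back to its name and original id. colorize and describe are hypothetical helpers. Void (255) has no palette entry, so this sketch paints it with a caller-chosen void_color (black by default, an arbitrary choice).

```python
import numpy as np

VOID = 255


def colorize(cat_map: np.ndarray, palette, void_color=(0, 0, 0)) -> np.ndarray:
    """Map each pixel's cat_id to an RGB triplet from meta["palette"]."""
    lut = np.zeros((256, 3), dtype=np.uint8)
    lut[:] = void_color  # fallback for void (255) and any id without a color
    lut[: len(palette)] = np.asarray(palette, dtype=np.uint8)
    return lut[cat_map]  # shape (H, W, 3)


def describe(cat_id: int, meta: dict):
    """Return (category name, original id) for cat_id, or None for void."""
    if cat_id == VOID:
        return None
    return meta["categories"][cat_id], meta["original_ids"][cat_id]
```

Combined with the per-pixel category map from the previous step, colorize produces a color-coded visualization of a mask.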
