Seamless Scene Segmentation dataset format
This is our standardized dataset format for panoptic segmentation. Scripts to convert from specific datasets to the common format are located in the scripts/data_preparation folder.
dataset_root
  |- img
     |- [image_id1].{jpg|png}
     |- [image_id2].{jpg|png}
     ...
  |- msk
     |- [image_id1].png
     |- [image_id2].png
     ...
  |- lst
     |- [split1].txt
     |- [split2].txt
     ...
  |- coco
     |- [split1].json
     |- [split2].json
     ...
  |- metadata.bin
- img: original RGB images, stored as either jpg or png
- msk: panoptic segmentation masks, stored as 16-bit grayscale png
- lst: dataset splits, stored as txt files containing lists of image_ids (one per line)
- coco: annotations in COCO format
- metadata.bin: metadata file, described below
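
As a rough illustration of how this layout could be traversed, the sketch below reads a split file from lst and resolves an image id to its image and mask paths. The helper names read_split and resolve_paths are hypothetical and not part of seamseg; the only assumptions are the folder names and file extensions listed above.

```python
from pathlib import Path


def read_split(dataset_root, split):
    """Return the image ids listed in lst/<split>.txt, one per line."""
    lst_file = Path(dataset_root) / "lst" / f"{split}.txt"
    with open(lst_file, "r", encoding="utf-8") as fid:
        # Skip blank lines so a trailing newline does not yield an empty id
        return [line.strip() for line in fid if line.strip()]


def resolve_paths(dataset_root, image_id):
    """Return (image_path, mask_path) for one image id.

    Images may be stored as jpg or png, so both extensions are tried.
    Masks are always png.
    """
    root = Path(dataset_root)
    for ext in ("jpg", "png"):
        img_path = root / "img" / f"{image_id}.{ext}"
        if img_path.exists():
            return img_path, root / "msk" / f"{image_id}.png"
    raise FileNotFoundError(f"No jpg or png image found for id {image_id!r}")
```

Trying both extensions per image, rather than fixing one per dataset, keeps the sketch usable even if a dataset mixes jpg and png images.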
metadata.bin is a dictionary serialized to binary using umsgpack, which contains metadata about the images and the dataset itself. Its structure is as follows:
{
  "images" : [
    {
      "id": "image_id",
      "size": (height, width),
      "cat": [255, cat_id_of_seg_id1, cat_id_of_seg_id2, ...],
      "iscrowd": [1, seg_id1_is_crowd, seg_id2_is_crowd, ...]
    },
    ...
  ],
  "meta": {
    "categories": ["cat1", "cat2", ...],
    "num_stuff": #stuff_categories,
    "num_thing": #thing_categories,
    "palette": [[r1, g1, b1], [r2, g2, b2], ...],
    "original_ids": [original_cat_id1, original_cat_id2, ...]
  }
}
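
A minimal sketch of reading this file, assuming the u-msgpack-python package (imported as umsgpack) is installed. The function names load_metadata and index_by_id are hypothetical. Since "images" is a list, the sketch also builds a dictionary keyed by image id, so that an entry can be looked up directly by its id.

```python
def load_metadata(path):
    """Decode metadata.bin into a Python dictionary."""
    import umsgpack  # third-party: u-msgpack-python (assumed to be installed)

    with open(path, "rb") as fid:
        return umsgpack.unpack(fid)


def index_by_id(metadata):
    """Map each image id to its entry in metadata["images"]."""
    return {img_desc["id"]: img_desc for img_desc in metadata["images"]}
```

Typical use would be `metadata = load_metadata("dataset_root/metadata.bin")` followed by `by_id = index_by_id(metadata)`. Note that msgpack arrays usually decode as Python lists, so "size" may come back as [height, width] rather than a tuple.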
The panoptic segmentation masks contain, for each pixel, the seg_id of the segment that pixel belongs to. These ids uniquely identify each segment in a particular image, whether it is an instance or a stuff area. The cat_id of the category a segment belongs to can be recovered from the metadata as metadata[image_id]["cat"][seg_id], where metadata[image_id] is shorthand for the entry of the "images" list whose "id" is image_id.
Segment ids are contiguous integers in the range [0, #segments_in_the_image], with 0 always denoting the void areas.
Category ids take values in the set {0, 1, ..., #categories - 1, 255}, with 255 denoting void, {0, ..., #stuff_categories - 1} denoting the "stuff" categories and {#stuff_categories, ..., #categories - 1} denoting the "thing" categories.
Finally, metadata[image_id]["iscrowd"][seg_id] is 1 for segments that correspond to "crowd" or "group" regions, i.e. regions belonging to a "thing" category where instances are not clearly separable, and 0 otherwise.
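
To make the indexing concrete, here is a small pure-Python sketch that turns a panoptic mask into a per-pixel category map and classifies category ids as void, stuff, or thing. For illustration the mask is a nested list of seg_ids (in practice it would be decoded from the 16-bit png), image_entry stands for one element of the "images" list, and all function names are hypothetical.

```python
VOID = 255  # category id reserved for void


def semantic_from_panoptic(mask, image_entry):
    """Replace each seg_id in the mask with the cat_id of its segment."""
    cat = image_entry["cat"]
    return [[cat[seg_id] for seg_id in row] for row in mask]


def kind_of(cat_id, num_stuff, num_thing):
    """Classify a category id as "void", "stuff", or "thing"."""
    if cat_id == VOID:
        return "void"
    if 0 <= cat_id < num_stuff:
        return "stuff"
    if num_stuff <= cat_id < num_stuff + num_thing:
        return "thing"
    raise ValueError(f"Unexpected category id: {cat_id}")


def crowd_segments(image_entry):
    """Return the seg_ids flagged as crowd regions.

    Index 0 is the void segment and is skipped here (a choice of this sketch).
    """
    flags = image_entry["iscrowd"]
    return [seg_id for seg_id in range(1, len(flags)) if flags[seg_id] == 1]


# Made-up example: segment 1 is stuff (cat 0), segment 2 is a crowd region (cat 2)
entry = {"cat": [255, 0, 2], "iscrowd": [1, 0, 1]}
mask = [[0, 1, 1],
        [2, 2, 1]]
print(semantic_from_panoptic(mask, entry))  # [[255, 0, 0], [2, 2, 0]]
```

With array libraries the same lookup is usually written as a single indexing operation over the whole mask; the nested lists here only keep the sketch dependency-free.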
The meta section of metadata.bin mainly contains information about the categories:
- categories: original names of the categories
- num_stuff, num_thing: number of "stuff" and "thing" categories, respectively
- palette: default palette mapping from cat_id to RGB values
- original_ids: original category ids before remapping to the seamseg format
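
As an illustration of how the meta fields fit together, the sketch below colors a per-pixel category map with the palette and looks up the name and original id of a category. It assumes that palette and original_ids are both indexed by cat_id, as the descriptions above suggest. Void pixels get a caller-chosen color, because the format does not state whether the palette has an entry for 255. The function names and the example values are hypothetical.

```python
VOID = 255  # category id reserved for void


def colorize(semantic, palette, void_color=(0, 0, 0)):
    """Map each cat_id in a 2D list to an (r, g, b) tuple."""
    return [
        [void_color if cat_id == VOID else tuple(palette[cat_id]) for cat_id in row]
        for row in semantic
    ]


def describe_category(meta, cat_id):
    """Return (name, original_id, kind) for a non-void category id."""
    if not 0 <= cat_id < len(meta["categories"]):
        raise ValueError(f"Not a valid non-void category id: {cat_id}")
    # Stuff categories come first, thing categories follow
    kind = "stuff" if cat_id < meta["num_stuff"] else "thing"
    return meta["categories"][cat_id], meta["original_ids"][cat_id], kind


# Made-up meta section with one stuff and one thing category
meta = {
    "categories": ["road", "car"],
    "num_stuff": 1,
    "num_thing": 1,
    "palette": [[128, 64, 128], [0, 0, 142]],
    "original_ids": [7, 26],
}
print(describe_category(meta, 1))  # ('car', 26, 'thing')
```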