Introduced by Zhou et al. in Scene Parsing Through ADE20K Dataset.
The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level object and object-part labels. It covers 150 semantic categories in total, including stuff classes such as sky, road, and grass, and discrete objects such as person, car, and bed.
backbone | resolution | mIoU (ss/ms) | train speed | train time | #param | FLOPs | Config | Checkpoint | Log |
---|---|---|---|---|---|---|---|---|---|
InternImage-T | 512x512 | 47.9 / 48.1 | 0.23s / iter | 10.5h | 59M | 944G | config | ckpt | log |
InternImage-S | 512x512 | 50.1 / 50.9 | 0.25s / iter | 11.5h | 80M | 1017G | config | ckpt | log |
InternImage-B | 512x512 | 50.8 / 51.3 | 0.26s / iter | 12h | 128M | 1185G | config | ckpt | log |
InternImage-L | 640x640 | 53.9 / 54.1 | 0.42s / iter | 19h | 256M | 2526G | config | ckpt | log |
InternImage-XL | 640x640 | 55.0 / 55.3 | 0.47s / iter | 22h | 368M | 3142G | config | ckpt | log |
InternImage-H | 896x896 | 59.9 / 60.3 | 0.94s / iter | 2d (2n) | 1.12B | 3566G | config | ckpt | log |
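
For readers unfamiliar with the mIoU column in the table above, here is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and ground-truth label maps. It is not the evaluation code behind these numbers. The function name `mean_iou`, the flat-list input format, and the `ignore_index=255` convention are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    `pred` and `target` are flat sequences of integer class ids of equal length.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (255 is an assumed convention, not taken from the source).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # Pixel is in both the predicted and the true region of class t.
            intersection[t] += 1
            union[t] += 1
        else:
            # Pixel is in the true region of t and the predicted region of p.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 3 classes; the last pixel is ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. The "ss/ms" pair in the table presumably refers to single-scale and multi-scale testing, which changes how predictions are produced but not how the metric is computed.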
- Training speed is measured on an A100 GPU.
- Please set `with_cp=True` to save memory if you encounter out-of-memory issues.
- The logs are from our most recent training runs, so the results in the logs differ slightly from those in our paper.
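
As a rough illustration of the `with_cp=True` note, the fragment below sketches where such a flag might sit in a Python-style config. Only the flag name and value come from the notes above. The `model` / `backbone` nesting is an assumption based on common segmentation config layouts, so check the actual config file before copying it. Flags with this name typically enable activation checkpointing, which lowers memory use at the cost of extra computation during training.

```python
# Hypothetical config override; only `with_cp=True` is taken from the notes.
# The surrounding `model` / `backbone` structure is an assumed layout.
model = dict(
    backbone=dict(
        with_cp=True,  # recompute activations in the backward pass to save memory
    ),
)
```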