WenJuing/IQCaption360

Omnidirectional Image Quality Captioning: A Large-scale Database and a New Model

Jiebin Yan1, Ziwen Tan1, Yuming Fang1, Junjie Chen1, Wenhui Jiang1, and Zhou Wang2.

1 School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics

2 Department of Electrical and Computer Engineering, University of Waterloo

🍀News:

  • February 21, 2025: The arXiv version of our paper is released: https://arxiv.org/abs/2502.15271.

  • January 27, 2025: Our paper is accepted by IEEE T-IP!

  • October 6, 2024: We released the OIQ-10K database and the IQCaption360 code.

🌱OIQ-10K Database

Introduction

The OIQ-10K database contains 10,000 omnidirectional images with homogeneous and heterogeneous distortions, covering four distortion situations: no perceptibly distorted region (2,500), one distorted region (2,508), two distorted regions (2,508), and global distortion (2,484). MOS values range from 1 to 3.

Visualization of omnidirectional images with different distorted regions in the proposed OIQ-10K database. The distortion region(s) of the visual examples in (b) and (c) are marked in red for better visual presentation.

Construction Details

| Situation | Description | Coarse stage | Refinement technique | Refinement stage |
|---|---|---|---|---|
| CnoDist | No perceptibly distorted region | 3,903 = 1,001 (OIQA databases) + 2,498 (Flickr) + 404 (Pixexid) | Deduplication, database shaping | 2,500 |
| CdistR1 | One distorted region | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
| CdistR2 | Two distorted regions | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
| CdistGl | Global distortion | 2,484 = 2,071 (OIQA databases) + 237 (Flickr) + 176 (Pixexid) | Keep all | 2,484 |
| Total | - | 12,579 | - | 10,000 |
  • CnoDist: no perceptibly distorted region, CdistR1: one distorted region, CdistR2: two distorted regions, CdistGl: global distortion
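The coarse-stage and refinement-stage counts in the table above can be cross-checked with a few lines of arithmetic (all numbers are taken directly from the table):

```python
# Cross-check of the construction table (all numbers come from the table).
coarse = {
    "CnoDist": 1001 + 2498 + 404,   # OIQA databases + Flickr + Pixexid
    "CdistR1": 258 * 4 * 3,         # 258 scenes x 4 distortion types x 3 levels
    "CdistR2": 258 * 4 * 3,
    "CdistGl": 2071 + 237 + 176,
}
refined = {
    "CnoDist": 2500,                # after deduplication and database shaping
    "CdistR1": 209 * 4 * 3,         # 209 scenes kept after manual selection
    "CdistR2": 209 * 4 * 3,
    "CdistGl": 2484,                # all kept
}
print(sum(coarse.values()))   # 12,579 images collected in the coarse stage
print(sum(refined.values()))  # 10,000 images in the final OIQ-10K database
```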

Distortion Composition

| Situation | Distortion type | Distortion level | Number | Remark |
|---|---|---|---|---|
| CnoDist | - | - | 2,500 | source: JUFE (115), CVIQ (10), OIQA (10), Salient360! (60), Xu2021 (436), NBU-SOID (7), LIVE 3D VR IQA (12), Pixexid (150), Flickr (1,700) |
| CdistR1/CdistR2 | Gaussian noise | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Gaussian blur | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Stitching | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Brightness discontinuity | 1~3 | 627/627 | source: extended from JUFE |
| CdistGl | Compression | - | 1,436 | includes: JPEG (590), JPEG2000 (212), AVC (176), HEVC (383), VP9 (75) |
| CdistGl | Gaussian noise | - | 248 | source: LIVE 3D VR (75), OIQA (78), Flickr (95) |
| CdistGl | Gaussian blur | - | 155 | source: LIVE 3D VR (75), OIQA (80) |
| CdistGl | Stitching | 1~5 | 75 | source: LIVE 3D VR |
| CdistGl | Downsampling | 1~5 | 75 | source: LIVE 3D VR |
| CdistGl | JPEG XT and TM | - | 319 | source: NBU-HOID |
| CdistGl | Authentic distortion | - | 176 | source: Pixexid |
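The per-situation totals in the composition table are internally consistent, which can be verified directly from the listed numbers:

```python
# Sanity check of the distortion composition (numbers from the table above).
cdist_gl = {
    "compression": 590 + 212 + 176 + 383 + 75,  # JPEG, JPEG2000, AVC, HEVC, VP9
    "gaussian_noise": 75 + 78 + 95,             # LIVE 3D VR, OIQA, Flickr
    "gaussian_blur": 75 + 80,                   # LIVE 3D VR, OIQA
    "stitching": 75,
    "downsampling": 75,
    "jpeg_xt_tm": 319,
    "authentic": 176,
}
cdist_r1 = 4 * 627  # four distortion types, 627 images each
print(sum(cdist_gl.values()))  # 2,484 globally distorted images
print(cdist_r1)                # 2,508 one-region images
```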

Database Download

Click here: https://pan.baidu.com/s/1Uy0AR9B2oCAIJuLCuZEtLg (pass: jvga) to download the OIQ-10K database.

🎯IQCaption360 Architecture

The architecture of the proposed IQCaption360. It consists of four parts: (a) a backbone, (b) an adaptive feature aggregation module, (c) a distortion range prediction network, and (d) a quality score prediction network.

Textual Output

IQCaption360 outputs a caption that describes the perceptual quality of an omnidirectional image.

  • Example1

  • Example2

  • Example3

  • Example4
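The exact caption template lives in the repository code; as a purely illustrative sketch (the situation labels come from the database description above, but the wording, thresholds, and function name are assumptions, not the paper's template), a quality caption can be assembled from a predicted distortion situation and a MOS-scale quality score:

```python
# Illustrative only: a template-based quality caption. The real IQCaption360
# template is defined in the repository code; the wording and thresholds
# below are assumptions for demonstration.
SITUATIONS = {
    "CnoDist": "no perceptibly distorted region",
    "CdistR1": "one distorted region",
    "CdistR2": "two distorted regions",
    "CdistGl": "global distortion",
}

def make_caption(situation: str, score: float) -> str:
    """Map a predicted situation and a score on the 1~3 MOS scale to a caption."""
    level = "bad" if score < 1.67 else ("fair" if score < 2.33 else "good")
    return (f"This omnidirectional image has {SITUATIONS[situation]}; "
            f"its predicted quality score is {score:.2f} ({level}).")

print(make_caption("CdistR1", 2.41))
```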

👀Usage

Install

  1. Clone this repo:

```shell
git clone https://github.com/WenJuing/IQCaption360
cd IQCaption360
```

  2. Create an Anaconda environment with natten 0.14.6.

Inference on One Image

```shell
CUDA_VISIBLE_DEVICES=0 python inference_one_image.py \
  --load_ckpt_path /home/xxxy/tzw/IQCaption360/ckpt/IQCaption_OIQ-10K.pth \
  --test_img_path /home/xxxy/tzw/databases/viewports_8/ref2.jpg
```

  • Download the weights [ google drive | baidu cloud (password: jeop) ] pretrained on the OIQ-10K database.
  • test_img offers a group of preprocessed omnidirectional images.

Train and Test

Edit config.py to set the training and testing configuration.
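The options in config.py are not reproduced here; as a hedged sketch of what such a file typically collects for this kind of training pipeline, every field name and value below is an illustrative assumption — check the actual config.py in the repository:

```python
# Hypothetical configuration sketch -- the real options are in the
# repository's config.py; all field names and values here are assumptions.
from types import SimpleNamespace

config = SimpleNamespace(
    db_root="/path/to/OIQ-10K",  # root folder of the downloaded database
    ckpt_dir="./ckpt",           # where checkpoints are saved and loaded
    batch_size=8,
    lr=1e-5,
    epochs=50,
    num_workers=4,
)
print(config.batch_size, config.lr)
```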

  • Train

```shell
CUDA_VISIBLE_DEVICES=0 python train.py
```

or

```shell
sh run.sh
```

  • Test

```shell
CUDA_VISIBLE_DEVICES=0 python test.py
```

Citation

```bibtex
@article{yan2025omnidirectional,
  title={Omnidirectional image quality captioning: A large-scale database and a new model},
  author={Yan, Jiebin and Tan, Ziwen and Fang, Yuming and Chen, Junjie and Jiang, Wenhui and Wang, Zhou},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  volume={34},
  pages={1326-1339}
}
```
