WenJuing/IQCaption360

Omnidirectional Image Quality Captioning: A Large-scale Database and a New Model

Jiebin Yan1, Ziwen Tan1, Yuming Fang1, Junjie Chen1, Wenhui Jiang1, and Zhou Wang2.

1 School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics

2 Department of Electrical and Computer Engineering, University of Waterloo

🍀News:

  • February 21, 2025: The arXiv version of our paper is released: https://arxiv.org/abs/2502.15271.

  • January 27, 2025: Our paper is accepted by IEEE T-IP!

  • October 6, 2024: We released the OIQ-10K database and the IQCaption360 code.

🌱OIQ-10K Database

Introduction

The OIQ-10K database contains 10,000 omnidirectional images with homogeneous and heterogeneous distortions, covering four distortion situations: no perceptibly distorted region (2,500), one distorted region (2,508), two distorted regions (2,508), and global distortion (2,484). MOS values range from 1 to 3.

Visualization of omnidirectional images with different distorted regions in the proposed OIQ-10K database. The distortion region(s) of the visual examples in (b) and (c) are marked in red for better visual presentation.

Construction Details

| Situation | Description | Coarse stage | Refinement technique | Refinement stage |
|---|---|---|---|---|
| CnoDist | No perceptibly distorted region | 3,903 = 1,001 (OIQA databases) + 2,498 (Flickr) + 404 (Pixexid) | Deduplication, database shaping | 2,500 |
| CdistR1 | One distorted region | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
| CdistR2 | Two distorted regions | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
| CdistGl | Global distortion | 2,484 = 2,071 (OIQA databases) + 237 (Flickr) + 176 (Pixexid) | Keep all | 2,484 |
| Total | - | 12,579 | - | 10,000 |
  • CnoDist: no perceptibly distorted region, CdistR1: one distorted region, CdistR2: two distorted regions, CdistGl: global distortion
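The coarse-stage and refinement-stage counts in the table above can be cross-checked with a few lines of arithmetic (all numbers are taken directly from the table):

```python
# Cross-check of the construction table (all numbers come from the table).
coarse = {
    "CnoDist": 1001 + 2498 + 404,   # OIQA databases + Flickr + Pixexid
    "CdistR1": 258 * 4 * 3,         # 258 scenes x 4 distortion types x 3 levels
    "CdistR2": 258 * 4 * 3,
    "CdistGl": 2071 + 237 + 176,
}
refined = {
    "CnoDist": 2500,                # after deduplication and database shaping
    "CdistR1": 209 * 4 * 3,         # 209 scenes kept after manual selection
    "CdistR2": 209 * 4 * 3,
    "CdistGl": 2484,                # all kept
}
print(sum(coarse.values()))   # 12,579 images collected in the coarse stage
print(sum(refined.values()))  # 10,000 images in the final OIQ-10K database
```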

Distortion Composition

| Situation | Distortion type | Distortion level | Number | Remark |
|---|---|---|---|---|
| CnoDist | - | - | 2,500 | source: JUFE (115), CVIQ (10), OIQA (10), Salient360! (60), Xu2021 (436), NBU-SOID (7), LIVE 3D VR IQA (12), Pixexid (150), Flickr (1,700) |
| CdistR1/CdistR2 | Gaussian noise | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Gaussian blur | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Stitching | 1~3 | 627/627 | source: extended from JUFE |
| CdistR1/CdistR2 | Brightness discontinuity | 1~3 | 627/627 | source: extended from JUFE |
| CdistGl | Compression | - | 1,436 | includes: JPEG (590), JPEG2000 (212), AVC (176), HEVC (383), VP9 (75) |
| CdistGl | Gaussian noise | - | 248 | source: LIVE 3D VR (75), OIQA (78), Flickr (95) |
| CdistGl | Gaussian blur | - | 155 | source: LIVE 3D VR (75), OIQA (80) |
| CdistGl | Stitching | 1~5 | 75 | source: LIVE 3D VR |
| CdistGl | Downsampling | 1~5 | 75 | source: LIVE 3D VR |
| CdistGl | JPEG XT and TM | - | 319 | source: NBU-HOID |
| CdistGl | Authentic distortion | - | 176 | source: Pixexid |
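The per-situation totals in the composition table are internally consistent, which can be verified directly from the listed numbers:

```python
# Sanity check of the distortion composition (numbers from the table above).
cdist_gl = {
    "compression": 590 + 212 + 176 + 383 + 75,  # JPEG, JPEG2000, AVC, HEVC, VP9
    "gaussian_noise": 75 + 78 + 95,             # LIVE 3D VR, OIQA, Flickr
    "gaussian_blur": 75 + 80,                   # LIVE 3D VR, OIQA
    "stitching": 75,
    "downsampling": 75,
    "jpeg_xt_tm": 319,
    "authentic": 176,
}
cdist_r1 = 4 * 627  # four distortion types, 627 images each
print(sum(cdist_gl.values()))  # 2,484 globally distorted images
print(cdist_r1)                # 2,508 one-region images
```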

Database Download

Click here: https://pan.baidu.com/s/1Uy0AR9B2oCAIJuLCuZEtLg (pass: jvga) to download the OIQ-10K database.

🎯IQCaption360 Architecture

The architecture of the proposed IQCaption360. It consists of four parts: (a) a backbone, (b) an adaptive feature aggregation module, (c) a distortion range prediction network, and (d) a quality score prediction network.

Textual Output

IQCaption360 outputs a caption that describes the perceptual quality of an omnidirectional image.

  • Example1

  • Example2

  • Example3

  • Example4
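The exact caption template lives in the repository code; as a purely illustrative sketch (the situation labels come from the database description above, but the wording, thresholds, and function name are assumptions, not the paper's template), a quality caption can be assembled from a predicted distortion situation and a MOS-scale quality score:

```python
# Illustrative only: a template-based quality caption. The real IQCaption360
# template is defined in the repository code; the wording and thresholds
# below are assumptions for demonstration.
SITUATIONS = {
    "CnoDist": "no perceptibly distorted region",
    "CdistR1": "one distorted region",
    "CdistR2": "two distorted regions",
    "CdistGl": "global distortion",
}

def make_caption(situation: str, score: float) -> str:
    """Map a predicted situation and a score on the 1~3 MOS scale to a caption."""
    level = "bad" if score < 1.67 else ("fair" if score < 2.33 else "good")
    return (f"This omnidirectional image has {SITUATIONS[situation]}; "
            f"its predicted quality score is {score:.2f} ({level}).")

print(make_caption("CdistR1", 2.41))
```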

👀Usage

Install

  1. Clone this repo:

```shell
git clone https://github.com/WenJuing/IQCaption360
cd IQCaption360
```

  2. Create an Anaconda environment with natten 0.14.6.

Inference on One Image

```shell
CUDA_VISIBLE_DEVICES=0 python inference_one_image.py \
  --load_ckpt_path /home/xxxy/tzw/IQCaption360/ckpt/IQCaption_OIQ-10K.pth \
  --test_img_path /home/xxxy/tzw/databases/viewports_8/ref2.jpg
```

  • Download the weights [ google drive | baidu cloud (password: jeop) ] pretrained on the OIQ-10K database.
  • test_img offers a group of preprocessed omnidirectional images.

Train and Test

Edit config.py to set the training and testing configuration.
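The options in config.py are not reproduced here; as a hedged sketch of what such a file typically collects for this kind of training pipeline, every field name and value below is an illustrative assumption — check the actual config.py in the repository:

```python
# Hypothetical configuration sketch -- the real options are in the
# repository's config.py; all field names and values here are assumptions.
from types import SimpleNamespace

config = SimpleNamespace(
    db_root="/path/to/OIQ-10K",  # root folder of the downloaded database
    ckpt_dir="./ckpt",           # where checkpoints are saved and loaded
    batch_size=8,
    lr=1e-5,
    epochs=50,
    num_workers=4,
)
print(config.batch_size, config.lr)
```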

  • Train

```shell
CUDA_VISIBLE_DEVICES=0 python train.py
```

or

```shell
sh run.sh
```

  • Test

```shell
CUDA_VISIBLE_DEVICES=0 python test.py
```

Citation

```bibtex
@article{yan2025omnidirectional,
  title={Omnidirectional image quality captioning: A large-scale database and a new model},
  author={Yan, Jiebin and Tan, Ziwen and Fang, Yuming and Chen, Junjie and Jiang, Wenhui and Wang, Zhou},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  volume={34},
  pages={1326-1339}
}
```
