Jiebin Yan1, Ziwen Tan1, Yuming Fang1, Junjie Chen1, Wenhui Jiang1, and Zhou Wang2.
1 School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics
2 Department of Electrical and Computer Engineering, University of Waterloo
- February 21, 2025: The arXiv version of our paper is released: https://arxiv.org/abs/2502.15271.
- January 27, 2025: Our paper is accepted by IEEE T-IP!
- October 6, 2024: We upload the OIQ-10K database and the IQCaption360 code.
The OIQ-10K database contains 10,000 omnidirectional images with homogeneous and heterogeneous distortions, covering four distortion situations: no perceptibly distorted region (2,500), one distorted region (2,508), two distorted regions (2,508), and global distortion (2,484). The MOS values range from 1 to 3.
Visualization of omnidirectional images with different distorted regions in the proposed OIQ-10K database. The distortion region(s) of the visual examples in (b) and (c) are marked in red for better visual presentation.
Situation | Description | Coarse stage | Refinement technique | Refinement stage |
---|---|---|---|---|
CnoDist | No perceptibly distorted region | 3,903 = 1,001 (OIQA databases) + 2,498 (Flickr) + 404 (Pixexid) | Deduplication, database shaping | 2,500 |
CdistR1 | One distorted region | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
CdistR2 | Two distorted regions | 3,096 = 258×4×3 (extended from JUFE) | Manual selection | 2,508 = 209×4×3 |
CdistGl | Global distortion | 2,484 = 2,071 (OIQA databases) + 237 (Flickr) + 176 (Pixexid) | Keep all | 2,484 |
Total | - | 12,579 | - | 10,000 |
- CnoDist: no perceptibly distorted region, CdistR1: one distorted region, CdistR2: two distorted regions, CdistGl: global distortion
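As a quick sanity check of the construction counts, the following minimal Python snippet recomputes the table's totals (the numbers are taken directly from the table above; nothing is assumed):

```python
# Sanity check of the coarse-stage and refinement-stage counts.
coarse = {
    "CnoDist": 1001 + 2498 + 404,   # OIQA databases + Flickr + Pixexid
    "CdistR1": 258 * 4 * 3,         # scenes x distortion types x levels (extended from JUFE)
    "CdistR2": 258 * 4 * 3,
    "CdistGl": 2071 + 237 + 176,
}
refined = {"CnoDist": 2500, "CdistR1": 209 * 4 * 3,
           "CdistR2": 209 * 4 * 3, "CdistGl": 2484}

assert sum(coarse.values()) == 12579   # coarse stage total
assert sum(refined.values()) == 10000  # final database size
```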
Situation | Distortion type | Distortion level | Number | Remark |
---|---|---|---|---|
CnoDist | N/A | N/A | 2,500 | source: JUFE (115), CVIQ (10), OIQA (10), Salient360! (60), Xu2021 (436), NBU-SOID (7), LIVE 3D VR IQA (12), Pixexid (150), Flickr (1,700) |
CdistR1/CdistR2 | Gaussian noise | 1~3 | 627/627 | source: extended from JUFE |
CdistR1/CdistR2 | Gaussian blur | 1~3 | 627/627 | source: extended from JUFE |
CdistR1/CdistR2 | Stitching | 1~3 | 627/627 | source: extended from JUFE |
CdistR1/CdistR2 | Brightness discontinuity | 1~3 | 627/627 | source: extended from JUFE |
CdistGl | Compression | - | 1,436 | includes: JPEG (590), JPEG2000 (212), AVC (176), HEVC (383), VP9 (75) |
CdistGl | Gaussian noise | - | 248 | source: LIVE 3D VR (75), OIQA (78), Flickr (95) |
CdistGl | Gaussian blur | - | 155 | source: LIVE 3D VR (75), OIQA (80) |
CdistGl | Stitching | 1~5 | 75 | source: LIVE 3D VR |
CdistGl | Downsampling | 1~5 | 75 | source: LIVE 3D VR |
CdistGl | JPEG XT and TM | - | 319 | source: NBU-HOID |
CdistGl | Authentic distortion | N/A | 176 | source: Pixexid |
- The MOS values and more detailed information about each image can be found in the file OIQ-10K_data_info.csv (see the loading sketch below).
- Click here: https://pan.baidu.com/s/1Uy0AR9B2oCAIJuLCuZEtLg (pass: jvga) to download the OIQ-10K database.
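A minimal sketch for loading the annotation file with pandas. The column names used in the comment ("situation", "mos") are assumptions for illustration; inspect the actual header of OIQ-10K_data_info.csv and adjust accordingly.

```python
# Load the OIQ-10K annotation file and inspect its columns.
import pandas as pd

df = pd.read_csv("OIQ-10K_data_info.csv")
print(df.columns.tolist())  # check the actual column names first

# Example (ASSUMED column names): average MOS per distortion situation
# print(df.groupby("situation")["mos"].mean())
```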
The architecture of the proposed IQCaption360. It contains four parts: (a) backbone, (b) adaptive feature aggregation module, (c) distortion range prediction network, and (d) quality score prediction network. IQCaption360 outputs a caption that describes the perceptual quality of an omnidirectional image.
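To make the four-part layout concrete, here is a minimal PyTorch sketch of that structure. All concrete choices (ResNet-50 backbone, mean-pooling aggregation, feature sizes) are illustrative assumptions, not the released IQCaption360 implementation.

```python
# Illustrative sketch of the four-part architecture described above.
import torch
import torch.nn as nn
import torchvision.models as models

class IQCaptionSketch(nn.Module):
    def __init__(self, num_ranges=4):
        super().__init__()
        # (a) backbone: any image encoder works for this sketch
        resnet = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        # (b) adaptive feature aggregation over per-viewport features
        # (simplified here to a projection followed by mean pooling)
        self.aggregate = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())
        # (c) distortion range prediction (e.g., the four situations)
        self.range_head = nn.Linear(512, num_ranges)
        # (d) quality score prediction
        self.score_head = nn.Linear(512, 1)

    def forward(self, viewports):                 # viewports: (B, N, 3, H, W)
        b, n = viewports.shape[:2]
        feats = self.backbone(viewports.flatten(0, 1)).flatten(1)  # (B*N, 2048)
        feats = self.aggregate(feats).view(b, n, -1).mean(dim=1)   # (B, 512)
        return self.range_head(feats), self.score_head(feats)
```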
- Examples 1-4: sample omnidirectional images with their generated quality captions.
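The examples above pair an image with a textual quality description. A hedged sketch of how such a caption could be assembled from the two predicted quantities follows; the wording and the score thresholds are illustrative assumptions, not the paper's exact caption template.

```python
# Assemble a quality caption from the predicted distortion range and score.
# Phrasing and thresholds (MOS in [1, 3] split into thirds) are ASSUMPTIONS.
RANGE_TEXT = {
    0: "no perceptibly distorted region",
    1: "one distorted region",
    2: "two distorted regions",
    3: "global distortion",
}

def make_caption(range_id: int, score: float) -> str:
    level = "good" if score > 2.33 else "fair" if score > 1.67 else "bad"
    return (f"This omnidirectional image has {RANGE_TEXT[range_id]}; "
            f"its perceptual quality is {level} (predicted MOS: {score:.2f}).")

print(make_caption(1, 1.8))  # -> "... one distorted region; ... fair ..."
```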
- Clone this repo:
```
git clone https://github.com/WenJuing/IQCaption360
cd IQCaption360
```
- Create an Anaconda environment with natten 0.14.6.
- Run inference on a single image:
```
CUDA_VISIBLE_DEVICES=0 python inference_one_image.py --load_ckpt_path /home/xxxy/tzw/IQCaption360/ckpt/IQCaption_OIQ-10K.pth --test_img_path /home/xxxy/tzw/databases/viewports_8/ref2.jpg
```
- Download:
  - `weights` [ google drive | baidu cloud (password: jeop) ]: model pretrained on the OIQ-10K database
  - `test_img`: offers a group of processed omnidirectional images (viewports; see the sketch below)
- Edit `config.py` for configuration.
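The inference command above points at a `viewports_8` folder, so the expected input appears to be a set of viewports rather than a raw equirectangular image. If you need to produce similar inputs yourself, the following is a minimal sketch using the third-party py360convert package; the 8-viewport equatorial sampling, 90° field of view, and 224×224 output size are assumptions, not necessarily the repo's exact preprocessing.

```python
# Sketch: sample 8 viewports along the equator of an equirectangular image.
# Requires the third-party py360convert package (pip install py360convert).
# FOV, viewport size, and sampling pattern are ASSUMPTIONS.
import numpy as np
import py360convert
from PIL import Image

erp = np.array(Image.open("erp_image.jpg"))      # equirectangular input
for i in range(8):
    vp = py360convert.e2p(erp, fov_deg=(90, 90), u_deg=i * 45 - 180,
                          v_deg=0, out_hw=(224, 224))
    Image.fromarray(vp.astype(np.uint8)).save(f"viewport_{i}.jpg")
```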
- Train:
```
CUDA_VISIBLE_DEVICES=0 python train.py
```
or
```
sh run.sh
```
- Test:
```
CUDA_VISIBLE_DEVICES=0 python test.py
```
```
@article{yan2025omnidirectional,
  title={Omnidirectional image quality captioning: A large-scale database and a new model},
  author={Yan, Jiebin and Tan, Ziwen and Fang, Yuming and Chen, Junjie and Jiang, Wenhui and Wang, Zhou},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  volume={34},
  pages={1326--1339},
}
```