
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery


[arXiv] [Paper] [Colab Demo]

The International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

PitVQA Net

PitVQA Dataset

Our PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from The National Hospital for Neurology and Neurosurgery in London, United Kingdom. All videos were annotated for surgical phases, steps, instruments present, and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow [16]. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon. We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, yielding a total of 109,173 frames; the shortest and longest videos contributed 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation. Overall, there are 884,242 question-answer pairs across the 109,173 frames, around 8 pairs per frame. There are 59 classes in total: 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes.
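Below is a minimal sketch of how the frame-wise question-answer pairs described above could be loaded for training. The directory layout, file naming, and the PitVQADataset class are hypothetical and only illustrate the per-frame QA structure; please refer to the repository's dataloader for the actual format.

import os
from PIL import Image
from torch.utils.data import Dataset

class PitVQADataset(Dataset):
    """Illustrative loader: one (image, question, answer) sample per QA pair."""

    def __init__(self, root, video_ids, transform=None):
        self.root = root
        self.transform = transform
        self.samples = []  # list of (frame_path, question, answer) tuples
        for vid in video_ids:
            qa_dir = os.path.join(root, 'QA', vid)        # hypothetical layout
            img_dir = os.path.join(root, 'images', vid)   # hypothetical layout
            for qa_file in sorted(os.listdir(qa_dir)):
                frame_path = os.path.join(img_dir, qa_file.replace('.txt', '.png'))
                with open(os.path.join(qa_dir, qa_file)) as f:
                    for line in f:                        # assumed "question|answer" per line
                        question, answer = line.strip().split('|')
                        self.samples.append((frame_path, question, answer))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        frame_path, question, answer = self.samples[idx]
        image = Image.open(frame_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, question, answer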

How to Download PitVQA Dataset

Please download the full PitVQA dataset from the UCL RDR portal. The original videos were obtained and preprocessed from the MICCAI PitVis challenge.

Training Commands:

For EndoVis18-VQA dataset:

python main.py --dataset=endo18 --epochs=60 --batch_size=64 --lr=0.00002

For PitVQA dataset:

python main.py --dataset=pit24 --epochs=60 --batch_size=64 --lr=0.00002

Acknowledgement

The implementation of PitVQA relies on resources from BLIP, Huggingface Transformers, timm and our previous work SurgicalGPT. We thank the original authors for open-sourcing their work.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{he2024pitvqa,
  title={PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery},
  author={He, Runlong and Xu, Mengya and Das, Adrito and Z. Khan, Danyal and Bano, Sophia and J. Marcus, Hani and Stoyanov, Danail and J. Clarkson, Matthew and Islam, Mobarakol},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  pages={},
  year={2024},
  organization={}
}
