
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery


[arXiv] [Paper] [Colab Demo]

The International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2024

PitVQA Net

PitVQA Dataset

Our PitVQA dataset comprises 25 videos of endoscopic pituitary surgeries from The National Hospital for Neurology and Neurosurgery in London, United Kingdom. All videos were annotated for surgical phases, steps, instruments present, and operation notes, guided by a standardised annotation framework derived from a preceding international consensus study on pituitary surgery workflow [16]. Annotation was performed collaboratively by two neurosurgical residents with operative pituitary experience and checked by an attending neurosurgeon. We extracted image frames from each video at 1 fps and removed any frames that were blurred or occluded, yielding a total of 109,173 frames; the shortest and longest videos contributed 2,443 and 7,179 frames, respectively. We acquired frame-wise question-answer pairs for all categories of the annotation. Overall, there are 884,242 question-answer pairs across the 109,173 frames, around 8 pairs per frame. There are 59 classes in total: 4 phases, 15 steps, 18 instruments, 3 variations of instruments present in a frame, 5 positions of the instruments, and 14 operation notes.
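Below is a minimal sketch of how the frame-wise question-answer pairs described above could be loaded for training. The directory layout, file naming, and the PitVQADataset class are hypothetical and only illustrate the per-frame QA structure; please refer to the repository's dataloader for the actual format.

import os
from PIL import Image
from torch.utils.data import Dataset

class PitVQADataset(Dataset):
    """Illustrative loader: one (image, question, answer) sample per QA pair."""

    def __init__(self, root, video_ids, transform=None):
        self.root = root
        self.transform = transform
        self.samples = []  # list of (frame_path, question, answer) tuples
        for vid in video_ids:
            qa_dir = os.path.join(root, 'QA', vid)        # hypothetical layout
            img_dir = os.path.join(root, 'images', vid)   # hypothetical layout
            for qa_file in sorted(os.listdir(qa_dir)):
                frame_path = os.path.join(img_dir, qa_file.replace('.txt', '.png'))
                with open(os.path.join(qa_dir, qa_file)) as f:
                    for line in f:                        # assumed "question|answer" per line
                        question, answer = line.strip().split('|')
                        self.samples.append((frame_path, question, answer))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        frame_path, question, answer = self.samples[idx]
        image = Image.open(frame_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, question, answer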

How to Download PitVQA Dataset

Please download the full PitVQA dataset from the UCL RDR portal. The original videos were obtained and preprocessed from the MICCAI PitVis challenge.

Training Commands:

For EndoVis18-VQA dataset:

python main.py --dataset=endo18 --epochs=60 --batch_size=64 --lr=0.00002

For PitVQA dataset:

python main.py --dataset=pit24 --epochs=60 --batch_size=64 --lr=0.00002

Acknowledgement

The implementation of PitVQA relies on resources from BLIP, Huggingface Transformers, timm and our previous work SurgicalGPT. We thank the original authors for open-sourcing their work.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{he2024pitvqa,
  title={PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery},
  author={He, Runlong and Xu, Mengya and Das, Adrito and Z. Khan, Danyal and Bano, Sophia and J. Marcus, Hani and Stoyanov, Danail and J. Clarkson, Matthew and Islam, Mobarakol},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  pages={},
  year={2024},
  organization={}
}
