Code for the paper: F-ViTA: Foundation Model Guided Visible to Thermal Translation
Abstract
Thermal imaging is crucial for scene understanding, particularly in low-light and nighttime conditions. However, collecting large thermal datasets is costly and labor-intensive due to the specialized equipment required for infrared image capture. To address this challenge, researchers have explored visible-to-thermal image translation. Most existing methods rely on Generative Adversarial Networks (GANs) or Diffusion Models (DMs), treating the task as a style transfer problem. As a result, these approaches attempt to learn both the modality distribution shift and underlying physical principles from limited training data. In this paper, we propose F-ViTA, a novel approach that leverages the general world knowledge embedded in foundation models to guide the diffusion process for improved translation. Specifically, we condition an InstructPix2Pix Diffusion Model with zero-shot masks and labels from foundation models such as SAM and Grounded DINO. This allows the model to learn meaningful correlations between scene objects and their thermal signatures in infrared imagery. Extensive experiments on five public datasets demonstrate that F-ViTA outperforms state-of-the-art (SOTA) methods. Furthermore, our model generalizes well to out-of-distribution (OOD) scenarios and can generate Long-Wave Infrared (LWIR), Mid-Wave Infrared (MWIR), and Near-Infrared (NIR) translations from the same visible image.
Data preparation

For training on custom datasets, structure the data in the following format:
data_root
|---> train
      |---> Vis
            |---> img1.png
            |---> img2.png
            ...
      |---> Ir
            |---> img1.png
            |---> img2.png
            ...
|---> val
      |---> Vis
            |---> img1.png
            |---> img2.png
            ...
      |---> Ir
            |---> img1.png
            |---> img2.png
            ...
Here, Vis is the visible-image folder and Ir is the corresponding thermal (infrared) image folder.
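As a quick sanity check, the hypothetical helper below (not part of this repo) verifies that a custom dataset follows this layout and that every visible image has a matching thermal image; it assumes .png files as in the example above.

```python
# Hypothetical helper (not part of this repo): verify the expected
# data_root/{train,val}/{Vis,Ir} layout with paired filenames.
from pathlib import Path

def check_dataset(data_root: str) -> None:
    root = Path(data_root)
    for split in ("train", "val"):
        vis = {p.name for p in (root / split / "Vis").glob("*.png")}
        ir = {p.name for p in (root / split / "Ir").glob("*.png")}
        unpaired = vis ^ ir  # filenames present in one folder but not the other
        if unpaired:
            raise ValueError(f"{split}: unpaired images, e.g. {sorted(unpaired)[:5]}")
        print(f"{split}: {len(vis)} paired Vis/Ir images")

check_dataset("data_root")
```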
Next, register the dataset in the list of accepted datasets in finetune_instruct_pix2pix.py (line 855 onwards). Follow the existing examples and add a conditional branch for your dataset, along the lines of the sketch below.
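The exact shape of that conditional depends on the existing code around line 855; the hypothetical sketch below only illustrates the idea. The dataset id, paths, and variable names are placeholders, so mirror the script's existing branches rather than copying this verbatim.

```python
# Hypothetical sketch of the branch to add near line 855 of
# finetune_instruct_pix2pix.py; names and paths are placeholders --
# follow the existing dataset branches in the script for the real ones.
elif args.dataset_id == "my_dataset":                         # your new dataset id
    data_root = "/path/to/data_root"                          # folder shown above
    train_vis_dir = os.path.join(data_root, "train", "Vis")   # visible images
    train_ir_dir  = os.path.join(data_root, "train", "Ir")    # paired thermal images
    val_vis_dir   = os.path.join(data_root, "val", "Vis")
    val_ir_dir    = os.path.join(data_root, "val", "Ir")
```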
Checkpoint preparation

Clone the Grounded-Segment-Anything repository from IDEA-Research:
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
Download these checkpoints and place them in the Grounded-Segment-Anything folder.
Feel free to use other versions of these foundation models.
Create and activate the conda environment:

conda env create -f gsam.yml
conda activate gsam
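Once the environment is active, the hypothetical snippet below sketches how Grounding DINO and SAM from Grounded-Segment-Anything can produce the zero-shot boxes, labels, and masks used to condition the diffusion model. The checkpoint filenames, config path, image path, and text prompt are placeholders; see the repo's own scripts (finetune_instruct_pix2pix.py, inference_gsam.py) for the actual pipeline.

```python
# Hypothetical sketch (assumes a CUDA device): produce zero-shot boxes, labels, and
# masks with Grounding DINO + SAM; paths, checkpoints, and prompt are placeholders.
import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                  "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

image_source, image = load_image("data_root/train/Vis/img1.png")
boxes, logits, phrases = predict(model=dino, image=image,
                                 caption="person . car . building .",
                                 box_threshold=0.35, text_threshold=0.25)

# Grounding DINO returns normalized cxcywh boxes; SAM expects absolute xyxy boxes.
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]),
                         in_fmt="cxcywh", out_fmt="xyxy").numpy()

predictor.set_image(image_source)
masks = [predictor.predict(box=b, multimask_output=False)[0] for b in boxes_xyxy]
print(phrases, [m.shape for m in masks])
```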
Training

Make the necessary changes in train_script.sh, including the output directory name, the dataset id, and any other hyperparameters as required.
bash train_script.sh
Inference

python inference_gsam.py <checkpoint-path> <save-name> <dataset-name>
An example is shown in inference_gsam.sh.
Acknowledgements

Thanks to the amazing work by Tim Brooks (InstructPix2Pix) and IDEA-Research (Grounded-Segment-Anything). Our work is built on top of these repositories.
To be Added