Images Speak in Images:
A Generalist Painter for In-Context Visual Learning

Xinlong Wang¹*, Wen Wang¹,²*, Yue Cao¹*, Chunhua Shen², Tiejun Huang¹,³

¹BAAI, ²ZJU, ³PKU

CVPR 2023

We present Painter, a generalist model that takes an "image"-centric approach to in-context visual learning: it redefines the outputs of core vision tasks as images and specifies task prompts as images too. With this idea, training is extremely simple: we perform standard masked image modeling on stitched pairs of input and output images. This makes the model capable of performing tasks conditioned on visible image patches. During inference, we can therefore supply a pair of input and output images from the same task as the input condition to indicate which task to perform. Examples of in-context inference are illustrated in the figure above, consisting of seven in-domain examples (seven rows at top) and three out-of-domain examples (three rows at bottom). Without bells and whistles, our generalist Painter achieves performance competitive with well-established task-specific models on seven representative vision tasks, ranging from high-level visual understanding to low-level image processing. In addition, Painter significantly outperforms recent generalist models on several challenging tasks.
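The training idea described above (masked image modeling on a stitched input/output pair) can be illustrated with a minimal NumPy sketch. This is not the repository's implementation: the function names, the vertical stitching layout, the 16-pixel patch size, the 0.75 mask ratio, and the use of zeros in place of a learned mask token are all assumptions made for illustration.

```python
import numpy as np


def stitch_pair(inp: np.ndarray, out: np.ndarray) -> np.ndarray:
    """Stack an input image on top of its output image (both H x W x C)."""
    if inp.shape != out.shape:
        raise ValueError("input and output images must have the same shape")
    return np.concatenate([inp, out], axis=0)


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Return a boolean patch grid where True marks a masked patch."""
    n = grid_h * grid_w
    n_mask = int(round(mask_ratio * n))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_mask, replace=False)] = True
    return flat.reshape(grid_h, grid_w)


def apply_mask(canvas: np.ndarray, patch_mask: np.ndarray, patch: int) -> np.ndarray:
    """Zero out masked patches (a stand-in for a learned mask token)."""
    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch, axis=0).repeat(patch, axis=1)
    masked = canvas.copy()
    masked[pixel_mask] = 0.0
    return masked


rng = np.random.default_rng(0)
patch = 16                                  # assumed patch size
inp = rng.random((64, 64, 3))               # toy input image
out = rng.random((64, 64, 3))               # toy "output as image"
canvas = stitch_pair(inp, out)              # 128 x 64 x 3
patch_mask = random_patch_mask(canvas.shape[0] // patch,
                               canvas.shape[1] // patch,
                               0.75, rng)   # assumed mask ratio
masked_canvas = apply_mask(canvas, patch_mask, patch)
print(canvas.shape, int(patch_mask.sum()))
```

A model trained this way would receive `masked_canvas` and be asked to reconstruct the pixels under `patch_mask`; the reconstruction loss itself is omitted here.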

[Paper]

Highlights

$\color{#2F6EBA}{Images\ Speak\ in\ Images}$

image as the general-purpose interface
redefine the output spaces of vision tasks as images

$\color{#2F6EBA}{A\ Generalist\ Painter}$

given an input image, prediction is to inpaint the desired but missing output "image"
excellent performance on 7 representative vision tasks with a single generalist model

$\color{#2F6EBA}{In{-}Context\ Visual\ Learning}$

automatically perform vision tasks according to the input task prompts
even for tasks that do not exist in the training data
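As a rough illustration of in-context inference, the sketch below assembles a task prompt (one input/output example pair) together with a query image and marks the region the model would be asked to inpaint. The 2x2 grid layout, the function name `build_inference_canvas`, and the zero-filled placeholder are hypothetical choices for this example, not the layout used by the repository.

```python
import numpy as np


def build_inference_canvas(prompt_in: np.ndarray, prompt_out: np.ndarray,
                           query_in: np.ndarray):
    """Lay out [prompt_in | prompt_out] over [query_in | blank].

    Returns the canvas and a pixel-level boolean mask that is True only
    over the blank quadrant, i.e. the output the model should paint.
    """
    if not (prompt_in.shape == prompt_out.shape == query_in.shape):
        raise ValueError("all images must share the same shape")
    blank = np.zeros_like(query_in)          # placeholder for the missing output
    top = np.concatenate([prompt_in, prompt_out], axis=1)
    bottom = np.concatenate([query_in, blank], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)

    h, w = query_in.shape[:2]
    to_paint = np.zeros(canvas.shape[:2], dtype=bool)
    to_paint[h:, w:] = True                  # bottom-right quadrant only
    return canvas, to_paint


# Toy example: the prompt pair indicates the task, the query is the new input.
prompt_in = np.full((32, 32, 3), 0.2)
prompt_out = np.full((32, 32, 3), 0.8)
query_in = np.full((32, 32, 3), 0.5)
prompt_canvas, to_paint = build_inference_canvas(prompt_in, prompt_out, query_in)
print(prompt_canvas.shape, int(to_paint.sum()))
```

Swapping the prompt pair for one drawn from a different task changes what the model is asked to produce, without changing the model itself.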

Installation

See installation instructions.

Data

See data instructions.

We also provide a toy training dataset, with 10 samples from each required dataset. You can put it in $Painter_ROOT/toy_datasets and set DATA_PATH=toy_datasets in $Painter_ROOT/train_painter_vit_large.sh for toy experiments.

Training

Download the pre-trained MAE ViT-Large model from here and update path/to/mae_pretrain_vit_large.pth in $Painter_ROOT/train_painter_vit_large.sh.

We use 8 nodes (total_bsz = 8x8x32 = 2048) for training:

bash train_painter_vit_large.sh
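The total batch size quoted above is a plain product of three factors. The helper below reproduces the arithmetic; reading the factors as nodes, GPUs per node, and effective samples per GPU is an assumption, since the README states only the product. The function and parameter names are hypothetical.

```python
def total_batch_size(num_nodes: int, gpus_per_node: int, per_gpu_batch: int) -> int:
    """Effective global batch size for data-parallel training.

    The interpretation of each factor is an assumption; per_gpu_batch may
    itself include gradient accumulation steps.
    """
    return num_nodes * gpus_per_node * per_gpu_batch


print(total_batch_size(8, 8, 32))  # 8 * 8 * 32 = 2048
```

When training on fewer nodes, one of the other factors would have to grow proportionally to keep the same total.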

Evaluation

See evaluation instructions.

A pre-trained Painter is available at 🤗 Hugging Face Models. The results on various tasks are summarized below:

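| Task | Dataset | Metric | Result |
| --- | --- | --- | --- |
| depth estimation | NYU v2 | RMSE | 0.288 |
| depth estimation | NYU v2 | A.Rel | 0.080 |
| depth estimation | NYU v2 | d1 | 0.950 |
| semantic seg. | ADE20k | mIoU | 49.9 |
| panoptic seg. | COCO 2017 | PQ | 43.4 |
| keypoint det. | COCO 2017 | AP | 72.1 |
| denoising | SIDD | PSNR | 38.66 |
| denoising | SIDD | SSIM | 0.954 |
| deraining | 5 datasets | PSNR | 29.42 |
| deraining | 5 datasets | SSIM | 0.867 |
| enhance. | LoL | PSNR | 22.34 |
| enhance. | LoL | SSIM | 0.872 |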

Citation

@article{Painter,
  title={Images Speak in Images: A Generalist Painter for In-Context Visual Learning},
  author={Wang, Xinlong and Wang, Wen and Cao, Yue and Shen, Chunhua and Huang, Tiejun},
  journal={arXiv preprint arXiv:2212.02499},
  year={2022}
}

Acknowledgement

MAE, BEiT, detectron2, Mask2Former, bts, mmcv, mmdetection, mmpose, MIRNet, MPRNet, and Uformer.

Contact

We are hiring at all levels at BAAI Vision Team, including full-time researchers, engineers and interns. If you are interested in working with us on foundation models, visual perception and multimodal learning, please contact Xinlong Wang (wangxinlong@baai.ac.cn) and Yue Cao (caoyue@baai.ac.cn).
