StyTips: Towards High-Quality, Efficient and Controllable Style Transfer via Transformer Filtering Prompts

Abstract

Image style transfer aims to merge the artistic characteristics of a style image with the spatial structure of a content image, maintaining the content identity and showing the desired style. It has been proven that previous works based on Convolutional Neural Networks (CNNs) lack the ability to capture global features, leading to the content leakage issue. In this work, we propose Transformer fIlter PromptS, dubbed StyTips, a novel approach for style transfer that distills essential content and injects style information via learn prompts. \textcolor{red}{To achieve efficient and high-quality stylization}, we adopt a hierarchical vision Transformer encoder to extract multi-scale features of the content and style images, learning disentangled style features of color and stroke. Inspired by the prompt learning technique, we design a fine-grained style alignment block in the decoder, involving self-attention with learnable filter tokens to distill necessary content information and cross-attention to inject styles. StyTips takes full advantage of the attention mechanism to align the color and stroke with the purified content features in appropriate proportions. Qualitative and quantitative experiments demonstrate that StyTips can prevent content leakage and generate high-quality stylized images. StyTips provides controllable style transfer with diverse ways for users to manipulate styles on fine-grained levels and to determine the strength of style, which has more practical significance to real-world applications than traditional approaches.

Datasets

In the training stage, MS-COCO which is a large-scale dataset for object detection is used as the content dataset, and WikiArt as the style dataset which contains art paintings from different artists.

Requirments

python == 3.8.13 pytorch == 1.10.0 torchvision == 0.11.0

Train

run python train.py

Examples

Acknowledgments

The codes of Swin-Transformer encoder are token from (https://github.com/microsoft/Swin-Transformer)

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
Images		Images
__pycache__		__pycache__
data		data
models		models
util		util
README.md		README.md
dataset_sampler.py		dataset_sampler.py
network.png		network.png
output		output
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StyTips: Towards High-Quality, Efficient and Controllable Style Transfer via Transformer Filtering Prompts

Abstract

Datasets

Requirments

Train

Examples

Acknowledgments

About

Releases

Packages

Languages

I2-Multimedia-Lab/StyTips

Folders and files

Latest commit

History

Repository files navigation

StyTips: Towards High-Quality, Efficient and Controllable Style Transfer via Transformer Filtering Prompts

Abstract

Datasets

Requirments

Train

Examples

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages