
Multi IP-Adapter for Flux pipelines #10867

Merged · 14 commits · Feb 25, 2025

Conversation

guiyrt
Contributor

@guiyrt guiyrt commented Feb 22, 2025

What does this PR do?

Fixes #10775. Adds support for multiple IP-Adapters in Flux pipelines. For testing, I used a single IP-Adapter with scale 0.5, and then two identical IP-Adapters with scale 0.25 each, which should (and does) produce the same result. Basic functionality is there, but I still want to clean up some parts and add multi IP-Adapter scale tests. While #10758 is not merged, I have the typing helper functions here as well. All set!

Single IP-Adapter inference code
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale(0.5)
pipe.enable_sequential_cpu_offload()

ip_adapter_image = load_image("https://huggingface.co/guiyrt/sample-images/resolve/main/astronaut.jpg")

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(42),
    ip_adapter_image=ip_adapter_image
).images[0]

image.save('result_single.jpg')
Multi IP-Adapter inference code
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    ["XLabs-AI/flux-ip-adapter", "XLabs-AI/flux-ip-adapter"],
    weight_name=["ip_adapter.safetensors", "ip_adapter.safetensors"],
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale([0.25, 0.25])
pipe.enable_sequential_cpu_offload()

ip_adapter_image = load_image("https://huggingface.co/guiyrt/sample-images/resolve/main/astronaut.jpg")

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(42),
    ip_adapter_image=[ip_adapter_image, ip_adapter_image]
).images[0]

image.save('result_multi.jpg')
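The 0.5 vs. 2x 0.25 sanity check above relies on IP-Adapter contributions being added linearly to the attention output, so two identical adapters at half the scale sum to the same update. A toy illustration of that linearity (stand-in tensors, not the actual attention code):

```python
import torch

torch.manual_seed(0)
hidden_states = torch.randn(4, 8)    # stand-in for the base attention output
ip_contribution = torch.randn(4, 8)  # stand-in for one IP-Adapter's contribution

# one adapter at scale 0.5
out_single = hidden_states + 0.5 * ip_contribution
# two identical adapters at scale 0.25 each
out_dual = hidden_states + 0.25 * ip_contribution + 0.25 * ip_contribution

assert torch.allclose(out_single, out_dual)
```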
[Image grid: main (sanity check) · single IP-Adapter, 0.5 scale · dual IP-Adapter, 2x 0.25 scale]


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@hlky @yiyixuxu

@guiyrt guiyrt marked this pull request as draft February 22, 2025 01:19
@guiyrt guiyrt marked this pull request as ready for review February 24, 2025 12:13
@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Thanks for the review @hlky! For this PR, are there any meaningful tests you would like me to add? I took a look at FluxIPAdapterTesterMixin, and it's not using MultiIPAdapterImageProjection, so tests for multi IP-Adapter would probably need some refactoring. Is that relevant atm?

@hlky
Collaborator

hlky commented Feb 24, 2025

FluxIPAdapterTesterMixin can be updated for multi IPAdapters with IPAdapterTesterMixin as reference.

def test_ip_adapter(self, expected_max_diff: float = 1e-4, expected_pipe_slice=None):
    r"""Tests for IP-Adapter.

    The following scenarios are tested:
    - Single IP-Adapter with scale=0 should produce same output as no IP-Adapter.
    - Multi IP-Adapter with scale=0 should produce same output as no IP-Adapter.
    - Single IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
    - Multi IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
    """
    # Raising the tolerance for this test when it's run on a CPU because we
    # compare against static slices and that can be shaky (with a VVVV low probability).
    expected_max_diff = 9e-4 if torch_device == "cpu" else expected_max_diff

    components = self.get_dummy_components()
    pipe = self.pipeline_class(**components).to(torch_device)
    pipe.set_progress_bar_config(disable=None)
    cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)

    # forward pass without ip adapter
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    if expected_pipe_slice is None:
        output_without_adapter = pipe(**inputs)[0]
    else:
        output_without_adapter = expected_pipe_slice

    # 1. Single IP-Adapter test cases
    adapter_state_dict = create_ip_adapter_state_dict(pipe.unet)
    pipe.unet._load_ip_adapter_weights(adapter_state_dict)

    # forward pass with single ip adapter, but scale=0 which should have no effect
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
    pipe.set_ip_adapter_scale(0.0)
    output_without_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

    # forward pass with single ip adapter, but with scale of adapter weights
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
    pipe.set_ip_adapter_scale(42.0)
    output_with_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

    max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
    max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

    self.assertLess(
        max_diff_without_adapter_scale,
        expected_max_diff,
        "Output without ip-adapter must be same as normal inference",
    )
    self.assertGreater(
        max_diff_with_adapter_scale, 1e-2, "Output with ip-adapter must be different from normal inference"
    )

    # 2. Multi IP-Adapter test cases
    adapter_state_dict_1 = create_ip_adapter_state_dict(pipe.unet)
    adapter_state_dict_2 = create_ip_adapter_state_dict(pipe.unet)
    pipe.unet._load_ip_adapter_weights([adapter_state_dict_1, adapter_state_dict_2])

    # forward pass with multi ip adapter, but scale=0 which should have no effect
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
    pipe.set_ip_adapter_scale([0.0, 0.0])
    output_without_multi_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_without_multi_adapter_scale = output_without_multi_adapter_scale[0, -3:, -3:, -1].flatten()

    # forward pass with multi ip adapter, but with scale of adapter weights
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
    pipe.set_ip_adapter_scale([42.0, 42.0])
    output_with_multi_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_with_multi_adapter_scale = output_with_multi_adapter_scale[0, -3:, -3:, -1].flatten()

    max_diff_without_multi_adapter_scale = np.abs(
        output_without_multi_adapter_scale - output_without_adapter
    ).max()
    max_diff_with_multi_adapter_scale = np.abs(output_with_multi_adapter_scale - output_without_adapter).max()

    self.assertLess(
        max_diff_without_multi_adapter_scale,
        expected_max_diff,
        "Output without multi-ip-adapter must be same as normal inference",
    )
    self.assertGreater(
        max_diff_with_multi_adapter_scale,
        1e-2,
        "Output with multi-ip-adapter scale must be different from normal inference",
    )

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

FluxIPAdapterTesterMixin can be updated for multi IPAdapters with IPAdapterTesterMixin as reference.

Perfect, will do!

Another thing: in #10775 (comment), when you mentioned multi-image support for flux-ip-adapter-v2 (to be done in a separate PR), you said ip_adapter_image would need to support List[PipelineImageInput]. You meant the IP-Adapter interface, not the pipeline interface, right? Because atm ip_adapter_image on FluxPipeline is still Optional[PipelineImageInput], and I think we need to update it to Optional[Union[PipelineImageInput, List[PipelineImageInput]]] (one image per IP-Adapter).

@hlky
Collaborator

hlky commented Feb 24, 2025

PipelineImageInput is

PipelineImageInput = Union[
    PIL.Image.Image,
    np.ndarray,
    torch.Tensor,
    List[PIL.Image.Image],
    List[np.ndarray],
    List[torch.Tensor],
]

The List is for one image per IPAdapter or one image per ControlNet.

List[PipelineImageInput] would be like in ControlNetUnion where each ControlNet can take multiple inputs (with the experimental scale type, not MultiControlNet).
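To make the two levels of nesting concrete, here is a hedged sketch of the difference (a NumPy array stands in for PIL.Image.Image; the variable names are illustrative, not diffusers API):

```python
from typing import List

import numpy as np

Image = np.ndarray  # stand-in for PIL.Image.Image in this sketch
img = np.zeros((8, 8, 3), dtype=np.uint8)

# PipelineImageInput already admits List[PIL.Image.Image]:
# a flat list, interpreted as one image per IP-Adapter (or per ControlNet).
one_image_per_adapter: List[Image] = [img, img]

# List[PipelineImageInput] would add another level of nesting,
# allowing several images per adapter (as in ControlNetUnion).
many_images_per_adapter: List[List[Image]] = [[img], [img, img]]

assert len(one_image_per_adapter) == 2
assert [len(group) for group in many_images_per_adapter] == [1, 2]
```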

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Ready for review :) I added the ability to mix and match per-layer scales and single scales. For example, you can pass the scale for two IP-Adapters as [list_with_19_values, 0.3]. You can also pass a single value, and it is used for every layer of every IP-Adapter. I also made use of _is_valid_type to check types as part of processing the scale input, so it doesn't fail somewhere later, with minimal additional complexity.
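A hypothetical sketch of how such a mixed scale spec might be normalized into per-layer lists; the function name, signature, and the 19-layer count follow the example above and are assumptions, not the actual diffusers implementation:

```python
from typing import List, Union

ScaleSpec = Union[float, List[float]]


def expand_scales(
    scales: Union[ScaleSpec, List[ScaleSpec]], num_adapters: int, num_layers: int
) -> List[List[float]]:
    """Expand a scalar or mixed per-adapter scale spec to one list per adapter."""
    if isinstance(scales, (int, float)):
        # one scalar: reuse it for every layer of every IP-Adapter
        scales = [float(scales)] * num_adapters
    if len(scales) != num_adapters:
        raise ValueError(f"Expected {num_adapters} scale entries, got {len(scales)}")
    expanded = []
    for entry in scales:
        if isinstance(entry, (int, float)):
            expanded.append([float(entry)] * num_layers)  # scalar entry: same scale per layer
        elif len(entry) == num_layers:
            expanded.append([float(v) for v in entry])    # explicit per-layer list
        else:
            raise ValueError(f"Per-layer scale must have {num_layers} values")
    return expanded


# mix and match: per-layer list for the first adapter, scalar for the second
mixed = expand_scales([[0.1] * 19, 0.3], num_adapters=2, num_layers=19)
assert mixed[0] == [0.1] * 19 and mixed[1] == [0.3] * 19
```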

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

@hlky the failing check_code_quality comes from docs/source/en/_toctree.yml

@hlky
Collaborator

hlky commented Feb 24, 2025

@guiyrt It looks like this has introduced a circular dependency. Let's move _get_detailed_type and _is_valid_type under diffusers.utils and create typing_utils.py cc @yiyixuxu

from ..pipelines.pipeline_loading_utils import _get_detailed_type, _is_valid_type

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Just sharing some cool outputs I got: creating image variations by pushing noise through the IP-Adapter, using random image embeds instead of CLIP vision model outputs. I found this somewhat by accident while testing ip_adapter_image_embeds, but I really liked the vibe from lucky seed 423 🤩

Inference code
No IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
).images[0]

image.save('result_no_ipa.jpg')
Single IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale(0.5)
pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
    ip_adapter_image_embeds=[torch.rand(1, 1, 768)]
).images[0]

image.save('result_single.jpg')
Dual IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    ["XLabs-AI/flux-ip-adapter", "XLabs-AI/flux-ip-adapter"],
    weight_name=["ip_adapter.safetensors", "ip_adapter.safetensors"],
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale([0.25, 0.25])
pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
    ip_adapter_image_embeds=[torch.rand(1, 1, 768), torch.rand(1, 1, 768)]
).images[0]

image.save('result_multi.jpg')
[Image grid: No IP-Adapter · Single IP-Adapter, 0.5 scale · Dual IP-Adapter, 2x 0.25 scale]


Collaborator

@hlky hlky left a comment


Thanks @guiyrt!

@hlky hlky merged commit 1450c2a into huggingface:main Feb 25, 2025
38 of 43 checks passed
@guiyrt guiyrt deleted the flux_multi_ipa branch February 25, 2025 11:02