
Multi IP-Adapter for Flux pipelines #10867

Merged · 14 commits · Feb 25, 2025

Conversation

guiyrt
Contributor

@guiyrt guiyrt commented Feb 22, 2025

What does this PR do?

Fixes #10775. Adds support for multiple IP-Adapters in Flux pipelines. For testing, I used a single IP-Adapter with scale 0.5, and then two identical IP-Adapters with scale 0.25 each, which should (and does) produce the same result. Basic functionality is there, but I still want to clean up some parts and add multi IP-Adapter scale tests. While #10758 is not merged, I have the typing helper functions here as well. All set!

Single IP-Adapter inference code
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale(0.5)
pipe.enable_sequential_cpu_offload()

ip_adapter_image = load_image("https://huggingface.co/guiyrt/sample-images/resolve/main/astronaut.jpg")

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(42),
    ip_adapter_image=ip_adapter_image
).images[0]

image.save('result_single.jpg')
Multi IP-Adapter inference code
import torch
from diffusers import FluxPipeline
from diffusers.utils import load_image

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    ["XLabs-AI/flux-ip-adapter", "XLabs-AI/flux-ip-adapter"],
    weight_name=["ip_adapter.safetensors", "ip_adapter.safetensors"],
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale([0.25, 0.25])
pipe.enable_sequential_cpu_offload()

ip_adapter_image = load_image("https://huggingface.co/guiyrt/sample-images/resolve/main/astronaut.jpg")

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(42),
    ip_adapter_image=[ip_adapter_image, ip_adapter_image]
).images[0]

image.save('result_multi.jpg')
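The 0.5 vs. 2x 0.25 sanity check above relies on IP-Adapter contributions being added linearly to the attention output, so two identical adapters at half the scale sum to the same update. A toy illustration of that linearity (stand-in tensors, not the actual attention code):

```python
import torch

torch.manual_seed(0)
hidden_states = torch.randn(4, 8)    # stand-in for the base attention output
ip_contribution = torch.randn(4, 8)  # stand-in for one IP-Adapter's contribution

# one adapter at scale 0.5
out_single = hidden_states + 0.5 * ip_contribution
# two identical adapters at scale 0.25 each
out_dual = hidden_states + 0.25 * ip_contribution + 0.25 * ip_contribution

assert torch.allclose(out_single, out_dual)
```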
[Image grid: main (sanity check) · single IP-Adapter, 0.5 scale · dual IP-Adapter, 2x 0.25 scale]


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@hlky @yiyixuxu

@guiyrt guiyrt marked this pull request as draft February 22, 2025 01:19
@guiyrt guiyrt marked this pull request as ready for review February 24, 2025 12:13
@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Thanks for the review @hlky! For this PR, are there any meaningful tests you would like me to add? I took a look at FluxIPAdapterTesterMixin, and it's not using MultiIPAdapterImageProjection, so tests for multi IP-Adapter would probably need some refactoring. Is that relevant atm?

@hlky
Collaborator

hlky commented Feb 24, 2025

FluxIPAdapterTesterMixin can be updated for multi IPAdapters with IPAdapterTesterMixin as reference.

def test_ip_adapter(self, expected_max_diff: float = 1e-4, expected_pipe_slice=None):
    r"""Tests for IP-Adapter.

    The following scenarios are tested:
    - Single IP-Adapter with scale=0 should produce same output as no IP-Adapter.
    - Multi IP-Adapter with scale=0 should produce same output as no IP-Adapter.
    - Single IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
    - Multi IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
    """
    # Raising the tolerance for this test when it's run on a CPU because we
    # compare against static slices and that can be shaky (with a VVVV low probability).
    expected_max_diff = 9e-4 if torch_device == "cpu" else expected_max_diff

    components = self.get_dummy_components()
    pipe = self.pipeline_class(**components).to(torch_device)
    pipe.set_progress_bar_config(disable=None)
    cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)

    # forward pass without ip adapter
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    if expected_pipe_slice is None:
        output_without_adapter = pipe(**inputs)[0]
    else:
        output_without_adapter = expected_pipe_slice

    # 1. Single IP-Adapter test cases
    adapter_state_dict = create_ip_adapter_state_dict(pipe.unet)
    pipe.unet._load_ip_adapter_weights(adapter_state_dict)

    # forward pass with single ip adapter, but scale=0 which should have no effect
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
    pipe.set_ip_adapter_scale(0.0)
    output_without_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

    # forward pass with single ip adapter, but with scale of adapter weights
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
    pipe.set_ip_adapter_scale(42.0)
    output_with_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

    max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
    max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

    self.assertLess(
        max_diff_without_adapter_scale,
        expected_max_diff,
        "Output without ip-adapter must be same as normal inference",
    )
    self.assertGreater(
        max_diff_with_adapter_scale, 1e-2, "Output with ip-adapter must be different from normal inference"
    )

    # 2. Multi IP-Adapter test cases
    adapter_state_dict_1 = create_ip_adapter_state_dict(pipe.unet)
    adapter_state_dict_2 = create_ip_adapter_state_dict(pipe.unet)
    pipe.unet._load_ip_adapter_weights([adapter_state_dict_1, adapter_state_dict_2])

    # forward pass with multi ip adapter, but scale=0 which should have no effect
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
    pipe.set_ip_adapter_scale([0.0, 0.0])
    output_without_multi_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_without_multi_adapter_scale = output_without_multi_adapter_scale[0, -3:, -3:, -1].flatten()

    # forward pass with multi ip adapter, but with scale of adapter weights
    inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
    inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
    pipe.set_ip_adapter_scale([42.0, 42.0])
    output_with_multi_adapter_scale = pipe(**inputs)[0]
    if expected_pipe_slice is not None:
        output_with_multi_adapter_scale = output_with_multi_adapter_scale[0, -3:, -3:, -1].flatten()

    max_diff_without_multi_adapter_scale = np.abs(
        output_without_multi_adapter_scale - output_without_adapter
    ).max()
    max_diff_with_multi_adapter_scale = np.abs(output_with_multi_adapter_scale - output_without_adapter).max()

    self.assertLess(
        max_diff_without_multi_adapter_scale,
        expected_max_diff,
        "Output without multi-ip-adapter must be same as normal inference",
    )
    self.assertGreater(
        max_diff_with_multi_adapter_scale,
        1e-2,
        "Output with multi-ip-adapter scale must be different from normal inference",
    )

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

FluxIPAdapterTesterMixin can be updated for multi IPAdapters with IPAdapterTesterMixin as reference.

Perfect, will do!

Another thing: in #10775 (comment), when you mentioned multi-image support for flux-ip-adapter-v2 (to be done in a separate PR), you said ip_adapter_image would need to support List[PipelineImageInput]. You meant the IP-Adapter interface, not the pipeline interface, right? Because atm ip_adapter_image on FluxPipeline is still Optional[PipelineImageInput], and I think we need to update it to Optional[Union[PipelineImageInput, List[PipelineImageInput]]] (one image per IP-Adapter).

@hlky
Collaborator

hlky commented Feb 24, 2025

PipelineImageInput is

PipelineImageInput = Union[
    PIL.Image.Image,
    np.ndarray,
    torch.Tensor,
    List[PIL.Image.Image],
    List[np.ndarray],
    List[torch.Tensor],
]

The List is for one image per IPAdapter or one image per ControlNet.

List[PipelineImageInput] would be like in ControlNetUnion where each ControlNet can take multiple inputs (with the experimental scale type, not MultiControlNet).
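To make the two levels of nesting concrete, here is a hedged sketch of the difference (a NumPy array stands in for PIL.Image.Image; the variable names are illustrative, not diffusers API):

```python
from typing import List

import numpy as np

Image = np.ndarray  # stand-in for PIL.Image.Image in this sketch
img = np.zeros((8, 8, 3), dtype=np.uint8)

# PipelineImageInput already admits List[PIL.Image.Image]:
# a flat list, interpreted as one image per IP-Adapter (or per ControlNet).
one_image_per_adapter: List[Image] = [img, img]

# List[PipelineImageInput] would add another level of nesting,
# allowing several images per adapter (as in ControlNetUnion).
many_images_per_adapter: List[List[Image]] = [[img], [img, img]]

assert len(one_image_per_adapter) == 2
assert [len(group) for group in many_images_per_adapter] == [1, 2]
```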

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Ready for review :) I added the ability to mix and match per-layer scales and single scales. For example, you can pass the scale for two IP-Adapters as [list_with_19_values, 0.3]. You can also pass a single value, and it is used for every layer of every IP-Adapter. I also made use of _is_valid_type to check types as part of processing the scale input, so it doesn't fail somewhere later, with minimal additional complexity.
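A hypothetical sketch of how such a mixed scale spec might be normalized into per-layer lists; the function name, signature, and the 19-layer count follow the example above and are assumptions, not the actual diffusers implementation:

```python
from typing import List, Union

ScaleSpec = Union[float, List[float]]


def expand_scales(
    scales: Union[ScaleSpec, List[ScaleSpec]], num_adapters: int, num_layers: int
) -> List[List[float]]:
    """Expand a scalar or mixed per-adapter scale spec to one list per adapter."""
    if isinstance(scales, (int, float)):
        # one scalar: reuse it for every layer of every IP-Adapter
        scales = [float(scales)] * num_adapters
    if len(scales) != num_adapters:
        raise ValueError(f"Expected {num_adapters} scale entries, got {len(scales)}")
    expanded = []
    for entry in scales:
        if isinstance(entry, (int, float)):
            expanded.append([float(entry)] * num_layers)  # scalar entry: same scale per layer
        elif len(entry) == num_layers:
            expanded.append([float(v) for v in entry])    # explicit per-layer list
        else:
            raise ValueError(f"Per-layer scale must have {num_layers} values")
    return expanded


# mix and match: per-layer list for the first adapter, scalar for the second
mixed = expand_scales([[0.1] * 19, 0.3], num_adapters=2, num_layers=19)
assert mixed[0] == [0.1] * 19 and mixed[1] == [0.3] * 19
```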

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

@hlky the failing check_code_quality comes from docs/source/en/_toctree.yml

@hlky
Collaborator

hlky commented Feb 24, 2025

@guiyrt It looks like this has introduced a circular dependency. Let's move _get_detailed_type and _is_valid_type under diffusers.utils and create typing_utils.py cc @yiyixuxu

from ..pipelines.pipeline_loading_utils import _get_detailed_type, _is_valid_type

@guiyrt
Contributor Author

guiyrt commented Feb 24, 2025

Just sharing some cool outputs I got: creating image variations by pushing noise through the IP-Adapter, using random image embeds instead of CLIP vision model outputs. I found this somewhat by accident while testing ip_adapter_image_embeds, but I really liked the vibe from lucky seed 423 🤩

Inference code
No IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
).images[0]

image.save('result_no_ipa.jpg')
Single IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale(0.5)
pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
    ip_adapter_image_embeds=[torch.rand(1, 1, 768)]
).images[0]

image.save('result_single.jpg')
Dual IP-Adapter
import torch
from diffusers import FluxPipeline

pipe: FluxPipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

pipe.load_ip_adapter(
    ["XLabs-AI/flux-ip-adapter", "XLabs-AI/flux-ip-adapter"],
    weight_name=["ip_adapter.safetensors", "ip_adapter.safetensors"],
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14"
)

pipe.set_ip_adapter_scale([0.25, 0.25])
pipe.enable_sequential_cpu_offload()

image = pipe(
    width=1024,
    height=1024,
    prompt="A vintage picture of an astronaut in a starry sky",
    generator=torch.manual_seed(423),
    ip_adapter_image_embeds=[torch.rand(1, 1, 768), torch.rand(1, 1, 768)]
).images[0]

image.save('result_multi.jpg')
[Image grid: No IP-Adapter · Single IP-Adapter, 0.5 scale · Dual IP-Adapter, 2x 0.25 scale]


Collaborator

@hlky hlky left a comment


Thanks @guiyrt!

@hlky hlky merged commit 1450c2a into huggingface:main Feb 25, 2025
38 of 43 checks passed
@guiyrt guiyrt deleted the flux_multi_ipa branch February 25, 2025 11:02