Guidance on Integrating xformers with PyTorch 2.5.1 + CUDA 12.6.3 + cuDNN 9.5.1 + Flash Attention 2.7.0.post2 #1163
Comments
Apparently, we only support up to FlashAttention 2.6.3 (inclusive). If that requirement isn't met, we fall back to the FlashAttention provided by PyTorch, which is what is happening for you (see xformers/xformers/ops/fmha/flash.py, line 66, at commit 6e10bd2).
I'm not sure why this is the case; it may simply be that we need to test compatibility before increasing that value. Would you be able to modify it manually and check whether everything works (including running all the xFormers tests)? If so, we could consider raising that value.
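For context, the gate being discussed is just a version-window check; the following is a minimal sketch of the pattern, where the constant names and the lower bound are illustrative assumptions rather than the actual code in flash.py:

# Illustrative sketch of the version gate discussed above; constant names and
# the lower bound are assumptions, not the actual identifiers in
# xformers/ops/fmha/flash.py.
from packaging.version import Version

import flash_attn

FLASH_VER_MIN = Version("2.5.7")   # assumed lower bound
FLASH_VER_LAST = Version("2.6.3")  # last supported version (inclusive)

def flash_attn_is_supported() -> bool:
    # Outside this window, xFormers falls back to the FlashAttention kernels
    # bundled with PyTorch (the "-pt" entries reported by xformers.info).
    return FLASH_VER_MIN <= Version(flash_attn.__version__) <= FLASH_VER_LAST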
Dear Luca Wehrstedt, thank you for your response yesterday. I tried the method you suggested, modifying line 66 of xformers/xformers/ops/fmha/flash.py to change LAST from 2.6.3 to 2.7.0. After applying this change, I checked with python -m xformers.info and saw that the output had been updated to: memory_efficient_attention.fa2F@v2.7.0.post2-pt: available. To investigate further, I tested various library configurations and made the following observations:
From a performance perspective (although I understand this is just my personal observation and might not be statistically significant), using PyTorch 2.5.1 + FlashAttention 2.7.0.post2 alone reduced image generation time by approximately 0.15 seconds compared to integrating FlashAttention 2.6.3 with xFormers. Based on these results, I'm inclined to think there may be a compatibility issue between xFormers and FlashAttention 2.7.0.post2. Do you think this is the case? If so, does it mean that, for now, xFormers cannot be integrated with FlashAttention 2.7.0.post2? While I find the xFormers-based setup more versatile, it seems that this integration may not be feasible at the moment. I'd greatly appreciate any further advice you can provide. Best regards,
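Since the ~0.15 s figure above comes from end-to-end image generation, a more controlled comparison is to time the attention operator directly. A rough sketch, where the shapes, dtype, and iteration counts are arbitrary assumptions:

# Rough timing comparison between xFormers' memory_efficient_attention and
# PyTorch's scaled_dot_product_attention; requires a CUDA GPU.
import torch
import torch.nn.functional as F
import xformers.ops as xops

def bench(fn, iters=50):
    # Warm up, then time with CUDA events so GPU work is measured.
    for _ in range(10):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# memory_efficient_attention expects (batch, seq_len, heads, head_dim).
q = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
print("xformers:", bench(lambda: xops.memory_efficient_attention(q, k, v)), "ms")

# scaled_dot_product_attention expects (batch, heads, seq_len, head_dim).
qt, kt, vt = (t.transpose(1, 2) for t in (q, k, v))
print("torch sdpa:", bench(lambda: F.scaled_dot_product_attention(qt, kt, vt)), "ms")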
You'll need to tell us exactly what error you got with FlashAttention 2.7.0 if you want us to help.
!!! Exception during processing !!! not enough values to unpack (expected 8, got 4)
The above error occurs during image generation in ComfyUI when integrating PyTorch 2.5.1 (CUDA 12.6.3 + cuDNN 9.5.1) with FlashAttention 2.7.1.post2 and xformers 0.0.29. However, the issue does not occur with a different combination: PyTorch 2.5.1 with only FlashAttention 2.7.1.post2. Since FlashAttention has been successfully built and tested against CUDA 12.6.3, the remaining potential causes seem to be either a failure in building xformers or the possibility that simply modifying line 66 of /home/sk/src/xformers/xformers/ops/fmha/flash.py is insufficient for xformers to function properly. Could you advise on specific steps I should take to resolve this issue? For reference, here are the commands and environment variables used during the build process:
python setup.py build_ext
The error you reported shows that indeed FlashAttention v2.7.0 changed the API of some of their functions (specifically, …).
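To illustrate the failure mode in general terms (a hypothetical sketch, not the real flash_attn or xFormers code): a caller that unpacks an 8-tuple breaks as soon as the callee starts returning only 4 values, which is exactly the ValueError reported in this thread.

def forward_returning_8_values():
    # Assumed pre-2.7 behaviour: eight return values (names are placeholders).
    return "out", "q_pad", "k_pad", "v_pad", "out_pad", "lse", "S_dmask", "rng_state"

def forward_returning_4_values():
    # Assumed post-2.7 behaviour: only four return values.
    return "out", "lse", "S_dmask", "rng_state"

# A wrapper still written against the old signature raises
# "ValueError: not enough values to unpack (expected 8, got 4)".
out, q_pad, k_pad, v_pad, out_pad, lse, S_dmask, rng_state = forward_returning_4_values()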
We updated FlashAttention to 2.7.2: 839c4ec
Excuse me, will this help resolve the problem?
File "G:\pycharm\ComfyUI\comfy\ldm\modules\attention.py", line 396, in attention_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 306, in memory_efficient_attention
return _memory_efficient_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 467, in _memory_efficient_attention
return _memory_efficient_attention_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 490, in _memory_efficient_attention_forward
out, *_ = op.apply(inp, needs_gradient=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\flash.py", line 677, in apply
out, softmax_lse, rng_state = cls.OPERATOR(
^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_library\custom_ops.py", line 324, in backend_impl
result = self._backend_fns[device_type](*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_dynamo\eval_frame.py", line 632, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_library\custom_ops.py", line 367, in wrapped_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\flash.py", line 139, in _flash_fwd
(
ValueError: not enough values to unpack (expected 8, got 4)
Please make sure you reinstall all of Flash + xFormers from scratch, with full compilation. It looks like a version mismatch to me.
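A quick way to check for such a mismatch after reinstalling is to print the versions each package reports; a minimal sketch, not an official diagnostic:

import flash_attn
import torch
import xformers

print("flash_attn:", flash_attn.__version__)
print("xformers:", xformers.__version__)
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
# python -m xformers.info (used elsewhere in this thread) additionally shows
# which FlashAttention build the dispatcher will use; its fa2F@... entry
# should match flash_attn.__version__ printed above.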
Hello, I am getting the exact same issue, running on Windows using the installation instructions provided in this library, with xFormers 0.0.29. Edit: I downgraded to 0.0.28 and it worked again, so I guess something in 0.0.29 has broken? @lw
Hi,
Wonderful to hear a fix is on the way! This library is gold!
The fix is out: 46a02df
I'm running training at the moment, but I will try the fix as soon as it is done. I'll report back to you early tomorrow, hopefully.
@danthe3rd I can confirm that it is working now with release 0.0.29 and the changes you implemented! Many thanks for taking such quick action on this matter. Happy new year!
The build is in progress for 0.0.29.post1, thanks for the confirmation :)
I had the same ValueError issue, but it was fixed in 0.0.29.post1.
❓ Questions and Help
Currently, I have successfully built PyTorch 2.5.1 from source (with CUDA 12.6.3 and cuDNN 9.5.1) and installed it into a Miniconda3 Python 3.11.10 virtual environment. I also built Flash Attention 2.7.0.post2 with CUDA 12.6.3 and installed it into the same environment as a wheel file.
Next, I built and installed xformers from source into this environment. However, upon running python -m xformers.info, I see output indicating that fa2F@v2.5.7-pt and fa2B@v2.5.7-pt (the FlashAttention bundled with PyTorch) are being used. This appears to be a version mismatch, as I expected the locally installed Flash Attention 2.7.0.post2 to be picked up.
Previously, I used the following environment: PyTorch 2.5.1 (with CUDA 12.4.1 and cuDNN 9.5.1) and Flash Attention 2.6.3. In this case, xformers successfully integrated with the Flash Attention 2.6.3 that was installed in the virtual environment. However, in the current environment with CUDA 12.6.3, xformers appears unable to integrate with Flash Attention 2.7.0.post2, even though it was also built from source.
Is this a compatibility issue specific to CUDA 12.6.3? Or is there a known method to integrate xformers with Flash Attention 2.7.0.post2 in this setup? Any guidance or suggestions for enabling the integration would be greatly appreciated.
Below is the relevant output from python -m xformers.info:
xFormers 0.0.29+6e10bd21.d20241126
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.fa2F@v2.5.7-pt: available
memory_efficient_attention.fa2B@v2.5.7-pt: available
memory_efficient_attention.fa3F@0.0.0: unavailable
memory_efficient_attention.fa3B@0.0.0: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm_search@0.0.0: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.5.1
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4060 Ti
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1206
build.hip_version: None
build.python_version: 3.11.10
build.torch_version: 2.5.1
build.env.TORCH_CUDA_ARCH_LIST: 8.0;8.6;8.9
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.6.85
source.privacy: open source
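Once a rebuild picks up the intended FlashAttention, a minimal smoke test along these lines can confirm that the FlashAttention operator is actually selectable. This is a sketch: the shapes and dtype are arbitrary, and it assumes the operator classes are exposed as fmha.flash.FwOp/BwOp, as in recent xFormers releases.

import torch
from xformers.ops import fmha, memory_efficient_attention

# memory_efficient_attention expects (batch, seq_len, heads, head_dim).
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Forcing the FlashAttention op (instead of letting the dispatcher choose)
# makes a broken integration fail loudly rather than silently falling back.
out = memory_efficient_attention(q, k, v, op=(fmha.flash.FwOp, fmha.flash.BwOp))
print("FlashAttention forward OK:", tuple(out.shape))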