Guidance on Integrating xformers with PyTorch 2.5.1 + CUDA 12.6.3 + cuDNN 9.5.1 + Flash Attention 2.7.0.post2 #1163
Comments
Apparently, we only support up to FlashAttention 2.6.3 (inclusive). If that requirement isn't met, we fall back to the FlashAttention provided by PyTorch, which is what is happening for you (see xformers/xformers/ops/fmha/flash.py, line 66, at commit 6e10bd2).
I'm not sure why this is the case; it may simply be that we need to test compatibility before increasing that value. Would you be able to modify it manually and check whether everything works (including running all the xFormers tests)? If so, we could consider raising that value.
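For context, the gate being discussed is just a version-window check; the following is a minimal sketch of the pattern, where the constant names and the lower bound are illustrative assumptions rather than the actual code in flash.py:

# Illustrative sketch of the version gate discussed above; constant names and
# the lower bound are assumptions, not the actual identifiers in
# xformers/ops/fmha/flash.py.
from packaging.version import Version

import flash_attn

FLASH_VER_MIN = Version("2.5.7")   # assumed lower bound
FLASH_VER_LAST = Version("2.6.3")  # last supported version (inclusive)

def flash_attn_is_supported() -> bool:
    # Outside this window, xFormers falls back to the FlashAttention kernels
    # bundled with PyTorch (the "-pt" entries reported by xformers.info).
    return FLASH_VER_MIN <= Version(flash_attn.__version__) <= FLASH_VER_LAST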
Dear Luca Wehrstedt, thank you for your response yesterday. I tried the method you suggested, modifying line 66 of xformers/xformers/ops/fmha/flash.py to change LAST from 2.6.3 to 2.7.0. After applying this change, I checked with python -m xformers.info and saw that the output had been updated to: memory_efficient_attention.fa2F@v2.7.0.post2-pt: available. To investigate further, I tested various library configurations and made the following observations:
From a performance perspective (although I understand this is just my personal observation and might not be statistically significant), using PyTorch 2.5.1 + FlashAttention 2.7.0.post2 alone reduced image generation time by approximately 0.15 seconds compared to integrating FlashAttention 2.6.3 with xFormers. Based on these results, I'm inclined to think there may be a compatibility issue between xFormers and FlashAttention 2.7.0.post2. Do you think this is the case? If so, does it mean that, for now, xFormers cannot be integrated with FlashAttention 2.7.0.post2? While I find the xFormers-based setup more versatile, it seems that this integration may not be feasible at the moment. I'd greatly appreciate any further advice you can provide. Best regards,
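Since the ~0.15 s figure above comes from end-to-end image generation, a more controlled comparison is to time the attention operator directly. A rough sketch, where the shapes, dtype, and iteration counts are arbitrary assumptions:

# Rough timing comparison between xFormers' memory_efficient_attention and
# PyTorch's scaled_dot_product_attention; requires a CUDA GPU.
import torch
import torch.nn.functional as F
import xformers.ops as xops

def bench(fn, iters=50):
    # Warm up, then time with CUDA events so GPU work is measured.
    for _ in range(10):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# memory_efficient_attention expects (batch, seq_len, heads, head_dim).
q = torch.randn(2, 4096, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
print("xformers:", bench(lambda: xops.memory_efficient_attention(q, k, v)), "ms")

# scaled_dot_product_attention expects (batch, heads, seq_len, head_dim).
qt, kt, vt = (t.transpose(1, 2) for t in (q, k, v))
print("torch sdpa:", bench(lambda: F.scaled_dot_product_attention(qt, kt, vt)), "ms")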
You'll need to tell us exactly what error you got with FlashAttention 2.7.0 if you want us to help.
!!! Exception during processing !!! not enough values to unpack (expected 8, got 4)
The above error occurs during image generation in ComfyUI when integrating PyTorch 2.5.1 (CUDA 12.6.3 + cuDNN 9.5.1) with FlashAttention 2.7.1.post2 and xformers 0.0.29. However, the issue does not occur with a different combination: PyTorch 2.5.1 with only FlashAttention 2.7.1.post2. Since FlashAttention has been successfully built and tested against CUDA 12.6.3, the remaining potential causes seem to be either a failure in building xformers or the possibility that simply modifying line 66 of /home/sk/src/xformers/xformers/ops/fmha/flash.py is insufficient for xformers to function properly. Could you advise on specific steps I should take to resolve this issue? For reference, here are the commands and environment variables used during the build process:
python setup.py build_ext
The error you reported shows that indeed FlashAttention v2.7.0 changed the API of some of their functions (specifically, …).
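To illustrate the failure mode in general terms (a hypothetical sketch, not the real flash_attn or xFormers code): a caller that unpacks an 8-tuple breaks as soon as the callee starts returning only 4 values, which is exactly the ValueError reported in this thread.

def forward_returning_8_values():
    # Assumed pre-2.7 behaviour: eight return values (names are placeholders).
    return "out", "q_pad", "k_pad", "v_pad", "out_pad", "lse", "S_dmask", "rng_state"

def forward_returning_4_values():
    # Assumed post-2.7 behaviour: only four return values.
    return "out", "lse", "S_dmask", "rng_state"

# A wrapper still written against the old signature raises
# "ValueError: not enough values to unpack (expected 8, got 4)".
out, q_pad, k_pad, v_pad, out_pad, lse, S_dmask, rng_state = forward_returning_4_values()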
We updated FlashAttention to 2.7.2: 839c4ec
Excuse me, will this help resolve the problem?
File "G:\pycharm\ComfyUI\comfy\ldm\modules\attention.py", line 396, in attention_xformers
out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 306, in memory_efficient_attention
return _memory_efficient_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 467, in _memory_efficient_attention
return _memory_efficient_attention_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\__init__.py", line 490, in _memory_efficient_attention_forward
out, *_ = op.apply(inp, needs_gradient=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\flash.py", line 677, in apply
out, softmax_lse, rng_state = cls.OPERATOR(
^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_library\custom_ops.py", line 324, in backend_impl
result = self._backend_fns[device_type](*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_dynamo\eval_frame.py", line 632, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\torch\_library\custom_ops.py", line 367, in wrapped_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "G:\packages\poetry\virtualenvs\stable-diffusion-webui-_SfT44pY-py3.12\Lib\site-packages\xformers\ops\fmha\flash.py", line 139, in _flash_fwd
(
ValueError: not enough values to unpack (expected 8, got 4)
Please make sure you reinstall all of Flash + xFormers from scratch, with full compilation. It looks like a version mismatch to me.
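A quick way to check for such a mismatch after reinstalling is to print the versions each package reports; a minimal sketch, not an official diagnostic:

import flash_attn
import torch
import xformers

print("flash_attn:", flash_attn.__version__)
print("xformers:", xformers.__version__)
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
# python -m xformers.info (used elsewhere in this thread) additionally shows
# which FlashAttention build the dispatcher will use; its fa2F@... entry
# should match flash_attn.__version__ printed above.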
Hello, I am getting the exact same issue, running on Windows using the installation instructions provided in this library, with xFormers 0.0.29. Edit: I downgraded to 0.0.28 and it worked again, so I guess something in 0.0.29 has broken? @lw
Hi,
Wonderful to hear a fix is on the way! This library is gold!
The fix is out: 46a02df
I'm running training at the moment, but I will try the fix as soon as it is done. I'll report back to you early tomorrow, hopefully.
@danthe3rd I can confirm that it is working now with release 0.0.29 and the changes you implemented! Many thanks for taking such quick action on this matter. Happy new year!
The build is in progress for 0.0.29.post1, thanks for the confirmation :)
I had the same ValueError issue, but it was fixed in 0.0.29.post1.
❓ Questions and Help
Currently, I have successfully built PyTorch 2.5.1 from source (with CUDA 12.6.3 and cuDNN 9.5.1) and installed it into a Miniconda3 Python 3.11.10 virtual environment. I also built Flash Attention 2.7.0.post2 with CUDA 12.6.3 and installed it into the same environment as a wheel file.
Next, I built and installed xformers from source into this environment. However, upon running python -m xformers.info, I see output indicating that fa2F@v2.5.7-pt and fa2B@v2.5.7-pt (the FlashAttention bundled with PyTorch) are being used. This appears to be a version mismatch, as I expected the locally installed Flash Attention 2.7.0.post2 to be picked up.
Previously, I used the following environment: PyTorch 2.5.1 (with CUDA 12.4.1 and cuDNN 9.5.1) and Flash Attention 2.6.3. In this case, xformers successfully integrated with the Flash Attention 2.6.3 that was installed in the virtual environment. However, in the current environment with CUDA 12.6.3, xformers appears unable to integrate with Flash Attention 2.7.0.post2, even though it was also built from source.
Is this a compatibility issue specific to CUDA 12.6.3? Or is there a known method to integrate xformers with Flash Attention 2.7.0.post2 in this setup? Any guidance or suggestions for enabling the integration would be greatly appreciated.
Below is the relevant output from python -m xformers.info:
xFormers 0.0.29+6e10bd21.d20241126
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.fa2F@v2.5.7-pt: available
memory_efficient_attention.fa2B@v2.5.7-pt: available
memory_efficient_attention.fa3F@0.0.0: unavailable
memory_efficient_attention.fa3B@0.0.0: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm_search@0.0.0: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.5.1
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4060 Ti
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1206
build.hip_version: None
build.python_version: 3.11.10
build.torch_version: 2.5.1
build.env.TORCH_CUDA_ARCH_LIST: 8.0;8.6;8.9
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.6.85
source.privacy: open source
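Once a rebuild picks up the intended FlashAttention, a minimal smoke test along these lines can confirm that the FlashAttention operator is actually selectable. This is a sketch: the shapes and dtype are arbitrary, and it assumes the operator classes are exposed as fmha.flash.FwOp/BwOp, as in recent xFormers releases.

import torch
from xformers.ops import fmha, memory_efficient_attention

# memory_efficient_attention expects (batch, seq_len, heads, head_dim).
q = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Forcing the FlashAttention op (instead of letting the dispatcher choose)
# makes a broken integration fail loudly rather than silently falling back.
out = memory_efficient_attention(q, k, v, op=(fmha.flash.FwOp, fmha.flash.BwOp))
print("FlashAttention forward OK:", tuple(out.shape))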