
Adding flash attention to one click installer #4015

Closed
trihardseven opened this issue Sep 21, 2023 · 10 comments
Labels
enhancement New feature or request stale

Comments

@trihardseven

Description

Adding flash attention to the one-click installer, for use with ExLlamaV2.

Additional Context

Other not-so-tech-savvy people and I are having issues installing it manually on Windows.

@trihardseven trihardseven added the enhancement New feature or request label Sep 21, 2023
@maddog7667

I agree, considering the instructions to get flash attention working are vague af and assume that the user has years of tech school and is a computer guru.

@Panchovix
Contributor

Flash-attention 2 doesn't work on Windows for now. I have been trying to build it, but no luck so far.

@CamiloMM

Hey! I recognize Panchovix. If he can't get it to build imma give up right now.

I hope someone with a PhD in Python bullshit saves the day.

@donQx

donQx commented Sep 30, 2023

2023-09-30 12:29:14 WARNING:You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features

Does this manual work on Windows?
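
For anyone wondering whether the webui environment already has it, a quick check from the same Python environment is (a minimal sketch, not part of the webui; flash_attn is the module name the pip package installs):

# Run inside text-generation-webui's conda env; ExLlamaV2 prints the warning
# above and falls back to a higher-VRAM path when this import fails.
try:
    import flash_attn
    print("flash-attn", flash_attn.__version__, "is installed")
except ImportError:
    print("flash-attn is not installed; ExLlamaV2 will use more VRAM")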

@redyandsalted

redyandsalted commented Sep 30, 2023

> 2023-09-30 12:29:14 WARNING:You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be. Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features
>
> Does this Manual work ? on windows?

I'm running Windows, and this is the manual I was using to install Flash Attention 2. After it complained about my CUDA version not matching my PyTorch CUDA version, I installed the correct one (11.7 for me) and uninstalled CUDA 12. Then I ran into a different error:

(C:\text-generation-webui\installer_files\env) C:\Users\no-one>pip install flash-attn --no-build-isolation
Collecting flash-attn
  Using cached flash_attn-2.3.0.tar.gz (2.3 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      error: pathspec 'csrc/cutlass' did not match any file(s) known to git
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\no-one\AppData\Local\Temp\pip-install-lanm8n3l\flash-attn_18a22cfa604e4f58be0406b9a1517187\setup.py", line 115, in <module>
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
        File "C:\Users\no-one\AppData\Local\Temp\pip-install-lanm8n3l\flash-attn_18a22cfa604e4f58be0406b9a1517187\setup.py", line 66, in get_cuda_bare_metal_version
          raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 421, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 503, in run
          with Popen(*popenargs, **kwargs) as process:
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 971, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 1456, in _execute_child
          hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
      FileNotFoundError: [WinError 2] The system cannot find the file specified


      torch.__version__  = 2.0.1+cu117


      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

The alternative it suggests, cloning the repo and running the setup file, results in a very similar error:

Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'csrc/cutlass'
Cloning into 'C:/text-generation-webui/flash-attention/csrc/cutlass'...
Submodule path 'csrc/cutlass': checked out 'e0aaa3c3b38db9a89c31f04fef91e92123ad5e2e'


torch.__version__  = 2.0.1+cu117


Traceback (most recent call last):
  File "C:\text-generation-webui\flash-attention\setup.py", line 115, in <module>
    _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
  File "C:\text-generation-webui\flash-attention\setup.py", line 66, in get_cuda_bare_metal_version
    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Right now it seems it just isn't ready for Windows.
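
For reference, both tracebacks fail at the same call: setup.py runs nvcc from under CUDA_HOME, and [WinError 2] means that executable was not found. A minimal reproduction of the failing step (a sketch based on the traceback above, assuming a typical Windows CUDA install where the toolkit installer sets CUDA_PATH):

import os
import subprocess

# Mirror the call that fails in get_cuda_bare_metal_version: "<CUDA_HOME>/bin/nvcc -V".
# If the resolved path does not contain nvcc.exe, Popen raises
# FileNotFoundError: [WinError 2], which is the error shown above.
cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
if cuda_home is None:
    print("CUDA_HOME/CUDA_PATH is not set; install the CUDA toolkit or set it manually")
else:
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    print(subprocess.check_output([nvcc, "-V"], universal_newlines=True))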

@CamiloMM

CamiloMM commented Oct 2, 2023

Maybe Oobabooga should either suppress this message or add a clarification, at least on Windows?

@Nicoolodion2

Yeah, it doesn't work on Windows right now; only on Linux (or macOS), I think. We'll have to see which comes first: flash attention supporting Windows, or Oobabooga giving a statement.

@Panchovix
Contributor

I managed to build it on Windows, but you will need CUDA 12.1 and torch+cu121, or else it won't compile.

More info: Dao-AILab/flash-attention#595
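
Before attempting the build, it may be worth confirming the environment matches that, e.g. (a sketch; the 12.1 requirement comes from the comment above, and the check itself is not part of flash-attention):

import shutil
import torch

# torch.version.cuda is the CUDA version torch was compiled against; the nvcc
# on PATH must come from a matching toolkit for the extension to build.
print("torch:", torch.__version__)        # expect a +cu121 build
print("torch CUDA:", torch.version.cuda)  # expect 12.1
print("nvcc:", shutil.which("nvcc"))      # expect a path inside the CUDA 12.1 install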

@bdashore3
Contributor

See #4235

@github-actions github-actions bot added the stale label Nov 20, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
