{lib}[foss/2023a] flash-attention v2.6.3 w/ CUDA 12.1.1 #21083

@pavelToman
Collaborator

pavelToman commented Jul 29, 2024

@pavelToman
Collaborator Author

@boegelbot please test @ generoso

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on login1

PR test command 'EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 13964

Test results coming soon (I hope)...

- notification for comment with ID 2255560031 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/834f9cd375bd5a75d4ff5ea0fa6a07e2 for a full test report.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4599

Test results coming soon (I hope)...

- notification for comment with ID 2255590569 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/006048dc175b4519dbd2ebceb088c6d0 for a full test report.

@verdurin
Member

Test report by @verdurin
FAILED
Build succeeded for 27 out of 28 (1 easyconfigs in total)
easybuild-el8.cloud.in.bmrc.ox.ac.uk - Linux Rocky Linux 8.10, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.6.8
See https://gist.github.com/verdurin/2fa5466554ae2a4e7b19e92c4b00aac2 for a full test report.

@verdurin
Member

Processing /dev/shm/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3
  Preparing metadata (setup.py): started
  Running command python setup.py egg_info
  No CUDA runtime is found, using CUDA_HOME='/apps/eb/el8/upstream/software/CUDA/12.1.1'
  fatal: not a git repository (or any parent up to mount point /dev)
  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
  /apps/eb/el8/upstream/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/__init__.py:84: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************

  !!
    dist.fetch_build_eggs(dist.setup_requires)


  torch.__version__  = 2.1.2


  running egg_info
  creating /tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info
  writing /tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/PKG-INFO
  writing dependency_links to /tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/dependency_links.txt
  writing requirements to /tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/requires.txt
  writing top-level names to /tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/top_level.txt
  writing manifest file '/tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/SOURCES.txt'
  listing git files failed - pretending there aren't any
  reading manifest file '/tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  warning: no files found matching '*.cu' under directory 'flash_attn'
  warning: no files found matching '*.h' under directory 'flash_attn'
  warning: no files found matching '*.cuh' under directory 'flash_attn'
  warning: no files found matching '*.cpp' under directory 'flash_attn'
  warning: no files found matching '*.hpp' under directory 'flash_attn'
  adding license file 'LICENSE'
  adding license file 'AUTHORS'
  writing manifest file '/tmp/eb-lsqh835v/pip-pip-egg-info-cal0mavz/flash_attn.egg-info/SOURCES.txt'
  Preparing metadata (setup.py): finished with status 'done'

@verdurin
Member

Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py): started
  Running command python setup.py bdist_wheel
  No CUDA runtime is found, using CUDA_HOME='/apps/eb/el8/upstream/software/CUDA/12.1.1'
  fatal: not a git repository (or any parent up to mount point /dev)
  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).


  torch.__version__  = 2.1.2


  /apps/eb/el8/upstream/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/__init__.py:84: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************

  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.1cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
  Raw wheel path /tmp/eb-lsqh835v/pip-wheel-teg_ahm9/flash_attn-2.6.3-cp311-cp311-linux_x86_64.whl
  error: [Errno 18] Invalid cross-device link: 'flash_attn-2.6.3+cu123torch2.1cxx11abiTRUE-cp311-cp311-linux_x86_64.whl' -> '/tmp/eb-lsqh835v/pip-wheel-teg_ahm9/flash_attn-2.6.3-cp311-cp311-linux_x86_64.whl'
  error: subprocess-exited-with-error

   python setup.py bdist_wheel did not run successfully.
   exit code: 1
  > See above for output.

@boegel
Member

boegel commented Aug 2, 2024

@verdurin error: [Errno 18] Invalid cross-device link => that looks weird to me. Can you provide some more info on your system setup, in particular which filesystem you're using for EasyBuild build directories?
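
One quick way to answer the filesystem question is to compare device IDs. In the log above the build directory is under /dev/shm while pip's wheel directory is under /tmp, so the two may well be on different filesystems. A minimal sketch, assuming both paths exist on the build host; the candidate paths in the demo are placeholders, not values confirmed for any particular system:

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """Return True if two existing paths are on the same device (filesystem)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Hypothetical candidates: replace with your EasyBuild build path and tmp dir.
    candidates = ["/dev/shm", tempfile.gettempdir(), os.getcwd()]
    existing = [p for p in candidates if os.path.exists(p)]
    for path in existing[1:]:
        print(existing[0], path, same_filesystem(existing[0], path))
```

If the build directory and the temporary directory report different devices, a plain rename between them is expected to fail with Errno 18.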

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3-a100

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=8 --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4621

Test results coming soon (I hope)...

- notification for comment with ID 2265162853 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@laraPPr
Contributor

laraPPr commented Aug 7, 2024

@boegel the test results of job 4621 still have not come in, and it was submitted 5 days ago. Is this normal?

@fizwit
Contributor

fizwit commented Aug 13, 2024

I get the same build errors that @verdurin reported. Adding --use-pep517 to the build command does not help.
I have also tested an older version of flash-attn and got the same errors.

Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py): started
  Building wheel for flash-attn (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

   python setup.py bdist_wheel did not run successfully.
   exit code: 1
  > [21 lines of output]
      fatal: not a git repository (or any parent up to mount point /)
      Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).


      torch.__version__  = 2.1.2


      /app/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/site-packages/setuptools/__init__.py:84: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      running bdist_wheel
      Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.1cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
      Raw wheel path /tmp/eb-5w60321o/pip-wheel-4tip0_4n/flash_attn-2.6.3-cp311-cp311-linux_x86_64.whl
      error: [Errno 18] Invalid cross-device link: 'flash_attn-2.6.3+cu123torch2.1cxx11abiTRUE-cp311-cp311-linux_x86_64.whl' -> '/tmp/eb-5w60321o/pip-wheel-4tip0_4n/flash_attn-2.6.3-cp311-cp311-linux_x86_64.whl'

@pavelToman
Collaborator Author

pavelToman commented Aug 14, 2024

It seems others have the same problem: Dao-AILab/flash-attention#875
I'm going to try replacing os.rename with shutil.move in setup.py, since os.rename cannot move a file between different filesystems.
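
For context, a minimal sketch of why that swap should help. This is not the actual patch from this PR: the function name and file names are hypothetical. os.rename raises OSError with errno.EXDEV (errno 18 on Linux, "Invalid cross-device link") when source and destination are on different filesystems, whereas shutil.move tries a rename first and falls back to copy-then-delete.

```python
import os
import shutil
import tempfile


def move_wheel(src, dst):
    """Move src to dst, even when they are on different filesystems.

    os.rename(src, dst) would raise OSError (errno.EXDEV) across filesystems;
    shutil.move falls back to copying the file and removing the original.
    """
    return shutil.move(src, dst)


if __name__ == "__main__":
    # Hypothetical stand-ins for the download directory and pip's wheel directory.
    with tempfile.TemporaryDirectory() as download_dir, \
            tempfile.TemporaryDirectory() as wheel_dir:
        src = os.path.join(download_dir, "example.whl")
        dst = os.path.join(wheel_dir, "example-renamed.whl")
        with open(src, "wb") as handle:
            handle.write(b"not a real wheel")
        move_wheel(src, dst)
        print(os.path.exists(src), os.path.exists(dst))
```

The demo uses two temporary directories that are usually on the same filesystem, so it only shows that the call behaves like a rename in the common case. The fallback path only kicks in when the devices differ.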

@pavelToman
Collaborator Author

@fizwit @verdurin could you please test it again? The Invalid cross-device link error should be fixed now.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4674

Test results coming soon (I hope)...

- notification for comment with ID 2288235486 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/26e117fe1bc2a7eba768deb05db94a21 for a full test report.

@fizwit
Contributor

fizwit commented Aug 14, 2024

@pavelToman I re-tested with your excellent patch and got much further. The fatal error below happened; after I added CUTLASS-3.4.0-foss-2023a-CUDA-12.1.1.eb as a dependency, it builds! See PR #21184

 g++ -MMD -MF /build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o.d -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -I/app/software/FFTW/3.3.10-GCC-12.3.0/include -I/app/software/FlexiBLAS/3.3.1-GCC-12.3.0/include -I/app/software/FlexiBLAS/3.3.1-GCC-12.3.0/include/flexiblas -fPIC -I/build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/csrc/flash_attn -I/build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/csrc/flash_attn/src -I/build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/csrc/cutlass/include -I/app/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include -I/app/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/app/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/TH -I/app/software/PyTorch/2.1.2-foss-2023a-CUDA-12.1.1/lib/python3.11/site-packages/torch/include/THC -I/app/software/CUDA/12.1.1/include -I/app/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c -c /build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/csrc/flash_attn/flash_api.cpp -o /build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
      /build/flashattention/2.6.3/foss-2023a-CUDA-12.1.1/flashattention/flash-attention-2.6.3/csrc/flash_attn/flash_api.cpp:11:10: fatal error: cutlass/numeric_types.h: No such file or directory
         11 | #include <cutlass/numeric_types.h>
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.
      ninja: build stopped: subcommand failed.
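
As an illustration of the CUTLASS workaround mentioned above, here is a hedged excerpt of what the dependency list in an easyconfig could look like (easyconfigs use Python syntax). This is a sketch, not the content of PR #21184: the Python and PyTorch versions are read off the paths in the logs, the CUTLASS version comes from the easyconfig name quoted above, and other dependencies (such as CUDA itself) are left out.

```python
# Hypothetical excerpt of a flash-attention easyconfig; not the merged file.
versionsuffix = '-CUDA-12.1.1'

dependencies = [
    ('Python', '3.11.3'),
    ('PyTorch', '2.1.2', versionsuffix),
    # Assumption: the CUTLASS module supplies cutlass/numeric_types.h, which the
    # compile step could not find under csrc/cutlass/include in the sources.
    ('CUTLASS', '3.4.0', versionsuffix),
]
```

The compile command above looks for the headers under csrc/cutlass/include in the unpacked sources, so the build presumably needs them to come from somewhere else, and a CUTLASS dependency is one way to provide them.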

@boegel boegel added the new label Aug 20, 2024
@boegel boegel added this to the 4.x milestone Aug 20, 2024
Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Aug 20, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3304.joltik.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 1 x NVIDIA Tesla V100-SXM2-32GB, 545.23.08, Python 3.6.8
See https://gist.github.com/boegel/f0b637f363a0d92541468076edda9c03 for a full test report.

@boegel
Member

boegel commented Aug 20, 2024

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14099

Test results coming soon (I hope)...

- notification for comment with ID 2299214807 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/eee35dded0faaf80bd54c2c8d94e958c for a full test report.

@boegel
Member

boegel commented Aug 20, 2024

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21083 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21083 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 4707

Test results coming soon (I hope)...

- notification for comment with ID 2299283599 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/84c9f4f4f69cd77a6456367c19497a07 for a full test report.

@boegel
Member

boegel commented Aug 20, 2024

Going in, thanks @pavelToman!

@boegel boegel merged commit 12e76eb into easybuilders:develop Aug 20, 2024
9 checks passed
@boegel boegel modified the milestones: 4.x, release after 4.9.2 Aug 20, 2024
@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 555.42.06, Python 3.9.18
See https://gist.github.com/boegelbot/a6564f6cac6d7348a4792f71c0949b67 for a full test report.
