Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom CUDA extensions #135

Merged
merged 83 commits into from
Apr 26, 2024
Merged

Custom CUDA extensions #135

merged 83 commits into from
Apr 26, 2024

Conversation

msaroufim
Copy link
Member

@msaroufim msaroufim commented Apr 15, 2024

This is the mergaeble version of #130 - some updates I have to make

  • Add a skip test unless pytorch 2.4+ is used and Add a skip test if cuda is not available
  • Add ninja to dev dependencies
  • Locally had to ensure conda install nvidia/label/cuda-12.1.0::cuda-toolkit to get things working
  • Added continue-on-error: true to CI otherwise was difficult to debug
  • Add a new pyproject.toml for build time dependencies
  • Do everything from within a docker container and then everything within a conda env
  • Add a pyproject.toml with torch as a build time dependency to fix CI issue (even though this was working locally) or do python setup.py install instead
  • Add verbose pip logging so we can tell that the cuda extension was not built
  • Move kernels to a test namespace so we can keep testing in CI
  • Using both a custom docker image and conda environment is more than doubling the total CI runtime, not great

OK for manylinux I managed to repro the missing torch error locally (FINALLY) using this

CIBW_BUILD=cp310-macosx_* python -m cibuildwheel --output-dir wheelhouse

Follow up PRs

  • Setup manylinux in github action CI: should i use cibuildwheel? in which case how do I configure CUDA? If I instead use the pytorch images, how do I get setup?
  • Binary tests, how do i ensure that the cuda kernels we built actually work
  • Add an actually useful kernel as a first example and remove the rms norm kernel like paged attention

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 15, 2024
@msaroufim msaroufim changed the title Custom CUDA extensionsCo-authored-by: Mark Saroufim <marksaroufim@meta.com> Custom CUDA extensions Apr 15, 2024
@msaroufim msaroufim changed the title Custom CUDA extensions WIP Custom CUDA extensions Apr 15, 2024
@msaroufim msaroufim changed the title WIP Custom CUDA extensions Custom CUDA extensions Apr 15, 2024
@msaroufim msaroufim marked this pull request as ready for review April 25, 2024 23:00
@msaroufim msaroufim merged commit 6122f8e into main Apr 26, 2024
13 checks passed
@msaroufim msaroufim deleted the msaroufim/manylinux branch April 26, 2024 02:11
@@ -62,5 +62,5 @@ jobs:
pip install ${{ matrix.torch-spec }}
pip install -r requirements.txt
pip install -r dev-requirements.txt
pip install .
python setup.py install
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to update README? for building from source, is it python setup.py developer now? I spent some time to understand why pip install -e . does not work anymore and find this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh, lemme update now

@liangan1
Copy link

liangan1 commented May 7, 2024

@msaroufim whether CPU kernel extension works with this PR?

@msaroufim
Copy link
Member Author

Yup cpu extensions work perfectly fine

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024
* Custom CUDA extensionsCo-authored-by: Mark Saroufim <marksaroufim@meta.com>

* ci fix

* add continue on ci fail

* add verbose pip logging

* update

* remove pip upgrade

* remove torch from requirements

* add python -m

* readd upgrade

* make everything python pip install

* typo

* bla

* add torch again to dependencies

* upd

* udpate

* one more

* bla

* uipda

* bla

* bla

* add venv for pytest

* skip test if no cuda

* push

* fix test

* fix test

* fix skip tests

* reuse pip install .

* add more debugging tools

* yolo

* revert yolo change

* fix

* remove custom install

* print torch version

* print torch version

* teest'

* try pip3?

* remove pip3

* reshuffling torch install

* no deps?

* revert nodeps

* ad

* bla

* bla

* lazy

* use conda and cuda toolkit

* push

* update

* 1 more try

* push

* bla

* bla

* conda init

* clowntown

* this works locally

* update

* remove cache

* push

* alternate cuda install command

* bla

* bla

* update

* push

* bla

* bla

* bla

* bla

* bla

* bla

* yolo

* yolo

* yolo

* yolo

* yolo

* yolo

* yolo

* yolo

---------

Co-authored-by: Richard Zou <rzou@meta.com>
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants