
python312Packages.triton*: 3.0.0 -> 3.1.0 #349159

Merged
merged 1 commit into NixOS:master from GaetanLepage:openai-triton on Oct 17, 2024

Conversation

GaetanLepage
Contributor

@GaetanLepage GaetanLepage commented Oct 16, 2024

Things done

Diff: triton-lang/triton@91f24d8...cf34004

Changelog: one day maybe

cc @SomeoneSerge

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@DerDennisOP
Contributor

lgtm

@GaetanLepage
Contributor Author

It builds lol. First try X)

@SomeoneSerge
Contributor

❯ nom build -I nixpkgs=https://github.com/GaetanLepage/nixpkgs/archive/openai-triton.tar.gz -f "<nixpkgs>" python3Packages.triton.gpuCheck
...
triton-pytest> FAILED language/test_line_info.py::test_line_info[dot_combine] - subprocess.CalledProcessError: Command '['/nix/store/97ba8v5ffpm7r7z8grpaqj...
triton-pytest> FAILED language/test_subprocess.py::test_print[device_print_large-int32] - assert False
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[True-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED operators/test_flash_attention.py::test_op[False-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
triton-pytest> FAILED tools/test_aot.py::test_compile_link_matmul_no_specialization - subprocess.CalledProcessError: Command '['./test', '/build/tmpd7xmo9c2/a.cs...
triton-pytest> FAILED tools/test_aot.py::test_compile_link_matmul - subprocess.CalledProcessError: Command '['./test', '/build/tmph4nb393y/a.cs...
triton-pytest> FAILED tools/test_aot.py::test_launcher_has_no_available_kernel - AssertionError: assert -11 == -6
triton-pytest> FAILED tools/test_aot.py::test_compile_link_autotune_matmul - subprocess.CalledProcessError: Command '['./test_0', '/build/tmp0g352yw6/a....
triton-pytest> === 23 failed, 9134 passed, 2196 skipped, 181 warnings in 1224.10s (0:20:24) ===
error: builder for '/nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv' failed with exit code 1;
       last 10 log lines:
       > FAILED operators/test_flash_attention.py::test_op[True-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-True-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-True-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-False-dtype0-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED operators/test_flash_attention.py::test_op[False-False-dtype1-2-4-512-128] - triton.runtime.errors.OutOfResources: out of resource: shared memory, Requi...
       > FAILED tools/test_aot.py::test_compile_link_matmul_no_specialization - subprocess.CalledProcessError: Command '['./test', '/build/tmpd7xmo9c2/a.cs...
       > FAILED tools/test_aot.py::test_compile_link_matmul - subprocess.CalledProcessError: Command '['./test', '/build/tmph4nb393y/a.cs...
       > FAILED tools/test_aot.py::test_launcher_has_no_available_kernel - AssertionError: assert -11 == -6
       > FAILED tools/test_aot.py::test_compile_link_autotune_matmul - subprocess.CalledProcessError: Command '['./test_0', '/build/tmp0g352yw6/a....
       > === 23 failed, 9134 passed, 2196 skipped, 181 warnings in 1224.10s (0:20:24) ===
       For full logs, run 'nix log /nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv'.
┏━ 1 Errors: 
┃ error: builder for '/nix/store/gvcldv6vz16xkr2cpjrwp3a76isg00ph-triton-pytest-3.1.0.drv' failed with exit code 1;
┣━ Dependency Graph showing 1 of 2 roots:
┃          ┌─ ✔ triton-llvm-19.1.0-rc1 ⏱ 55m57s
┃       ┌─ ✔ python3.12-triton-3.1.0 ⏱ 5m0s
┃       ├─ ✔ magma-2.7.2 ⏱ 1h10m27s
┃    ┌─ ✔ python3.12-torch-2.4.1 ⏱ 1h30m28s
┃ ┌─ ✔ python3-3.12.6-env ⏱ 2s
┃ ├─ ⏸ source
┃ ⚠ triton-pytest-3.1.0 failed with exit code 1 after ⏱ 20m27s in checkPhase

No regressions compared to the previous PR

@GaetanLepage
Contributor Author

No regressions compared to the previous PR

Is it sarcastic, or was this test really already broken?

@SomeoneSerge
Contributor

Is it sarcastic, or was this test really already broken?

An understandable confusion! Yes, it was already broken, which I just took as a given since we'd never run the pytest suite before ("pytorch is our test"):

... #328247 (comment)
I currently observe about 20 tests failing and spitting out junk: https://gist.github.com/SomeoneSerge/f9bd9ececc3a16438bd087edadf0fef4

The "OutOfResources" failures can probably be put in disabledTests, but IIRC a few tests spat out outright gibberish
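
For reference, a rough sketch of what that could look like, assuming the gpuCheck / triton-pytest derivation uses pytestCheckHook (the hook that consumes disabledTests); the test names are copied from the log above, and the override spelling is illustrative rather than the exact attribute path used in the package:

  python3Packages.triton.gpuCheck.overrideAttrs (old: {
    # Skip the shared-memory-bound flash-attention cases
    # (triton.runtime.errors.OutOfResources) and the flaky AOT/subprocess tests.
    disabledTests = (old.disabledTests or [ ]) ++ [
      "test_op"                                     # operators/test_flash_attention.py
      "test_line_info"                              # language/test_line_info.py
      "test_print"                                  # language/test_subprocess.py
      "test_compile_link_matmul"                    # tools/test_aot.py
      "test_compile_link_matmul_no_specialization"
      "test_compile_link_autotune_matmul"
      "test_launcher_has_no_available_kernel"
    ];
  })

pytestCheckHook deselects by name (it builds a -k "not ..." filter), so "test_op" would also skip the parametrizations that pass on larger GPUs; disabledTestPaths would be the coarser per-file alternative for tools/test_aot.py.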

@DerDennisOP
Contributor

DerDennisOP commented Oct 17, 2024

Is it sarcastic, or was this test really already broken?

An understandable confusion! Yes, it was already broken, which I just took as a given since we'd never run the pytest suite before ("pytorch is our test"):

... #328247 (comment)
I currently observe about 20 tests failing and spitting out junk: https://gist.github.com/SomeoneSerge/f9bd9ececc3a16438bd087edadf0fef4

The "OutOfResources" failures can probably be put in disabledTests, but IIRC a few tests spat out outright gibberish

The "OutOfResources" tests work for me; I guess you need more resources. If I build it on a 64C/128T CPU with a GPU piped into the nix builder, it works.

@SomeoneSerge
Contributor

The "OutOfResources" tests work for me; I guess you need more resources. If I build it on a 64C/128T CPU with a GPU piped into the nix builder, it works.

🥲 24 GB VRAM too little, RTX 3090 too old

May I ask what your GPU is?

@DerDennisOP
Contributor

The "OutOfResources" tests work for me; I guess you need more resources. If I build it on a 64C/128T CPU with a GPU piped into the nix builder, it works.

🥲 24 GB VRAM too little, RTX 3090 too old

May I ask what your GPU is?

Specs:
CPU: AMD Epyc 7702P (64C/128T)
RAM: 512 GB (ECC)
STORAGE: 38 TB
GPU: 1x H100
NETWORKING: 600 Mbit/s

@GaetanLepage
Contributor Author

Specs:
CPU: AMD Epyc 7702P (64C/128T)
RAM: 512 GB (ECC)
STORAGE: 38 TB
GPU: 1x H100
NETWORKING: 600 Mbit/s

Not bad X)

@SomeoneSerge
Contributor

GPU: 1x H100

Maybe we should consider marking tests with a grading of "system features", e.g. based on VRAM size (e.g. "cuda-80gb"). We're not doing that yet because there would be many orthogonal dimensions, like CUDA capabilities. An alternative is to hard-code GPU names for the derivations where it matters.
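
For illustration, a minimal sketch of the derivation side, assuming opaque feature strings (Nix does not interpret them, so "cuda-80gb" is purely a hypothetical label that a builder's nix.conf would have to advertise):

  let
    pkgs = import <nixpkgs> { config.allowUnfree = true; config.cudaSupport = true; };
  in
  pkgs.runCommand "triton-gpu-smoke-check"
    {
      # Only schedule this on builders whose nix.conf system-features list
      # advertises both strings; "cuda-80gb" is the VRAM grading discussed above.
      requiredSystemFeatures = [ "cuda" "cuda-80gb" ];
      nativeBuildInputs = [ (pkgs.python3.withPackages (ps: [ ps.triton ])) ];
    }
    ''
      # Placeholder check; the real gpuCheck would run the pytest suite.
      python3 -c "import triton; print(triton.__version__)" > $out
    ''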

@DerDennisOP
Contributor

GPU: 1x H100

Maybe we should consider marking tests with a grading of "system features", e.g. based on VRAM size (e.g. "cuda-80gb"). We're not doing that yet because there would be many orthogonal dimensions, like CUDA capabilities. An alternative is to hard-code GPU names for the derivations where it matters.

In my nix-ai project, I'm using hard-coded GPU names. It's not optimal, but it's the best solution I could find.
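
Roughly, the builder side of that is just advertising the GPU model as another system feature; a sketch in NixOS terms (the "gpu-h100" label is arbitrary and matched verbatim, nothing in nixpkgs defines it):

  # configuration.nix on the builder: any derivation that sets
  # requiredSystemFeatures = [ "gpu-h100" ] will then only be scheduled here.
  {
    nix.settings.system-features = [
      "benchmark"
      "big-parallel"
      "kvm"
      "nixos-test"
      "cuda"
      "gpu-h100"   # hard-coded GPU name
    ];
  }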

@GaetanLepage GaetanLepage merged commit baaa9d5 into NixOS:master Oct 17, 2024
39 of 40 checks passed
@GaetanLepage GaetanLepage deleted the openai-triton branch October 17, 2024 21:34