[do not merge] sccache test #170

Closed
wants to merge 77 commits into from

joerunde

Testing whether this hits the public vLLM cache; based on top of #169.

z103cb and others added 30 commits September 13, 2024 11:27
fixes RHOAIENG-8043

Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
- get rid of the cuda-devel stage, use CUDA 12.4
- add build flags
- remove useless installs
add libsodium for tensorizer encryption

---------

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
this is the default when `--worker-use-ray` is not provided and
world-size > 1
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
youkaichao and others added 21 commits September 16, 2024 12:43
- get rid of non-essential dependencies
- consolidate package installs
- do not copy wheels in final stage
- fix ccache usage
- use flashattention with triton backend by default:
    - clone main_perf branch
    - build rocm target
    - set up triton rocm env var
- configure numba, outlines and triton cache directory
this is a torch dependency when installed from the pytorch/rocm6.1
index: https://download.pytorch.org/whl/nightly/rocm6.1
Dockerfile.ubi.rocm: fix build
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

openshift-ci bot commented Sep 26, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: joerunde
Once this PR has been reviewed and has the lgtm label, please assign heyselbi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

openshift-ci bot commented Sep 26, 2024

@joerunde: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/smoke-test | 8abfa38 | link | true | /test smoke-test |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dtrifiro

@joerunde
No luck:

```
2024-09-26T22:54:23.111307509Z Compile requests                      133
2024-09-26T22:54:23.111307509Z Compile requests executed             133
2024-09-26T22:54:23.111307509Z Cache hits                              0
2024-09-26T22:54:23.111307509Z Cache misses                          133
2024-09-26T22:54:23.111307509Z Cache misses (C/C++)                    4
2024-09-26T22:54:23.111307509Z Cache misses (CUDA)                   129
2024-09-26T22:54:23.111307509Z Cache timeouts                          0
2024-09-26T22:54:23.111307509Z Cache read errors                       0
2024-09-26T22:54:23.111307509Z Forced recaches                         0
2024-09-26T22:54:23.111307509Z Cache write errors                      0
2024-09-26T22:54:23.111307509Z Compilation failures                    0
2024-09-26T22:54:23.111307509Z Cache errors                            0
2024-09-26T22:54:23.111307509Z Non-cacheable compilations              0
2024-09-26T22:54:23.111307509Z Non-cacheable calls                     0
2024-09-26T22:54:23.111307509Z Non-compilation calls                   0
2024-09-26T22:54:23.111307509Z Unsupported compiler calls              0
2024-09-26T22:54:23.111307509Z Average cache write                 0.001 s
2024-09-26T22:54:23.111307509Z Average compiler                  145.212 s
2024-09-26T22:54:23.111307509Z Average cache read hit              0.000 s
2024-09-26T22:54:23.111307509Z Failed distributed compilations         0
2024-09-26T22:54:23.111307509Z Cache location                  Local disk: "/root/.cache/sccache"
2024-09-26T22:54:23.111307509Z Use direct/preprocessor mode?   yes
2024-09-26T22:54:23.111307509Z Version (client)                0.8.1
2024-09-26T22:54:23.111307509Z Cache size                            172 MiB
2024-09-26T22:54:23.111307509Z Max cache size                         10 GiB
```

joerunde closed this Sep 27, 2024
Xaenalt pushed a commit that referenced this pull request Oct 14, 2024
This PR enables LoRA support on HPU.

* Implemented custom BGMV for LoRA modules using the index-select operator.
* Support for both single-card and multi-card scenarios has been tested.

---------

Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Himangshu Lahkar <hlahkar@habana.ai>