-
Notifications
You must be signed in to change notification settings - Fork 10
Conversation
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Allow dummy load format for fp8, torch.uniform_ doesn't support FP8 at the moment Co-authored-by: Mor Zusman <morz@ai21.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893) The 2nd PR for vllm-project#4532. This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
…project#4985) Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
@@ -147,7 +147,7 @@ def __init__( | |||
self, | |||
model_name: str, | |||
dtype: str = "half", | |||
access_token: Optional[str] = None, | |||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- access token not needed (we set
HF_TOKEN
in automation) - kwargs enables us to pass whatever we want for hf runner
@@ -472,21 +472,21 @@ def _decode_token_by_position_index( | |||
|
|||
def generate_greedy_logprobs_nm_use_tokens( | |||
self, | |||
prompts: List[str], | |||
input_ids_lst: List[torch.Tensor], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- previously, we were passing a prompt, formatted with a chat template (which appends bos token)
- then, we tokenize here which appends another bos token
This change prevents that double permanently, by forcing the test to tokenize everything fully
…/nm-vllm into remote-push-refactor
@@ -37,7 +37,7 @@ jobs: | |||
test_label_solo: gcp-k8s-l4-solo | |||
test_label_multi: ignore | |||
test_timeout: 480 | |||
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt | |||
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For remote-push, maybe we don't need to run all 4 python versions, and only 38/311 is good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's get to that after this is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer not having a file as input. instead, it'd be more flexible to just have a string listing the ENV with setting. something like,
TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE
@@ -37,7 +37,7 @@ jobs: | |||
test_label_solo: gcp-k8s-l4-solo | |||
test_label_multi: ignore | |||
test_timeout: 480 | |||
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt | |||
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, let's get to that after this is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! we can adjust calling parameters and style later.
SUMMARY: * updated model test structure to focus on core models * refactored tests to use environment variables (currently at "test group" level - so each folder has an env variable). All tests are off by default and they are explicitly enabled * refactored workflows build-test workflow to use a list of env variables rather than skip test list WHY: * this enables us to be more sane about what is and is not on - as opposed to a long list of files * this enables us to actually track what is run and what is not run (via testmo, which tracks skipped tests) * this enables us to have more fine-grained control over what is run vs not run (we can add more env vars at the sub-group level to turn off more tests) --------- Signed-off-by: kerthcet <kerthcet@gmail.com> Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: pandyamarut <pandyamarut@gmail.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com> Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com> Co-authored-by: Mor Zusman <mor.zusmann@gmail.com> Co-authored-by: Mor Zusman <morz@ai21.com> Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com> Co-authored-by: Kuntai Du <kuntai@uchicago.edu> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: HUANG Fei <hzhwcmhf@gmail.com> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Kante Yin <kerthcet@gmail.com> Co-authored-by: sasha0552 <admin@sasha0552.org> Co-authored-by: SangBin Cho <rkooo567@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: raywanb <112235519+raywanb@users.noreply.github.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: Letian Li <lotianmail@gmail.com> Co-authored-by: Murali Andoorveedu <37849411+andoorve@users.noreply.github.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Elisei Smirnov <61423871+kezouke@users.noreply.github.com> Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com> Co-authored-by: Lei Wen <wenlei03@qiyi.com> Co-authored-by: Eric Xihui Lin <xihuil.silence@gmail.com> Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Lily Liu <lilyliupku@gmail.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Michał Moskal <michal@moskal.me> Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp> Co-authored-by: Marut Pandya <pandyamarut@gmail.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com> Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com> Co-authored-by: omkar kakarparthi <75638701+okakarpa@users.noreply.github.com> Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com> Co-authored-by: Breno Faria <breno@veltefaria.de> Co-authored-by: Breno Faria <breno.faria@intrafind.com> Co-authored-by: Hyunsung Lee <ita9naiwa@gmail.com> Co-authored-by: Chansung Park <deep.diver.csp@gmail.com> Co-authored-by: SnowDist <quxingwei25@gmail.com> Co-authored-by: functionxu123 <1229853312@qq.com> Co-authored-by: xuhao <xuhao@cambricon.com> Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
SUMMARY:
WHY: