Remote push refactor #297

robertgshaw2-redhat · 2024-06-11T00:55:09Z

SUMMARY:

* Updated the model test structure to focus on core models.
* Refactored tests to be gated by environment variables (currently at the "test group" level, so each folder has its own env variable). All tests are off by default and must be explicitly enabled.
* Refactored the build-test workflow to take a list of env variables rather than a skip-test list.

WHY:

* This lets us be more sane about what is and is not on, as opposed to maintaining a long list of files.
* This lets us actually track what is run and what is not run (via testmo, which tracks skipped tests).
* This gives us more fine-grained control over what is run vs. not run (we can add more env vars at the sub-group level to turn off more tests).
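
To make the "off by default, explicitly enabled" behavior concrete, here is a minimal sketch of an env-var gate. The helper name `should_skip_test_group` and the exact comparison are assumptions for illustration; the real logic lives in `tests/nm_utils/utils_skip.py` and may differ. The `ENABLE` value and the `TEST_CORE` variable name are taken from the discussion in this PR.

```python
import os
from typing import Mapping, Optional

# Assumed enabling value; the PR history mentions replacing '0' with 'ENABLE'.
ENABLE = "ENABLE"


def should_skip_test_group(
    env_var: str, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True unless `env_var` is explicitly set to ENABLE.

    Hypothetical helper: an unset variable, or any value other than
    ENABLE, means the test group stays off.
    """
    env = os.environ if environ is None else environ
    return env.get(env_var, "") != ENABLE
```

A test module could then plausibly gate itself with something like `pytestmark = pytest.mark.skipif(should_skip_test_group("TEST_CORE"), reason="TEST_CORE is not enabled")`, so a disabled group is reported as skipped rather than silently dropped.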

robertgshaw2-redhat · 2024-06-11T00:59:59Z

tests/conftest.py

@@ -147,7 +147,7 @@ def __init__(
        self,
        model_name: str,
        dtype: str = "half",
-        access_token: Optional[str] = None,
+        **kwargs,


`access_token` is not needed (we set `HF_TOKEN` in automation).

`**kwargs` lets us pass whatever we want through to the HF runner.
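
As a rough illustration of why `**kwargs` is convenient here, the sketch below forwards arbitrary keyword arguments from a runner's constructor to a loader. `RunnerSketch`, `fake_loader`, and `some_option` are hypothetical stand-ins; the actual runner in `tests/conftest.py` loads a real model and is not reproduced here.

```python
from typing import Any, Callable, Dict


def fake_loader(model_name: str, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for a real model-loading call; it simply
    # records the arguments it received.
    return {"model_name": model_name, **kwargs}


class RunnerSketch:
    """Toy runner whose constructor forwards extra options unchanged."""

    def __init__(
        self,
        model_name: str,
        dtype: str = "half",
        loader: Callable[..., Dict[str, Any]] = fake_loader,
        **kwargs: Any,
    ) -> None:
        # Any option the caller passes reaches the loader without the
        # constructor having to name it in its signature.
        self.model = loader(model_name, dtype=dtype, **kwargs)
```

With this shape, a test can write `RunnerSketch("some-model", some_option=True)` and the option is passed through without a dedicated constructor parameter.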

robertgshaw2-redhat · 2024-06-11T01:01:31Z

tests/conftest.py

@@ -472,21 +472,21 @@ def _decode_token_by_position_index(

    def generate_greedy_logprobs_nm_use_tokens(
        self,
-        prompts: List[str],
+        input_ids_lst: List[torch.Tensor],


Previously, we were passing a prompt formatted with a chat template (which prepends a BOS token).

Then we tokenized it here, which prepended another BOS token.

This change prevents that double BOS token permanently by forcing the test to fully tokenize everything before calling this function.
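
The double-BOS problem can be reproduced with a toy tokenizer. Everything below (`toy_chat_template`, `toy_tokenize`, the `<s>` marker, the token ids) is invented for illustration and is not the Hugging Face API; it only mirrors the sequence of steps described above.

```python
from typing import List

BOS_TEXT = "<s>"  # assumed textual BOS marker for this toy example
BOS_ID = 1        # assumed BOS token id for this toy example


def toy_chat_template(prompt: str) -> str:
    # Stand-in for a chat template that already emits the BOS marker.
    return BOS_TEXT + prompt


def toy_tokenize(text: str, add_special_tokens: bool = True) -> List[int]:
    # Like many tokenizers, add a BOS id by default.
    ids = [BOS_ID] if add_special_tokens else []
    # A BOS marker already present in the text is also mapped to BOS_ID.
    if text.startswith(BOS_TEXT):
        ids.append(BOS_ID)
        text = text[len(BOS_TEXT):]
    # Toy vocabulary: one id per whitespace-separated word.
    ids.extend(100 + len(word) for word in text.split())
    return ids


formatted = toy_chat_template("hello world")
double_bos = toy_tokenize(formatted)                            # two BOS ids
single_bos = toy_tokenize(formatted, add_special_tokens=False)  # one BOS id
```

Having the test produce the final token ids itself and hand them to the helper removes the second tokenization step, and with it the opportunity to add the extra BOS.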

dhuangnm · 2024-06-11T02:18:53Z

.github/workflows/nm-remote-push.yml

@@ -37,7 +37,7 @@ jobs:
            test_label_solo: gcp-k8s-l4-solo
            test_label_multi: ignore
            test_timeout: 480
-            test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
+            test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt


For remote-push, maybe we don't need to run all 4 Python versions, and only 3.8 and 3.11 would be good enough?

yeah, let's get to that after this is merged

tests/nm_utils/utils_skip.py

andy-neuma

I'd prefer not having a file as input. Instead, it'd be more flexible to just pass a string listing each env variable with its setting. Something like:

TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE
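
A minimal sketch of how such a string could be parsed follows. The function name `parse_test_env_settings` and the strict validation are assumptions; the suggestion above only gives the `NAME=VALUE,NAME=VALUE` shape and the `ENABLE`/`DISABLE` values.

```python
from typing import Dict

# Assumed set of accepted values, based on the example string.
VALID_VALUES = {"ENABLE", "DISABLE"}


def parse_test_env_settings(spec: str) -> Dict[str, str]:
    """Parse 'NAME=VALUE,NAME=VALUE' into a dict (hypothetical helper)."""
    settings: Dict[str, str] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or not name or value not in VALID_VALUES:
            raise ValueError(f"malformed setting: {item!r}")
        settings[name] = value
    return settings
```

The resulting dict could be exported into the test process environment, which would avoid maintaining a separate file per workflow.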

andy-neuma · 2024-06-13T19:32:13Z

.github/workflows/nm-remote-push.yml

@@ -37,7 +37,7 @@ jobs:
            test_label_solo: gcp-k8s-l4-solo
            test_label_multi: ignore
            test_timeout: 480
-            test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
+            test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt


yeah, let's get to that after this is merged

andy-neuma

thanks! we can adjust calling parameters and style later.

SUMMARY:
* updated model test structure to focus on core models
* refactored tests to use environment variables (currently at "test group" level - so each folder has an env variable). All tests are off by default and they are explicitly enabled
* refactored workflows build-test workflow to use a list of env variables rather than skip test list

WHY:
* this enables us to be more sane about what is and is not on - as opposed to a long list of files
* this enables us to actually track what is run and what is not run (via testmo, which tracks skipped tests)
* this enables us to have more fine-grained control over what is run vs not run (we can add more env vars at the sub-group level to turn off more tests)

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: pandyamarut <pandyamarut@gmail.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Co-authored-by: Mor Zusman <mor.zusmann@gmail.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: HUANG Fei <hzhwcmhf@gmail.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Kante Yin <kerthcet@gmail.com>
Co-authored-by: sasha0552 <admin@sasha0552.org>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: raywanb <112235519+raywanb@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Letian Li <lotianmail@gmail.com>
Co-authored-by: Murali Andoorveedu <37849411+andoorve@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Elisei Smirnov <61423871+kezouke@users.noreply.github.com>
Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Eric Xihui Lin <xihuil.silence@gmail.com>
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Michał Moskal <michal@moskal.me>
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
Co-authored-by: Marut Pandya <pandyamarut@gmail.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: omkar kakarparthi <75638701+okakarpa@users.noreply.github.com>
Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: Breno Faria <breno@veltefaria.de>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Co-authored-by: Hyunsung Lee <ita9naiwa@gmail.com>
Co-authored-by: Chansung Park <deep.diver.csp@gmail.com>
Co-authored-by: SnowDist <quxingwei25@gmail.com>
Co-authored-by: functionxu123 <1229853312@qq.com>
Co-authored-by: xuhao <xuhao@cambricon.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>

alexm-redhat and others added 30 commits June 8, 2024 16:39

[Kernel] Add marlin_24 unit tests (vllm-project#4901)

e69d23b

[Kernel] Add flash-attn back (vllm-project#4907)

81ec16b

[Model] LLaVA model refactor (vllm-project#4910)

5500975

Remove marlin warning (vllm-project#4918)

b913d04

[Misc]: allow user to specify port in distributed setting (vllm-proje…

683a30b

…ct#4914)

[Build/CI] Enabling AMD Entrypoints Test (vllm-project#4834)

c8794c3

Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>

[Bugfix] Fix dummy weight for fp8 (vllm-project#4916)

5b6a7b5

Allow dummy load format for fp8, torch.uniform_ doesn't support FP8 at the moment Co-authored-by: Mor Zusman <morz@ai21.com>

[Core] Sharded State Loader download from HF (vllm-project#4889)

a5e66c7

[Doc]Add documentation to benchmarking script when running TGI (vllm-…

8a78ed8

…project#4920)

[Core] Fix scheduler considering "no LoRA" as "LoRA" (vllm-project#4897)

6b46dcf

[Model] add rope_scaling support for qwen2 (vllm-project#4930)

907d48a

[Model] Add Phi-2 LoRA support (vllm-project#4886)

11d6f7e

[Docs] Add acknowledgment for sponsors (vllm-project#4925)

5d98989

[CI/Build] Codespell ignore build/ directory (vllm-project#4945)

58a235b

[Bugfix] Fix flag name for max_seq_len_to_capture (vllm-project#4935)

253d8fb

Signed-off-by: kerthcet <kerthcet@gmail.com>

[Bugfix][Kernel] Add head size check for attention backend selection (v…

f744125

…llm-project#4944)

[Frontend] Dynamic RoPE scaling (vllm-project#4638)

c1672a9

[CI/Build] Enforce style for C++ and CUDA code with clang-format (v…

4b6c961

…llm-project#4722)

[misc] remove comments that were supposed to be removed (vllm-project…

4b74974

…#4977)

[Kernel] Fixup for CUTLASS kernels in CUDA graphs (vllm-project#4954)

39c15ee

Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs

[Misc] Load FP8 kv-cache scaling factors from checkpoints (vllm-proje…

2835fc6

…ct#4893) The 2nd PR for vllm-project#4532. This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).

[Model] LoRA gptbigcode implementation (vllm-project#3949)

3db99a6

[Core] Eliminate parallel worker per-step task scheduling overhead (v…

39a0a40

…llm-project#4894)

[Minor] Fix small typo in llama.py: QKVParallelLinear -> Quantization…

847ca88

…Config (vllm-project#4991)

[Misc] Take user preference in attention selector (vllm-project#4960)

c60384c

Marlin 24 prefill performance improvement (about 25% better on averag…

dae5aaf

…e) (vllm-project#4983)

[Bugfix] Update Dockerfile.cpu to fix NameError: name 'vllm_ops' is n…

05a4f64

…ot defined (vllm-project#5009)

[Core][1/N] Support send/recv in PyNCCL Groups (vllm-project#4988)

bf4c411

Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>

[Kernel] Initial Activation Quantization Support (vllm-project#4525)

c623663

Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>

[Core]: Option To Use Prompt Token Ids Inside Logits Processor (vllm-…

a9ca32d

…project#4985) Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>

robertgshaw2-redhat added 12 commits June 10, 2024 12:17

skip samplers during remote push

389bdcd

cleanup newline nit

5dd3f5d

switch to enable / disable

a475844

readded

397cfe2

convert workflows to use new files

8c6d1f3

updated each comment

e093e61

updated missed core files

e95ad95

updated test core

fe0be9e

format

4fabe98

Merge branch 'main' into remote-push-refactor

ae39285

fix bad merge llm_generate

14dedf1

fix bad merge oot_registration

4b078bd

robertgshaw2-redhat commented Jun 11, 2024

duplicate mark

05c5702

robertgshaw2-redhat requested review from andy-neuma and dhuangnm June 11, 2024 01:09

Merge branch 'main' into remote-push-refactor

e8166df

robertgshaw2-redhat mentioned this pull request Jun 11, 2024

[Rel Eng] Dial In Test Skipping #293

Closed

robertgshaw2-redhat added 2 commits June 11, 2024 01:14

Merge branch 'remote-push-refactor' of https://github.com/neuralmagic…

08c8e55

…/nm-vllm into remote-push-refactor

yapf on models core

62f6283

dhuangnm reviewed Jun 11, 2024

dbarbuzzi added 3 commits June 13, 2024 18:49

Replace '0' with 'ENABLE'

9b2d02f

Merge branch 'main' into remote-push-refactor

4b691b9

Small fixes from conflict resolution

ef38251

andy-neuma reviewed Jun 13, 2024

andy-neuma approved these changes Jun 13, 2024

dbarbuzzi merged commit a3cc7a8 into main Jun 14, 2024
30 of 34 checks passed

dbarbuzzi deleted the remote-push-refactor branch June 14, 2024 14:18
