Skip to content
This repository was archived by the owner on Oct 11, 2024. It is now read-only.

Remote push refactor #297

Merged
merged 159 commits into from
Jun 14, 2024
Merged

Remote push refactor #297

merged 159 commits into from
Jun 14, 2024

Conversation

robertgshaw2-redhat
Copy link
Collaborator

SUMMARY:

  • updated model test structure to focus on core models
  • refactored tests to use environment variables (currently at "test group" level - so each folder has an env variable). All tests are off by default and they are explicitly enabled
  • refactored workflows build-test workflow to use a list of env variables rather than skip test list

WHY:

  • this enables us to be more sane about what is and is not on - as opposed to a long list of files
  • this enables us to actually track what is run and what is not run (via testmo, which tracks skipped tests)
  • this enables us to have more fine-grained control over what is run vs not run (we can add more env vars at the sub-group level to turn off more tests)

alexm-redhat and others added 30 commits June 8, 2024 16:39
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Allow dummy load format for fp8,
torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <morz@ai21.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
…project#4985)

Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
@@ -147,7 +147,7 @@ def __init__(
self,
model_name: str,
dtype: str = "half",
access_token: Optional[str] = None,
**kwargs,
Copy link
Collaborator Author

@robertgshaw2-redhat robertgshaw2-redhat Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • access token not needed (we set HF_TOKEN in automation)
  • kwargs enables us to pass whatever we want for hf runner

@@ -472,21 +472,21 @@ def _decode_token_by_position_index(

def generate_greedy_logprobs_nm_use_tokens(
self,
prompts: List[str],
input_ids_lst: List[torch.Tensor],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • previously, we were passing a prompt, formatted with a chat template (which appends bos token)
  • then, we tokenize here which appends another bos token

This change prevents that double permanently, by forcing the test to tokenize everything fully

@@ -37,7 +37,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For remote-push, maybe we don't need to run all 4 python versions, and only 38/311 is good enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's get to that after this is merged

Copy link
Member

@andy-neuma andy-neuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer not having a file as input. instead, it'd be more flexible to just have a string listing the ENV with setting. something like,

TEST_ACCURACY=DISABLE,TEST_CORE=ENABLE

@@ -37,7 +37,7 @@ jobs:
test_label_solo: gcp-k8s-l4-solo
test_label_multi: ignore
test_timeout: 480
test_skip_list: neuralmagic/tests/skip-for-remote-push-tmp.txt
test_skip_env_vars: neuralmagic/tests/test_skip_env_vars/smoke.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's get to that after this is merged

Copy link
Member

@andy-neuma andy-neuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! we can adjust calling parameters and style later.

@dbarbuzzi dbarbuzzi merged commit a3cc7a8 into main Jun 14, 2024
30 of 34 checks passed
@dbarbuzzi dbarbuzzi deleted the remote-push-refactor branch June 14, 2024 14:18
derekk-nm pushed a commit that referenced this pull request Jun 24, 2024
SUMMARY:

* updated model test structure to focus on core models
* refactored tests to use environment variables (currently at "test
group" level - so each folder has an env variable). All tests are off by
default and they are explicitly enabled
* refactored workflows build-test workflow to use a list of env
variables rather than skip test list

WHY:
* this enables us to be more sane about what is and is not on - as
opposed to a long list of files
* this enables us to actually track what is run and what is not run (via
testmo, which tracks skipped tests)
* this enables us to have more fine-grained control over what is run vs
not run (we can add more env vars at the sub-group level to turn off
more tests)

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: pandyamarut <pandyamarut@gmail.com>
Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Alexey Kondratiev <alexey.kondratiev@amd.com>
Co-authored-by: Mor Zusman <mor.zusmann@gmail.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Kuntai Du <kuntai@uchicago.edu>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: HUANG Fei <hzhwcmhf@gmail.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Kante Yin <kerthcet@gmail.com>
Co-authored-by: sasha0552 <admin@sasha0552.org>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: raywanb <112235519+raywanb@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Letian Li <lotianmail@gmail.com>
Co-authored-by: Murali Andoorveedu <37849411+andoorve@users.noreply.github.com>
Co-authored-by: Dipika Sikka <ds3822@columbia.edu>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Elisei Smirnov <61423871+kezouke@users.noreply.github.com>
Co-authored-by: Elisei Smirnov <el.smirnov@innopolis.university>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Eric Xihui Lin <xihuil.silence@gmail.com>
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Michał Moskal <michal@moskal.me>
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Junichi Sato <junichi.sato@sbintuitions.co.jp>
Co-authored-by: Marut Pandya <pandyamarut@gmail.com>
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: omkar kakarparthi <75638701+okakarpa@users.noreply.github.com>
Co-authored-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Co-authored-by: Breno Faria <breno@veltefaria.de>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
Co-authored-by: Hyunsung Lee <ita9naiwa@gmail.com>
Co-authored-by: Chansung Park <deep.diver.csp@gmail.com>
Co-authored-by: SnowDist <quxingwei25@gmail.com>
Co-authored-by: functionxu123 <1229853312@qq.com>
Co-authored-by: xuhao <xuhao@cambricon.com>
Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.