[Rel Eng] Dial In LM Eval Tests Phase 1 #289

robertgshaw2-redhat · 2024-06-08T20:15:21Z

WAIT UNTIL UPSTREAM SYNC LANDS TO MERGE

SUMMARY:

* refactored lm-eval workflows to use a single script for generating a baseline
* refactored lm-eval workflows to accept a config file so we can parameterize for the different length runs
* added configuration for `remote-push` -> running `llama-3-8b` on 250 GSM prompts
* removed `lm-eval-smoke` such that we have one single pathway for running lm-eval tests
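
For readers unfamiliar with the idea, here is a rough sketch of how a per-workflow config could drive a single lm-eval invocation. It is not the code from this PR: the config keys, the `CONFIGS` mapping, the model id, the few-shot count, and the `build_lm_eval_command` helper are all hypothetical, and the real schema of `tests/accuracy/lm-eval-tasks.yaml` is not shown on this page. Only the `lm_eval` command-line flags are taken from the lm-evaluation-harness CLI.

```python
import shlex

# Hypothetical per-workflow settings. The key names and values are assumptions
# for illustration; only "remote-push" and the 250-prompt limit come from the
# PR summary.
CONFIGS = {
    "remote-push": {
        "model": "meta-llama/Meta-Llama-3-8B",  # assumed model id
        "task": "gsm8k",
        "num_fewshot": 5,                        # assumed few-shot count
        "limit": 250,                            # 250 GSM prompts, per the summary
    },
}


def build_lm_eval_command(cfg):
    """Turn one config entry into an lm_eval argument list (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", f"pretrained={cfg['model']}",
        "--tasks", cfg["task"],
        # Numeric values are converted to strings explicitly, since every
        # command-line argument must be a string.
        "--num_fewshot", str(cfg["num_fewshot"]),
        "--limit", str(cfg["limit"]),
        "--batch_size", "auto",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(build_lm_eval_command(CONFIGS["remote-push"])))
```

A longer run would then only need another entry in the mapping, with a different model list or prompt limit, rather than a separate script.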

derekk-nm

Looks like it should work.
Just raised a potential concern about the type of arg values passed to lm-eval, and a nit about the units on an arg.

dbarbuzzi

One question about using a new input (that seems otherwise unused) prior to approval.

.github/workflows/nm-build-test.yml

.github/scripts/nm-run-lm-eval-gsm-hf-baseline

.github/scripts/nm-run-lm-eval-vllm

dhuangnm

LGTM, thanks

robertgshaw2-redhat and others added 8 commits June 8, 2024 19:55

dialing in accuracy tests

1009145

added new files

09d6786

format

251112d

refactored workflow

4e98546

plumbing

0689d1e

newline nit

b66dfc1

Update remote-push.yaml

3828a7e

nit newline

make workflows work

4a98aeb

robertgshaw2-redhat commented Jun 8, 2024

.github/actions/nm-lm-eval/action.yml (resolved)

robertgshaw2-redhat changed the title ~~Rel eng/dial in accuracy tests~~ Rel eng/dial in accuracy tests part 1 Jun 8, 2024

robertgshaw2-redhat requested review from andy-neuma, mgoin, dbarbuzzi, derekk-nm and dhuangnm June 8, 2024 20:33

robertgshaw2-redhat commented Jun 8, 2024

tests/accuracy/lm-eval-tasks.yaml (resolved)

robertgshaw2-redhat and others added 7 commits June 8, 2024 21:39

hopefully fix

9f1cc2c

update other workflows for lm-eval changes (#292)

bc2b3f6

* apply changes made to `remote-push` to other workflows

added missing configs

9109677

Merge branch 'rel-eng/dial-in-accuracy-tests' of https://github.com/n…

bf4b830

…euralmagic/nm-vllm into rel-eng/dial-in-accuracy-tests

added gptq models

e9106b1

readded

f20641d

updated configs

c51beba

robertgshaw2-redhat changed the title ~~Rel eng/dial in accuracy tests part 1~~ [ REL ENG] Dial In Accuracy Tests Phase 1 Jun 9, 2024

robertgshaw2-redhat changed the title ~~[ REL ENG] Dial In Accuracy Tests Phase 1~~ [ Rel Eng ] Dial In Accuracy Tests Phase 1 Jun 9, 2024

robertgshaw2-redhat changed the title ~~[ Rel Eng ] Dial In Accuracy Tests Phase 1~~ [Rel Eng] Dial In Accuracy Tests Phase 1 Jun 9, 2024

robertgshaw2-redhat changed the title ~~[Rel Eng] Dial In Accuracy Tests Phase 1~~ [Rel Eng] Dial In LM Eval Tests Phase 1 Jun 10, 2024

derekk-nm reviewed Jun 10, 2024

.github/scripts/nm-run-lm-eval-gsm-hf-baseline (outdated, resolved)

derekk-nm reviewed Jun 10, 2024

.github/workflows/nm-build-test.yml (outdated, resolved)

derekk-nm approved these changes Jun 10, 2024

robertgshaw2-redhat added 2 commits June 10, 2024 16:00

Update nm-build-test.yml

3bb927c

updated with better comment

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

6c6818e

dbarbuzzi reviewed Jun 10, 2024

.github/workflows/nm-build-test.yml (outdated, resolved)

mgoin reviewed Jun 10, 2024

robertgshaw2-redhat and others added 3 commits June 11, 2024 00:25

updated to include .sh

c130028

updated to use lm_eval_label

cbb4380

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

60c0309

dhuangnm approved these changes Jun 11, 2024

robertgshaw2-redhat added 6 commits June 11, 2024 16:02

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

344bf78

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

a79075d

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

bebf6d3

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

ce0fcdb

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

0dba00a

Merge branch 'main' into rel-eng/dial-in-accuracy-tests

593c8e9

robertgshaw2-redhat merged commit 7c46a95 into main Jun 21, 2024
37 checks passed

robertgshaw2-redhat deleted the rel-eng/dial-in-accuracy-tests branch June 21, 2024 16:40
