This repository was archived by the owner on Oct 11, 2024. It is now read-only.

[Rel Eng] Dial In LM Eval Tests Phase 1 #289

Merged: 26 commits, Jun 21, 2024

Conversation

robertgshaw2-redhat
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat commented Jun 8, 2024

WAIT UNTIL UPSTREAM SYNC LANDS TO MERGE

SUMMARY:

  • refactored lm-eval workflows to use a single script for generating a baseline
  • refactored lm-eval workflows to accept a config file, so runs of different lengths can be parameterized
  • added a configuration for remote-push that runs llama-3-8b on 250 GSM prompts
  • removed lm-eval-smoke so that there is a single pathway for running lm-eval tests
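
For readers who want a concrete picture of the config-driven approach summarized above, here is a minimal sketch. It is not the script from this PR. The config keys (`backend`, `model_args`, `tasks`, `limit`), the `gsm8k` task name, the `vllm` backend, and the helper `build_lm_eval_command` are all assumptions made for illustration. Only the model name `llama-3-8b` and the 250-prompt limit come from the summary. Every value is converted to a string before it reaches the command line, which keeps argument types unambiguous.

```python
import json

# Hypothetical config for the remote-push run. The key names, the "vllm"
# backend, and the "gsm8k" task name are illustrative assumptions; only the
# model name and the 250-prompt limit are taken from the PR summary.
REMOTE_PUSH_CONFIG = json.loads("""
{
  "backend": "vllm",
  "model_args": {"pretrained": "llama-3-8b"},
  "tasks": ["gsm8k"],
  "limit": 250
}
""")


def build_lm_eval_command(config: dict) -> list[str]:
    """Turn a config dict into an lm_eval argument list (hypothetical helper)."""
    # Flatten model arguments into the comma-separated key=value form.
    model_args = ",".join(f"{key}={value}" for key, value in config["model_args"].items())
    return [
        "lm_eval",
        "--model", str(config["backend"]),
        "--model_args", model_args,
        "--tasks", ",".join(config["tasks"]),
        # Explicit str() so numeric config values are never passed as ints.
        "--limit", str(config["limit"]),
    ]


if __name__ == "__main__":
    print(" ".join(build_lm_eval_command(REMOTE_PUSH_CONFIG)))
```

With this shape, a longer run would only need a different config (for example, a larger `limit`), while the script itself stays the same.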

@robertgshaw2-redhat robertgshaw2-redhat changed the title Rel eng/dial in accuracy tests Rel eng/dial in accuracy tests part 1 Jun 8, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title Rel eng/dial in accuracy tests part 1 [ REL ENG] Dial In Accuracy Tests Phase 1 Jun 9, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [ REL ENG] Dial In Accuracy Tests Phase 1 [ Rel Eng ] Dial In Accuracy Tests Phase 1 Jun 9, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [ Rel Eng ] Dial In Accuracy Tests Phase 1 [Rel Eng] Dial In Accuracy Tests Phase 1 Jun 9, 2024
@robertgshaw2-redhat robertgshaw2-redhat changed the title [Rel Eng] Dial In Accuracy Tests Phase 1 [Rel Eng] Dial In LM Eval Tests Phase 1 Jun 10, 2024

@derekk-nm derekk-nm left a comment

Looks like it should work.
Just raised a potential concern about the type of arg values passed to lm-eval, and a nit about the units on an arg.

@dbarbuzzi dbarbuzzi left a comment

One question about using a new input (that seems otherwise unused) prior to approval.

Member

@dhuangnm dhuangnm left a comment

LGTM, thanks

@robertgshaw2-redhat robertgshaw2-redhat merged commit 7c46a95 into main Jun 21, 2024
37 checks passed
@robertgshaw2-redhat robertgshaw2-redhat deleted the rel-eng/dial-in-accuracy-tests branch June 21, 2024 16:40
derekk-nm pushed a commit that referenced this pull request Jun 24, 2024
5 participants