
Decouple the prompt set type (practice, official) from its file name #801

Merged
merged 8 commits into from
Jan 21, 2025

Conversation

rogthefrog
Contributor

@rogthefrog rogthefrog commented Jan 16, 2025

Story

#781

Rationale

Filenames

#744

Test Items

#780

Main Change

Newly uploaded files must be added to the PROMPT_SETS dict in src/modelgauge/prompt_sets.py, and their file names must contain the whole key. E.g. if you want to use the key "demo_hi_in", then your prompt file name should include "demo_hi_in".

To use new prompt sets in modellab, you need to add them to the modellab drop-down and to the modelrunner list of prompt sets. We may want to change that so it's less manual.
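A minimal sketch of the registration rule described above. The dict name and module path come from this PR; the entry values and the validation helper body are hypothetical, not the actual contents of src/modelgauge/prompt_sets.py:

```python
# Hypothetical entries illustrating the rule: each value (the file name)
# must contain its key. Real entries live in src/modelgauge/prompt_sets.py.
PROMPT_SETS = {
    "demo_hi_in": "example_demo_hi_in_prompt_set.csv",  # contains "demo_hi_in"
    "practice": "example_practice_prompt_set.csv",      # contains "practice"
}

def validate_prompt_set(prompt_set: str, prompt_sets: dict = PROMPT_SETS) -> bool:
    """Reject prompt set keys that are not registered in the dict."""
    if prompt_set not in prompt_sets:
        raise ValueError(f"Unknown prompt set {prompt_set!r}; known: {sorted(prompt_sets)}")
    return True
```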

@rogthefrog rogthefrog requested a review from a team as a code owner January 16, 2025 03:51

github-actions bot commented Jan 16, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@wpietri wpietri left a comment


Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?

Contributor

@bkorycki bkorycki left a comment


Looks much better organized now. Thanks!

Inline review comments on src/modelgauge/prompt_sets.py (outdated, resolved).
@rogthefrog
Contributor Author

rogthefrog commented Jan 16, 2025

Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?

That was my initial approach, and it got more complicated than I wanted. I can take a fresh pass.

@wpietri Putting in the changes inspired by Barbara's review made me decide against the Enum approach for two reasons.

  1. We changed locales from Enums to strings in part because using a string was just simpler syntax-wise and we didn't lose much.
  2. Conceptually, an Enum is a type, with well-defined values defined by the programmer that follow some kind of internal logic or structure. By contrast, the prompt sets (their short names and filenames) are user-defined, unbounded, and arbitrary. It feels more right (to me) to use a bag of strings to represent a bag of strings.
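The trade-off in point 2 can be sketched as follows. The class and dict contents are hypothetical, using Python's built-in `enum` module:

```python
from enum import Enum

# Enum approach: the set of values is fixed in code by the programmer.
class PromptSet(Enum):
    PRACTICE = "practice"
    OFFICIAL = "official"

# CLI/UI input arrives as a plain string, so every boundary needs a
# conversion step (which raises ValueError for unknown strings):
chosen = PromptSet("practice")

# Bag-of-strings approach: user-defined keys, validated by membership.
PROMPT_SETS = {"practice": "practice_file", "official": "official_file"}
is_known = "practice" in PROMPT_SETS
```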

@rogthefrog rogthefrog marked this pull request as draft January 17, 2025 00:46
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog marked this pull request as ready for review January 17, 2025 03:03
@rogthefrog
Contributor Author

@bkorycki @wpietri @dhosterman ready for re-review when you have a chance!

@wpietri
Contributor

wpietri commented Jan 17, 2025

Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?
By contrast, the prompt sets (their short names and filenames) are user-defined, unbounded, and arbitrary. It feels more right (to me) to use a bag of strings to represent a bag of strings (short names and filenames).

To me an object is a name plus data plus behavior. I think we have those conditions here, if marginally. The name is already in the filename, and it's a domain term. You have the data in the dict. And there's a bit of behavior in the file itself, with possibly more elsewhere. But it is pretty basic, so I can live with it staying as procedural programming until more behavior turns up.

But this is a problem for me:

class GeneralPurposeAiChatBenchmarkV1(BenchmarkDefinition):
    def __init__(self, locale: str, prompt_set: str, prompt_sets: dict = PROMPT_SETS, evaluator: str = "default"):
        validate_locale(locale)
        validate_prompt_set(prompt_set, prompt_sets)
[...]
    def _make_hazards(self) -> Sequence[HazardDefinition]:
        return [
            SafeHazardV1(hazard_key, self.locale, self.prompt_set, PROMPT_SETS, self.evaluator)

[...]

Previously, what defined a GeneralPurposeAiChatBenchmarkV1 was a locale, a prompt set, and an evaluator. The code more or less matched our domain language. But now PROMPT_SETS is part of the definition, which is definitely outside our domain language, and it's weird from an object modelling perspective.

What typing would get us is a clearer interface, with the type system doing the validation that currently ends up here. Even going with strings, I'd rather validation of UI/CLI inputs happen much closer to the UI/CLI. (If we wanted to be extra careful and have a double-check, I'd rather it be something like assert prompt_set.valid_set_id(prompt_set). But as long as things blow up harmlessly with the wrong prompt set key, I wouldn't go that far.) And PROMPT_SETS definitely shouldn't be part of the constructor signature for a benchmark or a hazard; that's an implementation detail distorting the domain model.
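A hypothetical sketch of the shape this review comment is asking for, where the lookup table stays a module-level detail and the constructor stays in domain terms. Names are adapted from the snippet above; the bodies are simplified, not the code in this PR:

```python
# Illustrative table; in the real code this lives in modelgauge's prompt_sets module.
PROMPT_SETS = {"practice": "practice_file", "official": "official_file"}

def validate_prompt_set(prompt_set: str) -> None:
    # Validation consults the module-level table, so callers never pass it in.
    if prompt_set not in PROMPT_SETS:
        raise ValueError(f"Unknown prompt set {prompt_set!r}")

class GeneralPurposeAiChatBenchmarkV1:
    # The signature matches the domain language: locale, prompt set, evaluator.
    def __init__(self, locale: str, prompt_set: str, evaluator: str = "default"):
        validate_prompt_set(prompt_set)
        self.locale = locale
        self.prompt_set = prompt_set
        self.evaluator = evaluator
```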

@rogthefrog
Contributor Author

@wpietri @bkorycki third pass :)

@wpietri I still think using a bag of strings for prompt sets is preferable at this stage. There's a world in which we may need to add functionality or data to the concept of a prompt set (e.g. a locale, if the set only contains one), and then we can use a class for that. But I don't think we're there yet; because types aren't enforced and we need to support plain strings to identify prompt sets (e.g. as CLI arguments), I think it's best to stick with strings for now.

Contributor

@wpietri wpietri left a comment


Thanks for persisting!

Contributor

@bkorycki bkorycki left a comment


Looks good!

@@ -9,6 +9,8 @@

# add the other languages after we have official and practice prompt sets
LOCALES = (EN_US, FR_FR)
# all the languages we have official and practice prompt sets for
PUBLISHED_LOCALES = (EN_US,)
Contributor


Is this the set of locales that we can run in modelbench?

Contributor Author


LOCALES should be runnable in modelbench. PUBLISHED_LOCALES is there to avoid hardcoded exceptions deeper in the code, e.g. the funky exceptions we had to put in to run French practice tests.
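A minimal sketch of that distinction. The two tuples and their comments come from the diff above; the locale string values and the helper functions are hypothetical:

```python
# Locale string values are illustrative, not necessarily the repo's.
EN_US = "en_us"
FR_FR = "fr_fr"

# add the other languages after we have official and practice prompt sets
LOCALES = (EN_US, FR_FR)
# all the languages we have official and practice prompt sets for
PUBLISHED_LOCALES = (EN_US,)

def is_runnable(locale: str) -> bool:
    # Any locale in LOCALES can be run in modelbench.
    return locale in LOCALES

def is_published(locale: str) -> bool:
    # Published locales need no special-case exceptions deeper in the code
    # (e.g. the earlier hardcoded exceptions for French practice tests).
    return locale in PUBLISHED_LOCALES
```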

@rogthefrog rogthefrog merged commit 9337d40 into main Jan 21, 2025
4 checks passed
@rogthefrog rogthefrog deleted the fix/781/prompt-sets-simpler branch January 21, 2025 18:37
@github-actions github-actions bot locked and limited conversation to collaborators Jan 21, 2025