
Decouple the prompt set type (practice, official) from its file name #801

Merged
merged 8 commits into from
Jan 21, 2025

Conversation

rogthefrog
Contributor

@rogthefrog rogthefrog commented Jan 16, 2025

Story

#781

Rationale

Filenames

#744

Test Items

#780

Main Change

Newly uploaded files must be added to the PROMPT_SETS dict in src/modelgauge/prompt_sets.py, and their file names must contain the whole key. E.g. if you want to use the key "demo_hi_in", then your prompt file name should include "demo_hi_in".

To use new prompt sets in modellab, you need to add them to the modellab drop-down and to the modelrunner list of prompt sets. We may want to change that so it's less manual.
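A minimal sketch of the registration rule described above. The dict name and module path come from this PR; the entry values and the validation helper body are hypothetical, not the actual contents of src/modelgauge/prompt_sets.py:

```python
# Hypothetical entries illustrating the rule: each value (the file name)
# must contain its key. Real entries live in src/modelgauge/prompt_sets.py.
PROMPT_SETS = {
    "demo_hi_in": "example_demo_hi_in_prompt_set.csv",  # contains "demo_hi_in"
    "practice": "example_practice_prompt_set.csv",      # contains "practice"
}

def validate_prompt_set(prompt_set: str, prompt_sets: dict = PROMPT_SETS) -> bool:
    """Reject prompt set keys that are not registered in the dict."""
    if prompt_set not in prompt_sets:
        raise ValueError(f"Unknown prompt set {prompt_set!r}; known: {sorted(prompt_sets)}")
    return True
```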

@rogthefrog rogthefrog requested a review from a team as a code owner January 16, 2025 03:51

github-actions bot commented Jan 16, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@wpietri wpietri left a comment


Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?

Contributor

@bkorycki bkorycki left a comment


Looks much better organized now. Thanks!

Inline review comments on src/modelgauge/prompt_sets.py (outdated, resolved).
@rogthefrog
Contributor Author

rogthefrog commented Jan 16, 2025

Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?

That was my initial approach, and it got more complicated than I wanted. I can take a fresh pass.

@wpietri Putting in the changes inspired by Barbara's review made me decide against the Enum approach for two reasons.

  1. We changed locales from Enums to strings in part because using a string was just simpler syntax-wise and we didn't lose much.
  2. Conceptually, an Enum is a type, with well-defined values defined by the programmer that follow some kind of internal logic or structure. By contrast, the prompt sets (their short names and filenames) are user-defined, unbounded, and arbitrary. It feels more right (to me) to use a bag of strings to represent a bag of strings.
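The trade-off in point 2 can be sketched as follows. The class and dict contents are hypothetical, using Python's built-in `enum` module:

```python
from enum import Enum

# Enum approach: the set of values is fixed in code by the programmer.
class PromptSet(Enum):
    PRACTICE = "practice"
    OFFICIAL = "official"

# CLI/UI input arrives as a plain string, so every boundary needs a
# conversion step (which raises ValueError for unknown strings):
chosen = PromptSet("practice")

# Bag-of-strings approach: user-defined keys, validated by membership.
PROMPT_SETS = {"practice": "practice_file", "official": "official_file"}
is_known = "practice" in PROMPT_SETS
```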

@rogthefrog rogthefrog marked this pull request as draft January 17, 2025 00:46
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog temporarily deployed to Scheduled Testing January 17, 2025 02:31 — with GitHub Actions Inactive
@rogthefrog rogthefrog marked this pull request as ready for review January 17, 2025 03:03
@rogthefrog
Contributor Author

@bkorycki @wpietri @dhosterman ready for re-review when you have a chance!

@wpietri
Contributor

wpietri commented Jan 17, 2025

Would this code get cleaner if there were a PromptSet object and possibly an enum for the known good ones?
By contrast, the prompt sets (their short names and filenames) are user-defined, unbounded, and arbitrary. It feels more right (to me) to use a bag of strings to represent a bag of strings (short names and filenames).

To me an object is a name plus data plus behavior. I think we have those conditions here, if marginally. The name is already in the filename, and it's a domain term. You have the data in the dict. And there's a bit of behavior in the file itself, with possibly more elsewhere. But it is pretty basic, so I can live with it staying as procedural programming until more behavior turns up.

But this is a problem for me:

class GeneralPurposeAiChatBenchmarkV1(BenchmarkDefinition):
    def __init__(self, locale: str, prompt_set: str, prompt_sets: dict = PROMPT_SETS, evaluator: str = "default"):
        validate_locale(locale)
        validate_prompt_set(prompt_set, prompt_sets)
[...]
    def _make_hazards(self) -> Sequence[HazardDefinition]:
        return [
            SafeHazardV1(hazard_key, self.locale, self.prompt_set, PROMPT_SETS, self.evaluator)

[...]

Previously, what defined a GeneralPurposeAiChatBenchmarkV1 was a locale, a prompt set, and an evaluator. The code more or less matched our domain language. But now PROMPT_SETS is part of the definition, which is definitely outside our domain language, and it's weird from an object modelling perspective.

What typing would get us is a clearer interface, with the type system doing the validation that currently ends up here. Even going with strings, I'd rather validation of UI/CLI inputs happen much closer to the UI/CLI. (If we wanted to be extra careful and have a double-check, I'd rather it be something like assert prompt_set.valid_set_id(prompt_set). But as long as things blow up harmlessly with the wrong prompt set key, I wouldn't go that far.) And PROMPT_SETS definitely shouldn't be part of the constructor signature for a benchmark or a hazard; that's an implementation detail distorting the domain model.
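A hypothetical sketch of the shape this review comment is asking for, where the lookup table stays a module-level detail and the constructor stays in domain terms. Names are adapted from the snippet above; the bodies are simplified, not the code in this PR:

```python
# Illustrative table; in the real code this lives in modelgauge's prompt_sets module.
PROMPT_SETS = {"practice": "practice_file", "official": "official_file"}

def validate_prompt_set(prompt_set: str) -> None:
    # Validation consults the module-level table, so callers never pass it in.
    if prompt_set not in PROMPT_SETS:
        raise ValueError(f"Unknown prompt set {prompt_set!r}")

class GeneralPurposeAiChatBenchmarkV1:
    # The signature matches the domain language: locale, prompt set, evaluator.
    def __init__(self, locale: str, prompt_set: str, evaluator: str = "default"):
        validate_prompt_set(prompt_set)
        self.locale = locale
        self.prompt_set = prompt_set
        self.evaluator = evaluator
```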

@rogthefrog
Contributor Author

@wpietri @bkorycki third pass :)

@wpietri I still think using a bag of strings for prompt sets is preferable at this stage. There's a world in which we may need to add functionality or data to the concept of a prompt set (e.g. a locale, if the set only contains one), and then we can use a class for that. But I don't think we're there yet; because types aren't enforced and we need to support plain strings to identify prompt sets (e.g. as CLI arguments), I think it's best to stick with strings for now.

Contributor

@wpietri wpietri left a comment


Thanks for persisting!

Contributor

@bkorycki bkorycki left a comment


Looks good!

@@ -9,6 +9,8 @@

# add the other languages after we have official and practice prompt sets
LOCALES = (EN_US, FR_FR)
# all the languages we have official and practice prompt sets for
PUBLISHED_LOCALES = (EN_US,)
Contributor


Is this the set of locales that we can run in modelbench?

Contributor Author


LOCALES should be runnable in modelbench. PUBLISHED_LOCALES is there to avoid hardcoded exceptions deeper in the code, e.g. the funky exceptions we had to put in to run French practice tests.
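A minimal sketch of that distinction. The two tuples and their comments come from the diff above; the locale string values and the helper functions are hypothetical:

```python
# Locale string values are illustrative, not necessarily the repo's.
EN_US = "en_us"
FR_FR = "fr_fr"

# add the other languages after we have official and practice prompt sets
LOCALES = (EN_US, FR_FR)
# all the languages we have official and practice prompt sets for
PUBLISHED_LOCALES = (EN_US,)

def is_runnable(locale: str) -> bool:
    # Any locale in LOCALES can be run in modelbench.
    return locale in LOCALES

def is_published(locale: str) -> bool:
    # Published locales need no special-case exceptions deeper in the code
    # (e.g. the earlier hardcoded exceptions for French practice tests).
    return locale in PUBLISHED_LOCALES
```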

@rogthefrog rogthefrog merged commit 9337d40 into main Jan 21, 2025
4 checks passed
@rogthefrog rogthefrog deleted the fix/781/prompt-sets-simpler branch January 21, 2025 18:37
@github-actions github-actions bot locked and limited conversation to collaborators Jan 21, 2025