Add autojudge for Dynamic Rubric Generation #7

Closed
jamesliounis opened this issue Dec 16, 2024 · 0 comments · Fixed by #11
Labels: enhancement (New feature or request)

@jamesliounis (Contributor):

Description:
Implement a new type of judge, Autojudge, that dynamically generates an evaluation rubric from a labeled dataset and its associated feedback. This automates the creation of task-specific LLM evaluation rubrics, enabling users to leverage labeled data and feedback to fine-tune or guide LLM-based evaluators effectively.

Proposed Workflow:

  1. Input:

    • A labeled dataset containing:
      • input_text: User prompt or query.
      • completion: AI-generated response.
      • label: Binary evaluation (1 for acceptable, 0 for unacceptable).
      • feedback: Detailed explanations for why a response is unacceptable (mandatory for label=0).
    • A task_description providing context for rubric generation (a minimal sketch of both inputs follows this list).
  2. Process:

    • Use the input data to generate a rubric for evaluation, considering the feedback provided for negative labels.
    • The generated rubric should detail scoring criteria (e.g., factuality, relevance, tone) and decision rules.
  3. Output:

    • A structured rubric that can be used by other judges in the library.
    • Optionally, structured feedback and evaluation metrics for the dataset (accuracy, precision, recall, etc.).
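
To make the inputs concrete, here is a minimal sketch of the labeled dataset and task_description. The field names (input_text, completion, label, feedback) come from the list above; the example rows and the in-memory list-of-dicts framing are assumptions.

```python
# Hypothetical rows illustrating the proposed input schema; the example
# content is illustrative only.
dataset = [
    {
        "input_text": "How do I reset my password?",
        "completion": "Go to Settings > Security and choose 'Reset password'.",
        "label": 1,        # acceptable
        "feedback": None,  # feedback is only mandatory for label=0
    },
    {
        "input_text": "How do I reset my password?",
        "completion": "Figure it out yourself.",
        "label": 0,        # unacceptable
        "feedback": "Dismissive tone and does not answer the question.",
    },
]

task_description = "Evaluate customer-support replies for helpfulness and tone."
```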

Motivation:

  • Automating rubric generation will streamline the development of evaluation workflows.
  • It reduces manual effort in crafting rubrics while ensuring consistency and adaptability for domain-specific tasks.

Example Use Case:
A user wants to evaluate AI-generated responses for empathy in customer service. They provide labeled examples of good and bad responses, with feedback explaining why the bad ones fall short. Autojudge processes the data and generates an evaluation rubric focused on empathy-related criteria, as sketched below.
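
A sketch of how this use case might look in code, written against the hypothetical AutoJudge interface outlined in the skeleton after the Tasks list below; none of this is a confirmed API.

```python
# Hypothetical usage sketch for the empathy use case; the AutoJudge class
# and generate_rubric method are assumptions (see the skeleton below).
examples = [
    {
        "input_text": "My order arrived broken and nobody is helping me.",
        "completion": "That's not our problem once it ships.",
        "label": 0,
        "feedback": "No acknowledgement of the customer's frustration.",
    },
    {
        "input_text": "My order arrived broken and nobody is helping me.",
        "completion": "I'm so sorry about that. Let's send a replacement today.",
        "label": 1,
        "feedback": None,
    },
]

judge = AutoJudge(task_description="Evaluate customer-service replies for empathy.")
rubric = judge.generate_rubric(examples)
print(rubric)  # empathy-focused criteria and decision rules
```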

Tasks:

  1. Create the Autojudge class (a rough skeleton follows this list) with:
    • Methods for processing labeled data and generating rubrics.
    • Integration with the existing judge architecture.
  2. Add unit tests to validate functionality.
  3. Update the documentation to include Autojudge under the Types of Judges section.
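
A rough skeleton of what task 1 might look like. Everything here (the dataclass framing, method names, and the LLM-call placeholder) is an assumption for illustration, not the library's actual design; integration with the existing judge architecture is left open.

```python
# Rough, hypothetical skeleton for the proposed AutoJudge class.
from dataclasses import dataclass


@dataclass
class AutoJudge:
    task_description: str
    rubric: str | None = None  # populated by generate_rubric()

    def generate_rubric(self, dataset: list[dict]) -> str:
        """Derive a grading rubric from labeled examples and their feedback."""
        # Feedback on unacceptable (label=0) examples explains *why* responses
        # failed; it seeds the rubric's criteria and decision rules.
        negative_feedback = [
            row["feedback"]
            for row in dataset
            if row["label"] == 0 and row.get("feedback")
        ]
        prompt = (
            f"Task: {self.task_description}\n"
            "Write a scoring rubric (criteria such as factuality, relevance, "
            "and tone, plus decision rules) that would catch these failures:\n- "
            + "\n- ".join(negative_feedback)
        )
        self.rubric = self._call_llm(prompt)
        return self.rubric

    def _call_llm(self, prompt: str) -> str:
        # Placeholder: a real implementation would call an LLM client here.
        raise NotImplementedError
```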
jamesliounis self-assigned this on Dec 16, 2024
freddiev4 added the enhancement (New feature or request) label on Dec 17, 2024
freddiev4 changed the title from "[FEATURE] Add a New Judge: Autojudge for Dynamic Rubric Generation" to "Add autojudge for Dynamic Rubric Generation" on Dec 17, 2024