Description:
Implement a new type of judge called Autojudge, which can dynamically generate an evaluation rubric based on a labeled dataset and associated feedback. This feature will automate the creation of task-specific LLM evaluation rubrics. Autojudge will enable users to leverage labeled data and feedback to fine-tune or guide LLM-based evaluators effectively.
Proposed Workflow:
Input:
- A labeled dataset containing the following fields (an illustrative record is sketched after this list):
  - `input_text`: the user prompt or query.
  - `completion`: the AI-generated response.
  - `label`: a binary evaluation (1 for acceptable, 0 for unacceptable).
  - `feedback`: a detailed explanation of why a response is unacceptable (mandatory when `label=0`).
- A `task_description` providing context for rubric generation.
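For illustration, dataset records matching this schema might look like the following. The field names come from the schema above; the list-of-dicts representation and the example values are assumptions, not a required format:

```python
# Illustrative records following the schema above; values are made up for the
# customer-service empathy use case described later in this issue.
dataset = [
    {
        "input_text": "My package arrived broken. What can you do?",
        "completion": "I'm so sorry to hear that. We'll ship a replacement today at no cost.",
        "label": 1,
        "feedback": None,  # feedback is only mandatory when label=0
    },
    {
        "input_text": "My package arrived broken. What can you do?",
        "completion": "Shipping damage is not our responsibility.",
        "label": 0,
        "feedback": "Dismissive tone; does not acknowledge the customer's frustration or offer a remedy.",
    },
]

task_description = "Evaluate customer-service replies for empathy and helpfulness."
```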
Process:
- Use the input data to generate an evaluation rubric, taking into account the feedback attached to negatively labeled examples (one possible implementation is sketched after this list).
- The generated rubric should detail scoring criteria (e.g., factuality, relevance, tone) and decision rules.
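One way the rubric-generation step could work is to collect the feedback attached to rejected completions and ask an LLM to synthesize it into explicit criteria. The sketch below assumes the dataset format shown above and uses the OpenAI chat completions API purely as an example backend; the prompt wording and model choice are assumptions, not part of this proposal:

```python
from openai import OpenAI

client = OpenAI()

def generate_rubric(dataset: list[dict], task_description: str) -> str:
    """Sketch: synthesize grading criteria from labeled examples and their feedback."""
    # The feedback on rejected completions is the signal the rubric should turn
    # into explicit scoring criteria and decision rules.
    negative_feedback = [row["feedback"] for row in dataset if row["label"] == 0]
    prompt = (
        f"Task: {task_description}\n\n"
        "Reviewers rejected some responses for the following reasons:\n"
        + "\n".join(f"- {fb}" for fb in negative_feedback)
        + "\n\nWrite an evaluation rubric with scoring criteria (e.g. factuality, "
        "relevance, tone) and clear accept/reject decision rules."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```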
Output:
- A structured rubric that can be used by other judges in the library (a possible representation is sketched after this list).
- Optionally, structured feedback and evaluation metrics for the dataset (accuracy, precision, recall, etc.).
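The structured rubric and the optional metrics report could be represented roughly as below; the `Rubric` dataclass and the scikit-learn metric helpers are illustrative choices, not a proposed interface:

```python
from dataclasses import dataclass, field
from sklearn.metrics import accuracy_score, precision_score, recall_score

@dataclass
class Rubric:
    """Hypothetical structured rubric consumable by other judges."""
    task_description: str
    criteria: list[str] = field(default_factory=list)        # e.g. ["factuality", "relevance", "tone"]
    decision_rules: list[str] = field(default_factory=list)  # e.g. ["Reject replies that ignore the stated problem"]

def dataset_metrics(labels: list[int], predictions: list[int]) -> dict[str, float]:
    """Optional report comparing Autojudge verdicts against the human labels."""
    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision_score(labels, predictions),
        "recall": recall_score(labels, predictions),
    }
```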
Motivation:
- Automating rubric generation will streamline the development of evaluation workflows.
- It reduces the manual effort of crafting rubrics while keeping them consistent and adaptable to domain-specific tasks.
Example Use Case:
A user wants to evaluate AI-generated responses for empathy in customer service. They provide labeled examples of good and bad responses with feedback for improvement. Autojudge processes the data and generates an evaluation rubric focused on empathy-related criteria.
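For this use case, a hypothetical end-to-end call might look like the snippet below. The `Autojudge` constructor and `judge()` signature are assumptions based on the workflow above (a matching class skeleton is sketched under Tasks), not a confirmed API:

```python
# Hypothetical usage; `dataset` is the labeled empathy data shown earlier and
# Autojudge is the class sketched under Tasks below.
autojudge = Autojudge(
    dataset=dataset,
    task_description="Evaluate customer-service replies for empathy.",
)

judgment = autojudge.judge(
    input="My package arrived broken. What can you do?",
    output="Shipping damage is not our responsibility.",
)
print(judgment.score)      # e.g. 0 (unacceptable)
print(judgment.reasoning)  # explanation grounded in the generated empathy rubric
```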
Tasks:
- Create the `Autojudge` class (a minimal skeleton is sketched after this list) with:
  - methods for processing labeled data and generating rubrics;
  - integration with the existing judge architecture.
- Add unit tests to validate functionality.
- Update the documentation to include `Autojudge` under the Types of Judges section.
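A minimal skeleton of the class, reusing the `generate_rubric` helper and `client` from the sketch above, could look like this. The `Judgment` result type and the verdict parsing are placeholders; the real implementation would plug into whatever base judge class and structured-output handling the library already uses:

```python
from dataclasses import dataclass

@dataclass
class Judgment:
    """Placeholder result type mirroring what other judges return."""
    score: int
    reasoning: str

class Autojudge:  # in practice this would subclass the library's existing base judge
    def __init__(self, dataset: list[dict], task_description: str, model: str = "gpt-4o-mini"):
        self.model = model
        self.task_description = task_description
        # The rubric is generated once from the labeled data, then reused for every judgment.
        self.rubric = generate_rubric(dataset, task_description)

    def judge(self, input: str, output: str) -> Judgment:
        """Grade a single (input, output) pair against the generated rubric."""
        prompt = (
            f"Rubric:\n{self.rubric}\n\n"
            f"User input:\n{input}\n\nModel response:\n{output}\n\n"
            "Apply the rubric. Answer with a verdict on the first line "
            "(1 = acceptable, 0 = unacceptable) followed by a short justification."
        )
        response = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.choices[0].message.content
        # Naive parsing for the sketch; a real implementation would use structured output.
        score = 1 if text.strip().startswith("1") else 0
        return Judgment(score=score, reasoning=text)
```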