GPTScore: A Novel Evaluation Framework for Text Generation Models #811
Labels
code-generation
Code generation models and tools such as Copilot and Aider
llm-evaluation
Evaluating Large Language Models performance and behavior through human-written evaluation sets
Papers
Research papers
GPTScore: Evaluate as You Desire
This is the Source Code of Paper: GPTScore: Evaluate as You Desire.
What is GPTScore?
GPTScore is a novel evaluation framework that utilizes the emergent abilities (e.g., zero-shot instruction following) of generative pre-trained models to score generated texts.
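The core idea can be sketched as follows: the score of a hypothesis text is the average log-probability its tokens receive from a generative PLM, conditioned on an evaluation prompt. This is a minimal illustration, not the repository's implementation; `token_logprobs` stands in for the per-token log-probabilities an API such as GPT3 would return.

```python
def gpt_score(token_logprobs):
    """Sketch of the GPTScore idea: average per-token log-probability
    assigned by a generative PLM to the hypothesis text, conditioned on
    an evaluation prompt. Higher (less negative) means better."""
    return sum(token_logprobs) / len(token_logprobs)

# A fluent text tends to get a higher average log-prob than a disfluent one
# (illustrative numbers, not real model outputs):
fluent = [-0.2, -0.1, -0.3]
disfluent = [-2.5, -3.1, -1.9]
assert gpt_score(fluent) > gpt_score(disfluent)
```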
The GPTScore evaluation framework supports customized, multi-faceted, and training-free evaluation of generated text.
What PLMs does GPTScore support?
We explored 19 Pre-trained Language Models (PLMs) ranging in size from 80M (FLAN-T5-Small) to 175B (GPT3) to design GPTScore. The PLMs studied in this paper are listed as follows:
Evaluator Name indicates the name of the evaluator corresponding to the Model name in the first column.
Usage
Use the GPT3-based model as the evaluator
Take the evaluation with the GPT3 text-curie-001 model as an example.
Set the following parameters:

- `gpt3_score` to True: the GPTScore evaluator uses a GPT3-based PLM.
- `gpt3model` to `curie`: the text-curie-001 model is utilized.
- `out_dir_name`: set the folder for saving scoring results.
- `dataname`: set the dataset name for evaluation (e.g., BAGEL).
- `aspect`: set the aspect name to be evaluated (e.g., quality).

1. GPTScore with Instruction and Demonstration

   Set both `use_demo` and `use_ist` to True.

2. GPTScore with only Instruction

   Set `use_ist` to True and `use_demo` to False.

3. GPTScore without both Instruction and Demonstration

   Set both `use_ist` and `use_demo` to False.

For more information, visit the GitHub repository.
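The settings above can be sketched as a configuration dictionary. This is a hypothetical illustration of the flag combinations described in the usage notes; the repository's actual entry point and argument handling may differ.

```python
# Hypothetical configuration for variant 1 (instruction + demonstration),
# using the flag names described above.
args = dict(
    gpt3_score=True,    # evaluate with a GPT3-based PLM
    gpt3model="curie",  # i.e., text-curie-001
    out_dir_name="results",  # folder for saving scoring results
    dataname="BAGEL",   # dataset to evaluate
    aspect="quality",   # aspect to be evaluated
    use_ist=True,       # include the instruction in the prompt
    use_demo=True,      # include demonstrations in the prompt
)

# Variant 2 (instruction only) would set use_demo=False;
# variant 3 (neither) would set both use_ist and use_demo to False.
```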
Suggested labels: None