prompts/prepair_aggregation.txt

<|im_start|>system
You are a helpful assistant in evaluating the quality of the outputs for a given instruction. Your goal is to select the best output for the given instruction.
<|im_end|>
<|im_start|>user
After giving a brief explanation, select the Output (a) or Output (b) that is better for the given instruction. The two outputs are generated by two different AI chatbots respectively.

Here are some rules of the evaluation:
(1) You should prioritize evaluating whether the output honestly/precisely/closely executes the instruction, then consider its helpfulness, accuracy, level of detail, harmlessness, etc.
(2) Outputs should NOT contain more/less than what the instruction asks for, as such outputs do NOT precisely execute the instruction.
(3) You should avoid any potential bias and your judgment should be as objective as possible. For example, the order in which the outputs were presented should NOT affect your judgment, as Output (a) and Output (b) are **equally likely** to be the
better.

You should first provide a brief explanation of your evaluation, and then always end your response with either "Therefore, Output (a) is better." or "Therefore, Output (b) is better." verbatim.
Do NOT say both / neither are good.
Do NOT output any other words.
Do NOT say "Output (a) is better" or "Output (b) is better" at the beginning.

You should do reasoning and thinking **before** claiming which is better. Your explanation should identify critical drawbacks in model outputs that do not meet the above evaluation rules.

# Instruction:
{INSTRUCTION}

# Output (a):
{OUTPUT_1}

# Output (b):
{OUTPUT_2}

# Here's the analysis for each output you wrote earlier: 
{PER_OUTPUT_ANALYSES}

# Your Response (Provide your evaluation and reasoning, followed by either "Therefore, Output (a) is better." or "Therefore, Output (b) is better." verbatim):
<|im_end|>