Evaluation Flow

The evaluation flow is used to evaluate the performance of the main flow by comparing its output to ground truth data.

The evaluation flow performs a batch run of the main flow, using the ground truth data as input. The output of the main flow is then compared to the ground truth data to compute metrics that quantify the main flow's performance.

Evaluation Flow Diagram
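Conceptually, the comparison works like the sketch below: each main flow output is paired with its ground truth record, and a score is computed from how well the fields agree. This is only an illustrative Python sketch under assumed field names and an exact-match metric; it is not the actual implementation of the evaluation flow.

```python
import json


def exact_match_rate(run_outputs, ground_truth_path):
    """Toy metric: fraction of ground-truth fields reproduced exactly
    by the main flow (hypothetical; real metrics may be more nuanced)."""
    matches, total = 0, 0
    with open(ground_truth_path, encoding="utf-8") as f:
        for output, line in zip(run_outputs, f):
            gt = json.loads(line)  # one JSON object per line (JSONL)
            for field, expected in gt.items():
                total += 1
                if output.get(field) == expected:
                    matches += 1
    return matches / total if total else 0.0
```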

How to Evaluate Agents Using Promptflow

This section describes how to evaluate agents against ground truth data.

Requirements

  • Ground Truth Data must be collected in a JSONL file: each object must be on a single line, and objects must NOT be delimited by commas. A hypothetical example is shown below.
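A minimal sketch of producing such a file follows. The field names (pdf_name, title, page_count) are hypothetical placeholders; only the one-object-per-line, no-comma layout reflects the actual requirement.

```python
import json

# Hypothetical ground truth records; replace the fields with whatever
# your evaluation flow actually compares against.
records = [
    {"pdf_name": "contract_001.pdf", "title": "Service Agreement", "page_count": 12},
    {"pdf_name": "contract_002.pdf", "title": "NDA", "page_count": 4},
]

# Write one JSON object per line, with no commas between objects.
with open("ground_truth.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```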

Step-by-step guide in the Promptflow UI

  1. From the main flow (ai_doc_review), click Evaluate > Custom Evaluation

Evaluate > Custom Evaluation

  2. Next, upload the Ground Truth data in JSONL format, as described in Requirements

Upload Ground Truth data

The input mapping defines which agent (main flow) input corresponds to which Ground Truth field.

In this case:

  • pdf_name corresponds to the document name stored in blob storage
  • stream must be set to False
  • pagination can be set to any desired value, or left blank to use the default; see the flow documentation for details. A hedged SDK sketch of this input mapping follows the list.
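For readers who prefer the Promptflow SDK over the UI, the same batch run and input mapping might look roughly like the sketch below. This is a hedged illustration rather than part of the documented UI workflow: the import path, flow path, and data path depend on your promptflow version and setup.

```python
from promptflow.client import PFClient  # older releases expose promptflow.PFClient

pf = PFClient()

# "${data.<column>}" pulls a value from the ground truth JSONL;
# literal values are passed through unchanged.
main_run = pf.run(
    flow="./ai_doc_review",
    data="./ground_truth.jsonl",
    column_mapping={
        "pdf_name": "${data.pdf_name}",  # document name stored in blob storage
        "stream": False,                 # must be False for evaluation runs
        # "pagination" is omitted so the flow's default value is used
    },
)
```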
  3. Select the custom evaluation flow

Here, select the previously deployed evaluation flow. Refer to the deployment documentation for instructions on deploying the evaluation flow.

Select custom eval flow

  4. Pick the data source inputs and submit for evaluation
  • llm_output: the Agent run output (must be a dict)
  • gt_json: the Ground Truth Data from the JSONL file (must be a dict)
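Continuing the SDK sketch from the input-mapping step, an evaluation run over the main run's outputs might look roughly as follows. The output name ("output") and ground truth column name ("gt_json") are assumptions and must match your flows.

```python
from promptflow.client import PFClient  # import path may vary by promptflow version

pf = PFClient()

# Hypothetical SDK equivalent of this step: the evaluation flow reads the
# main run's output and the corresponding ground truth record.
eval_run = pf.run(
    flow="./evaluation_flow",
    data="./ground_truth.jsonl",
    run=main_run,  # the main-flow batch run from the previous steps
    column_mapping={
        "llm_output": "${run.outputs.output}",  # Agent run output (a dict)
        "gt_json": "${data.gt_json}",           # ground truth record (a dict)
    },
)
```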

Select custom eval flow

  5. Check the results of the Evaluation Flow

Once the run has finished, the results can be found in the evaluation flow under View batch runs.

Select custom eval flow

  6. The metrics produced by the Evaluation Flow can be found under Metrics

Eval Metrics
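The values that appear on the Metrics tab are whatever the evaluation flow logs from its aggregation node. As a rough illustration (not the actual node in this repository; the metric name, input list, and import paths are assumptions), an aggregation node typically looks like:

```python
from typing import List

from promptflow import log_metric, tool  # import paths may vary by promptflow version


@tool
def aggregate(match_scores: List[float]):
    """Hypothetical aggregation node: averages per-line scores computed
    earlier in the evaluation flow and logs the result so it shows up
    under Metrics for the batch run."""
    accuracy = sum(match_scores) / len(match_scores) if match_scores else 0.0
    log_metric(key="accuracy", value=accuracy)
    return accuracy
```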