Automatic Evaluation

The evaluation pipeline has two parts: dialogue simulation and quality evaluation.

main.sh runs the whole pipeline.
run_dialogue_simulation.sh performs only the dialogue simulation.
run_autoeval.sh performs only the quality evaluation.

To evaluate a different model, modify Nurse in eval_model_config. To use a custom dataset, update FILE_PATH and EMR_PATH in main.sh. The required data format is described in SFMSS Data.

When comparing the performance of different models, we strongly recommend running patient sampling first and removing --sample from main.sh. This ensures that every model is evaluated with the same patient simulation settings, making the results comparable. (Remember to update FILE_PATH to the new path containing the records with patient settings.)

bash sfmss/workflow/run_patient_sample.sh
bash main.sh
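
Before a comparison run, it can help to confirm that --sample has actually been removed from main.sh. The following is a minimal sketch of such a guard, not part of the repository: has_sample_flag is a hypothetical helper, and it performs a plain substring search, so it would also match the flag inside a comment or a longer option name.

```shell
#!/usr/bin/env bash
# Sketch only: has_sample_flag is a hypothetical helper, not a script shipped with this repo.
# Returns 0 if the given file contains the literal text "--sample", 1 otherwise.
has_sample_flag() {
    local script="$1"
    # -q: no output, -F: treat the pattern as a fixed string,
    # --: end of options, so "--sample" is not parsed as a grep flag.
    grep -qF -- '--sample' "$script"
}
```

A possible use is to call has_sample_flag main.sh after running run_patient_sample.sh and to stop with a warning if it succeeds, since that would mean main.sh will resample patients instead of reusing the sampled settings.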

The result folder contains all evaluation results presented in the paper, including both automatic and human evaluation results.
