This is the repository for the Personalized Intelligent Outpatient Reception System (PIORS) and the Service Flow aware Medical Scenario Simulation (SFMSS) framework.
In China, reception nurses in outpatient settings face overwhelming workloads, which limits the time and attention they can give each patient and ultimately reduces service quality. We present the Personalized Intelligent Outpatient Reception System (PIORS). This system integrates an LLM-based reception nurse and a collaboration between the LLM and the hospital information system (HIS) into real outpatient reception settings, aiming to deliver personalized, high-quality, and efficient reception services. Additionally, to improve the performance of LLMs in real-world healthcare scenarios, we propose a medical conversational data generation framework named Service Flow aware Medical Scenario Simulation (SFMSS), which adapts the LLM to real-world environments and the PIORS setting. We evaluate the effectiveness of PIORS and SFMSS through automatic and human assessments involving 15 users and 15 clinical experts. The results demonstrate that PIORS-Nurse outperforms all baselines, including the current state-of-the-art model GPT-4o, and aligns with human preferences and clinical needs.
The overall framework of PIORS is shown below.
```bash
git clone https://github.com/FudanDISC/PIORS.git
cd PIORS
```
We provide an `environment.yml` file that can be used to create a Conda environment:

```bash
conda env create -f environment.yml
conda activate piors
```
Replace the placeholder `your api key` in `eval_model_config`, `judger_config`, and `agent_config` with a valid OpenAI API key (if you don't have one, you can generate one on the OpenAI website).
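If you prefer not to edit the three files by hand, the shell function below is one way to do the substitution. It is a minimal sketch, not part of the repository: it assumes the placeholder appears literally as `your api key`, that your key is available in a shell variable such as `OPENAI_API_KEY` (our assumption), and that GNU `sed` is installed. The name `fill_api_key` is hypothetical. Markdown files are skipped so that documentation mentioning the placeholder is left unchanged.

```bash
# Hypothetical helper: replace the literal placeholder "your api key"
# with a real key in every non-Markdown file under a directory.
# Assumes GNU sed (on macOS, use `sed -i ''` instead of `sed -i`).
# Keys containing "|" or "&" would need escaping; OpenAI keys normally do not.
fill_api_key() {
  local dir="$1" key="$2"
  # -r: recurse, -l: print matching file names, -F: fixed string, not a regex.
  grep -rlF --exclude-dir=.git --exclude='*.md' "your api key" "$dir" |
    while IFS= read -r file; do
      # "|" is used as the sed delimiter so keys containing "/" are safe.
      sed -i "s|your api key|${key}|g" "$file"
      echo "updated: $file"
    done
}
```

For example, run `fill_api_key . "$OPENAI_API_KEY"` from the repository root and check the printed file list; take care not to commit the files once they contain your key.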
- Click here to try the demo of PIORS.
- Below we provide quick examples of SFMSS and the automatic evaluation.
We provide a few example outpatient medical records in `emr_example.json`. You can use them as seed data to simulate outpatient reception dialogues:

```bash
bash sfmss/main.sh
```
You can also start dialogue simulation from your own dataset of medical records: set `FILE_PATH` to the path of your dataset and make sure the data meets the format requirements. Detailed instructions and the specific data format requirements can be found here.
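As an illustration, the hypothetical helper below first checks that a dataset parses as JSON and then rewrites the `FILE_PATH` assignment in `sfmss/main.sh`. It assumes `FILE_PATH` is set on its own line in the form `FILE_PATH=...` and that your dataset is a single JSON file like `emr_example.json`; both are assumptions, and passing the JSON check does not mean the records meet the SFMSS format requirements, so check those separately.

```bash
# Hypothetical helper: point a simulation script at your own dataset.
# Assumes the script sets the path on a line starting with "FILE_PATH="
# and that the dataset is one JSON document. Requires python3 and GNU sed.
use_own_dataset() {
  local script="$1" data="$2"
  # Fail early if the dataset is not valid JSON (json.tool exits non-zero).
  if ! python3 -m json.tool "$data" > /dev/null; then
    echo "error: $data is not valid JSON" >&2
    return 1
  fi
  # Rewrite the FILE_PATH assignment in place, quoting the new path.
  sed -i "s|^FILE_PATH=.*|FILE_PATH=\"${data}\"|" "$script"
  # Show the updated line so the change can be checked.
  grep '^FILE_PATH=' "$script"
}
```

For example, `use_own_dataset sfmss/main.sh /path/to/my_records.json` followed by `bash sfmss/main.sh`.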
You can evaluate the performance of a model acting as a reception nurse. By default, the evaluation assesses GPT-4o as the nurse and uses `emr_example.json` as seed data for the patient simulator:

```bash
bash eval/main.sh
```
To evaluate a different model, change the `Nurse` entry in `eval_model_config`. We recommend deploying local models with vLLM.
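One possible way to deploy a local model is vLLM's OpenAI-compatible server, sketched below. `vllm serve` and its `--port` option belong to vLLM's CLI; the helper names `start_vllm_server` and `wait_for_server`, the default port 8000, and pointing the `Nurse` entry at the resulting `http://localhost:<port>/v1` base URL are assumptions, since the exact fields of `eval_model_config` are not described here.

```bash
# Hypothetical helper: start vLLM's OpenAI-compatible server in the background.
# The server exposes endpoints such as /v1/models and /v1/chat/completions.
start_vllm_server() {
  local model="$1" port="${2:-8000}"
  vllm serve "$model" --port "$port" &
}

# Hypothetical helper: poll a URL until it answers with a success status,
# trying up to $tries times with $delay seconds between attempts.
# Returns 0 once the server responds, 1 if it never does.
wait_for_server() {
  local url="$1" tries="${2:-60}" delay="${3:-5}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "$url" > /dev/null; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `start_vllm_server Qwen/Qwen2-7B-Instruct 8000` followed by `wait_for_server http://localhost:8000/v1/models` before running `bash eval/main.sh`. Polling `/v1/models` avoids starting the evaluation while the model weights are still loading.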
Details of the evaluation pipeline can be found here.
We conduct both automatic and human evaluations. In the automatic evaluation, our model (PIORS-Nurse) ranks first on all metrics (for Average Turn Number and Average Turn Length, lower values rank higher).
| Method | Model | Accuracy | Overall Score | Info Score | Average Turn Number | Average Turn Length |
|---|---|---|---|---|---|---|
| Directly Prompt | GPT-4o | 0.717 | 3.83 | 2.16 | 3.54 | 207.98 |
| | Qwen2-7B | 0.634 | 3.65 | 2.28 | 4.22 | 336.40 |
| | Llama3-8B | 0.401 | 3.24 | 2.65 | 4.44 | 678.14 |
| | HuatuoGPT2-13B | 0.501 | 3.25 | 2.17 | 3.57 | 258.38 |
| Fine-tuned | SF-ablated nurse | 0.786 | 3.92 | 2.20 | 3.37 | 202.55 |
| | PIORS-Nurse | 0.822 | 4.01 | 3.01 | 3.22 | 139.54 |
In human evaluation, our model achieves a win or tie rate of over 80%.
```bibtex
@misc{bao2024piors,
  title={PIORS: Personalized Intelligent Outpatient Reception based on Large Language Model with Multi-Agents Medical Scenario Simulation},
  author={Zhijie Bao and Qingyun Liu and Ying Guo and Zhengqiang Ye and Jun Shen and Shirong Xie and Jiajie Peng and Xuanjing Huang and Zhongyu Wei},
  year={2024},
  eprint={2411.13902},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```