Sören edited this page May 29, 2024 · 5 revisions

Welcome to TracEX

Hello and welcome to the wiki of our bachelor project, TracEX! Please read this page for general background information about the project or browse the table of contents for more details.

Project Overview

Purpose

TracEX aims to extract event logs from unstructured text, specifically written patient experiences known as patient journeys, using Large Language Models (LLMs).

Key Features

  • Extraction Pipeline: A robust pipeline to clean, process, and extract data from natural language text.
  • Patient Journey Generator: Generates comprehensive patient journeys based on randomized cohort data.
  • Database: Stores patient journeys and related extraction results for easy access and analysis.
  • Metrics and Evaluation Tool: Evaluates the accuracy and effectiveness of the extraction process and allows for analysis of extraction results.
  • Intuitive UI: User-friendly interface for you to interact with the tool and visualize results.

Target Audience

Researchers in the fields of digital health, process mining, and computer science.

Context and Motivation

Chronic diseases like cancer, long COVID, endometriosis, and dementia affect a significant number of people. These diseases are often incurable and only partially treatable, which has led to a variety of approaches for managing the condition over time. Although individual disease progressions may share similarities, they often vary widely due to numerous influencing factors.

Valuable insights can be gained by bringing patient experiences into a comparable form and then comparing them. For example:

  • How does the sequence of events influence outcomes?
  • Which exercises help alleviate symptoms for patients with pre-existing asthma?
  • Should a patient with symptom X visit a hospital or consult a doctor?

Our project partners at mamahealth aim to compile thousands, possibly millions, of patient experiences in a structured form. Their goal is to bridge the gap between medical professionals and patients by directly incorporating the patients' perspectives and presenting them to the medical community.

Process Mining

Process mining is a collection of techniques for creating graphs from event logs, that is, structured records of events and their timestamps. These graphs depict the sequence and frequency of events. The fundamental idea is that an affected person describes their disease progression, including all decisions and events, in natural language. This information is integrated into a large process mining model, which also includes the descriptions of all other recorded individuals. From the resulting insights, the affected person can infer potential next steps.
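To make the idea of a graph built from an event log more concrete, here is a minimal sketch in Python. It counts how often one activity is directly followed by another within the same patient journey, which is one common way to derive such a graph. The event log, the patient IDs, the activity names, and the function `directly_follows` are all made up for illustration and are not taken from the TracEX code base.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event log: one row per event (case ID, activity, timestamp).
event_log = [
    ("patient_1", "Symptom onset", "2024-01-03"),
    ("patient_1", "Doctor visit", "2024-01-05"),
    ("patient_1", "Medication", "2024-01-06"),
    ("patient_2", "Symptom onset", "2024-02-10"),
    ("patient_2", "Doctor visit", "2024-02-11"),
    ("patient_2", "Hospital stay", "2024-02-12"),
]


def directly_follows(log):
    """Count how often activity a is directly followed by activity b in a case."""
    # Group events by case so that each patient journey forms its own trace.
    traces = {}
    for case_id, activity, timestamp in log:
        parsed = datetime.fromisoformat(timestamp)
        traces.setdefault(case_id, []).append((parsed, activity))

    edges = Counter()
    for events in traces.values():
        # Order each trace by timestamp before pairing neighbouring events.
        events.sort(key=lambda event: event[0])
        activities = [activity for _, activity in events]
        edges.update(zip(activities, activities[1:]))
    return edges


graph = directly_follows(event_log)
for (source, target), count in graph.items():
    print(f"{source} -> {target}: {count}")
```

Each key of `graph` is an edge between two activities and each value is the edge's frequency, so the result can be drawn as a graph showing both the sequence and the frequency of events.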

Project Goals and Challenges

Event logs are crucial as they form the basis for all subsequent steps. TracEX is dedicated to the automated generation of event logs from unstructured natural-language text. We use OpenAI's GPT models (currently GPT-3.5) for text processing. Several challenges must be addressed, including missing and incorrect information, ambiguous descriptions, irrelevant details, and potential AI hallucinations. We have managed to partially address these problems, but there is still a lot of room for improvement and new ideas.
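As an illustration of the "missing and incorrect information" challenge, the following sketch shows one possible sanity check that could run on extracted events before they enter an event log. It is not the TracEX implementation: the function `split_valid_events`, the dictionary keys `activity` and `start`, and the sample data are assumptions made for this example only.

```python
from datetime import datetime


def split_valid_events(events):
    """Separate events with an activity and a parseable start date from the rest."""
    valid, rejected = [], []
    for event in events:
        # Treat a missing or whitespace-only activity as missing information.
        activity = (event.get("activity") or "").strip()
        try:
            start = datetime.fromisoformat(event.get("start") or "")
        except (TypeError, ValueError):
            # Covers missing values and vague phrases such as "last week".
            start = None

        if activity and start is not None:
            valid.append({"activity": activity, "start": start})
        else:
            rejected.append(event)
    return valid, rejected


# Hypothetical extraction output, including typical problem cases.
raw_events = [
    {"activity": "Doctor visit", "start": "2024-01-05"},
    {"activity": "Felt worse", "start": "last week"},
    {"activity": "", "start": "2024-01-06"},
    {"activity": "Medication", "start": None},
]

valid, rejected = split_valid_events(raw_events)
print(f"{len(valid)} valid, {len(rejected)} rejected")
```

Keeping the rejected events instead of silently dropping them makes it possible to review them later or send them through another extraction attempt.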