This repository contains the Human Expert vs. LLM Preference Data from the My Heart Counts study. It includes raw and processed datasets, a comprehensive list of messages, a mapping dictionary for customization, and a script to process the data.
- data/
  - raw/: Contains the raw dataset (`pref_data_raw.csv`)
  - processed/: Contains the processed dataset (`pref_data_processed.csv`)
- scripts/
  - `process_data.py`: Script to process the data
  - `mapping.json`: Mapping dictionary for customization
- docs/
  - `messages.md`: Full list of messages
- `README.md`: This file
The `mapping.json` file defines descriptions and mappings for each column in the dataset. You can edit it to:
- Adjust column mappings (e.g., update `Gender` or `Stage of Change` labels).
- Modify message preferences or add new mappings.
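
To make the idea of a column mapping concrete, here is a minimal Python sketch of applying label mappings to rows. The JSON layout, the `description`/`mapping` keys, and the `Gender` codes are assumptions for illustration only; check the actual `mapping.json` and `process_data.py` for the real structure, which may differ.

```python
import json

# Hypothetical mapping.json content; the real file's keys and layout may differ.
EXAMPLE_MAPPING_JSON = """
{
  "Gender": {
    "description": "Self-reported gender",
    "mapping": {"1": "Male", "2": "Female"}
  }
}
"""


def apply_mapping(rows, mapping):
    """Return copies of rows with coded values replaced by their labels.

    Values that have no entry in the mapping are left unchanged.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for column, spec in mapping.items():
            if column in new_row:
                labels = spec.get("mapping", {})
                new_row[column] = labels.get(new_row[column], new_row[column])
        out.append(new_row)
    return out


mapping = json.loads(EXAMPLE_MAPPING_JSON)
rows = [{"Gender": "1"}, {"Gender": "2"}, {"Gender": "9"}]  # illustrative codes
print(apply_mapping(rows, mapping))
```

In this sketch, unmapped codes pass through untouched, so adding a new label to the mapping only affects the values it names.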
To process the raw dataset and apply your mappings:
- Navigate to the `scripts` folder: `cd scripts`
- Run the script: `python3 process_data.py`
- The processed file will be saved in: `data/processed/pref_data_processed.csv`
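
After running the script, a quick sanity check of the output can be useful. The sketch below is not part of the repository; `summarize_csv` is a hypothetical helper that counts rows and lists column names for any CSV with a header row. The demo uses in-memory data with made-up columns.

```python
import csv
import io


def summarize_csv(file_obj):
    """Return (row_count, column_names) for a CSV that has a header row."""
    reader = csv.DictReader(file_obj)
    row_count = sum(1 for _ in reader)  # counts data rows, not the header
    return row_count, list(reader.fieldnames or [])


# In-memory demo with illustrative columns.
demo = io.StringIO("id,answer\n1,A\n2,B\n")
print(summarize_csv(demo))
```

To run it on the real output, open the processed file with `open(path, newline="")` and pass the file object in, adjusting `path` to your working directory.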
Notes on the Survey
- Participants only answered stage-specific questions corresponding to their current stage of change. For example, individuals in the "Action" stage only answered questions related to that stage. Columns unrelated to their stage are intentionally left blank.
- The LLM messages were generated using a fine-tuned version of LLaMA3-70B.
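
Because blank stage-specific columns are intentional, analyses should treat them as "not applicable" rather than as missing responses. A minimal sketch of picking out the columns a participant actually answered follows; the question column names and stage values are hypothetical, not taken from the dataset.

```python
def answered_columns(row):
    """Return the names of columns with a non-blank value in this row."""
    return [
        name
        for name, value in row.items()
        if value is not None and value.strip() != ""
    ]


# Hypothetical rows; the question column names are illustrative only.
rows = [
    {"Stage of Change": "Action", "action_q1": "A", "maintenance_q1": ""},
    {"Stage of Change": "Maintenance", "action_q1": "", "maintenance_q1": "B"},
]
for row in rows:
    print(row["Stage of Change"], answered_columns(row))
```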
For the full list of messages, refer to the `data/messages.md` file.