Date repository last updated: January 26, 2023
The UKBBcleanR package contains an R function that prepares time-to-event data from raw UK Biobank electronic medical record data. The prepared data can be used for cancer outcomes, but could be modified for other health outcomes. This package is not available on CRAN.
To install the development version from GitHub:
devtools::install_github("machiela-lab/UKBBcleanR")
Function | Description |
---|---|
tte | Prepares time-to-event data from raw UK Biobank electronic medical record data. |
The repository also includes the resources and code to create the project hex sticker.
- Alexander Depaulis - Integrative Tumor Epidemiology Branch (ITEB), Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, Maryland (MD), USA - GitHub
- Derek W. Brown - ITEB, DCEG, NCI, NIH, Rockville, MD, USA (original) - GitHub - ORCID
- Aubrey K. Hubbard - ITEB, DCEG, NCI, NIH, Rockville, MD, USA - ORCID
See also the list of contributors who participated in this package, including:
- Ian D. Buller - Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland (current) - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original) - GitHub - ORCID
- Mitchell J. Machiela - ITEB, DCEG, NCI, NIH, Rockville, MD, USA - GitHub - ORCID
The tte function requires several raw UK Biobank variables to run correctly. A detailed list of the required variables is provided in the README_required_variables.txt file.
Data can be loaded into the tte function in two ways:
- The user can use setwd() to set the working directory to the location where the individual data sets are stored.
  - NOTE: These individual data sets must contain the specific variables and have names that match the README_required_variables.txt file. Example data is available within the package.
- The user can generate a single data set containing all the variables of interest. This data set can then be loaded into the tte function using the combined_data argument. Example data is available within the package.
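As a rough illustration of the second option, the sketch below joins a few small data frames on a shared participant identifier using base R. Everything in it is an assumption made for illustration: the data frame names, the eid key column, and the other column names are hypothetical placeholders, not the variables that README_required_variables.txt requires.

```r
# Hypothetical individual data sets. The column names are placeholders only;
# consult README_required_variables.txt for the variables tte actually needs.
baseline <- data.frame(eid = c(1, 2, 3), sex = c(0, 1, 1))
registry <- data.frame(eid = c(1, 2, 3),
                       cancer_code = c("C911", NA, "C900"),
                       stringsAsFactors = FALSE)
deaths   <- data.frame(eid = 2, death_year = 2015)

# Merge on the shared identifier. all = TRUE keeps every participant,
# filling NA where a data set has no record for that participant.
my_combined <- Reduce(function(x, y) merge(x, y, by = "eid", all = TRUE),
                      list(baseline, registry, deaths))

str(my_combined)
```

A data frame assembled this way could then be supplied through the combined_data argument, provided it holds the required variables under the expected names.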
# ------------------ #
# Necessary packages #
# ------------------ #
library(UKBBcleanR)
# -------- #
# Settings #
# -------- #
##### Input UKBBcleanR sample data
# Use combined data set
testdata <- as.data.frame(combined_data)
# Set ICD-10 outcome of interest
cancer_outcome <- c("C911")
# Set prevalent cancers to identify in data cleaning
prevalent_cancers <- c("D37", "D38", "D39", "D40", "D41", "D42",
                       "D43", "D44", "D45", "D46", "D47", "D48")
# Set incident cancers to identify in data cleaning
incident_cancers <- c("C900")
# ------- #
# Run tte #
# ------- #
# Run without removing prevalent cancers from analysis
test1 <- tte(combined_data = testdata,
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers,
             prevalent_C_cancers = TRUE,
             incident_cancer_list = incident_cancers,
             remove_prevalent_cancer = FALSE,
             remove_self_reported_cancer = FALSE)
table(test1$case_control_cancer_ignore) # tte outcome ignoring other incident cancers
table(test1$case_control_cancer_control) # tte outcome controlling for other incident cancers
# Run with removing prevalent cancers from analysis
test2 <- tte(combined_data = testdata,
             cancer_of_interest_ICD10 = cancer_outcome,
             prevalent_cancer_list = prevalent_cancers,
             prevalent_C_cancers = TRUE,
             incident_cancer_list = incident_cancers,
             remove_prevalent_cancer = TRUE,
             remove_self_reported_cancer = TRUE)
table(test2$case_control_cancer_ignore) # tte outcome ignoring other incident cancers
table(test2$case_control_cancer_control) # tte outcome controlling for other incident cancers
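The prevalent_cancers vector in the example above lists twelve consecutive ICD-10 codes by hand. As an optional convenience, and not something the package itself requires, the same vector could be built from a numeric range with base R. The variable names ending in _seq and _manual are hypothetical and used only for this comparison.

```r
# Build "D37" through "D48" from the numeric range instead of typing each code
prevalent_cancers_seq <- paste0("D", 37:48)

# The hand-typed vector from the example, for comparison
prevalent_cancers_manual <- c("D37", "D38", "D39", "D40", "D41", "D42",
                              "D43", "D44", "D45", "D46", "D47", "D48")

# Confirm both approaches give exactly the same character vector
identical(prevalent_cancers_seq, prevalent_cancers_manual)
```

This only helps when the codes form an unbroken numeric run; a mixed list of codes is clearer written out explicitly.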
We provide a vignette with a practical example and walkthrough of the provided example data.
This package was developed while the first author was a participant in the 2022 National Institutes of Health Summer Internship Program in Biomedical Research, the second author was a postdoctoral fellow supported by the Cancer Prevention Fellowship Program at the National Cancer Institute (NCI), and the third author was a postdoctoral fellow in the NCI Division of Cancer Epidemiology and Genetics.
When citing this package for publication, please cite as follows:
citation("UKBBcleanR")
For questions about the package, please contact the maintainer, Dr. Derek Brown, or submit a new issue.