This is a data package with 15 medical datasets for teaching
Reproducible Medical Research with R. The link to the pkgdown reference
website for {medicaldata} is here and in the links at the right. This
package will be useful for anyone teaching R to medical professionals,
including doctors, nurses, trainees, and students.
These datasets range from reconstructed versions of James Lind’s
scurvy dataset (1757) and the original Streptomycin for Tuberculosis
trial (1948), through a 2012 RCT of indomethacin to prevent post-ERCP
pancreatitis that I was involved in, to cohort data on SARS-CoV-2 testing
results (2020). Many of the datasets come from the American Statistical
Association’s TSHS (Teaching Statistics in the Health Sciences)
Resources Portal, maintained by Carol Bigelow at the University of
Massachusetts (with permission).
- Install the stable, current CRAN version with `install.packages("medicaldata")`. If you want to try out the in-development version (which may have new datasets and vignettes, but which may also be intermittently wonky), install with: `remotes::install_github("higgi13425/medicaldata")`
- Then load the package with `library(medicaldata)`
- Then you can list the datasets available with `data(package = "medicaldata")`
- Then assign a particular dataset to a named object in your environment with: `covid <- medicaldata::covid_testing` where `covid` is the name of the new object, and `covid_testing` is the name of the dataset.
- Articles (vignettes) on how to use the datasets can be found at the pkgdown website under the Articles tab.
- You can click on the links below to view the description document and/or codebook for each dataset. This information is also available under the Reference tab above, or within R by using `help(dataset_name)`.
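The steps above can be combined into one short script. This is a minimal sketch, assuming an internet connection and a configured CRAN mirror; the `requireNamespace()` guard and the object name `available` are illustrative additions, not part of the package's own instructions.

```r
# Install from CRAN only if the package is not already present
# (the guard is an illustrative convenience, not a package requirement).
if (!requireNamespace("medicaldata", quietly = TRUE)) {
  install.packages("medicaldata")
}

library(medicaldata)

# data() returns an object whose `results` matrix has an "Item" column
# holding the dataset names.
available <- data(package = "medicaldata")$results[, "Item"]
print(available)

# Assign one dataset to a named object and take a first look at it.
covid <- medicaldata::covid_testing
str(covid)
head(covid)
```

Using `medicaldata::covid_testing` rather than the bare name makes it explicit which package the dataset comes from, which helps when several data packages are loaded in a teaching session.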
If you have access to data from a randomized, controlled clinical trial, a prospective cohort study, or even a case-control study, please consider obtaining the appropriate permissions, anonymizing the data, and donating the dataset for teaching purposes so that it can be added to this package. Open an issue on the GitHub page (source code link at the top right) to start a discussion about a data donation. I am happy to help with anonymization.
Click on the links below for more details about each dataset in its
Description Document, and for more details about the variables it
contains in its Codebook. Note that each dataset also has a help file
that you can open within R or RStudio by entering `help("dataset_name")`
in the Console pane.
Dataset | Description document | Codebook |
---|---|---|
strep_tb | strep_tb_desc | strep_tb_codebook |
scurvy | scurvy_desc | scurvy_codebook |
indo_rct | indo_rct_desc | indo_rct_codebook |
polyps | polyps_desc | polyps_codebook |
covid_testing | covid_desc | covid_codebook |
blood_storage | blood_storage_desc | blood_storage_codebook |
cytomegalovirus | cytomegalovirus_desc | cytomegalovirus_codebook |
esoph_ca | esoph_ca_desc | esoph_ca_codebook |
laryngoscope | laryngoscope_desc | laryngoscope_codebook |
licorice_gargle | licorice_gargle_desc | licorice_gargle_codebook |
opt | opt_desc | opt_codebook |
smartpill | smartpill_desc | smartpill_codebook |
supraclavicular | supraclavicular_desc | supraclavicular_codebook |
indometh | indometh_desc | indometh_codebook |
theoph | theoph_desc | theoph_codebook |
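
The same documentation can be reached from within R. The following is a minimal sketch, assuming {medicaldata} is already installed; `strep_tb` is used only as an example, and the object name `tb` is illustrative.

```r
library(medicaldata)

# Open the help file for one dataset; `package` narrows the search
# to {medicaldata}.
help("strep_tb", package = "medicaldata")

# Compare the variables in the data with those listed in the Codebook.
tb <- medicaldata::strep_tb
names(tb)
str(tb)
```

Any dataset name from the table can be substituted for `strep_tb`.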