diff --git a/joss.06277/10.21105.joss.06277.crossref.xml b/joss.06277/10.21105.joss.06277.crossref.xml new file mode 100644 index 0000000000..b5b82976eb --- /dev/null +++ b/joss.06277/10.21105.joss.06277.crossref.xml @@ -0,0 +1,284 @@ + + + + 20240217T070632-57c4f15497f6db472b8ca9494ffe5980dff256ea + 20240217070632 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 02 + 2024 + + + 9 + + 94 + + + + REDCapTidieR: Extracting complex REDCap databases into +tidy tables + + + + Richard + Hanna + https://orcid.org/0009-0005-6496-8154 + + + Ezra + Porter + https://orcid.org/0000-0002-4690-8343 + + + Stephany + Romero + + + Paul + Wildenhain + + + William + Beasley + https://orcid.org/0000-0002-5613-5006 + + + Stephan + Kadauke + https://orcid.org/0000-0003-2996-8034 + + + + 02 + 17 + 2024 + + + 6277 + + + 10.21105/joss.06277 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.10658773 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/6277 + + + + 10.21105/joss.06277 + https://joss.theoj.org/papers/10.21105/joss.06277 + + + https://joss.theoj.org/papers/10.21105/joss.06277.pdf + + + + + + The REDCap consortium: Building an +international community of software platform partners + Harris + Journal of Biomedical +Informatics + 95 + 10.1016/j.jbi.2019.103208 + 1532-0464 + 2019 + Harris, P. A., Taylor, R., Minor, B. +L., Elliott, V., Fernandez, M., O’Neal, L., McLeod, L., Delacqua, G., +Delacqua, F., Kirby, J., & Duda, S. N. (2019). The REDCap +consortium: Building an international community of software platform +partners. Journal of Biomedical Informatics, 95, 103208. 
+https://doi.org/10.1016/j.jbi.2019.103208 + + + Research electronic data capture (REDCap)—a +metadata-driven methodology and workflow process for providing +translational research informatics support + Harris + Journal of Biomedical +Informatics + 2 + 42 + 10.1016/j.jbi.2008.08.010 + 1532-0464 + 2009 + Harris, P. A., Taylor, R., Thielke, +R., Payne, J., Gonzalez, N., & Conde, J. G. (2009). Research +electronic data capture (REDCap)—a metadata-driven methodology and +workflow process for providing translational research informatics +support. Journal of Biomedical Informatics, 42(2), 377–381. +https://doi.org/10.1016/j.jbi.2008.08.010 + + + Tidy data + Wickham + Journal of Statistical +Software + 10 + 59 + 10.18637/jss.v059.i10 + 2014 + Wickham, H. (2014). Tidy data. +Journal of Statistical Software, 59(10), 1–23. +https://doi.org/10.18637/jss.v059.i10 + + + R: A language and environment for statistical +computing + R Core Team + 2020 + R Core Team. (2020). R: A language +and environment for statistical computing. R Foundation for Statistical +Computing. https://www.R-project.org/ + + + REDCapR: Interaction Between R and +REDCap + Beasley + 2023 + Beasley, W. (2023). REDCapR: +Interaction Between R and REDCap. +https://ouhscbbmc.github.io/REDCapR/ + + + redcapAPI: Accessing data from REDCap projects +using the API + Garbett + 10.5281/zenodo.10564837 + 2024 + Garbett, S., Nutter, B., Lane, S., +Beasley, W., Horner, J., Stephens, J., Lehr, M., Beck, C., & +Obregon, S. (2024). redcapAPI: Accessing data from REDCap projects using +the API. https://doi.org/10.5281/zenodo.10564837 + + + REDCapDM: ’REDCap’ data +management + Carmezim + 2023 + Carmezim, J., Peñafiel, J., Satorra, +P., García, E., Pallarés, N., & Tebé, C. (2023). REDCapDM: ’REDCap’ +data management. +https://bruigtp.github.io/REDCapDM/ + + + tidyREDCap: Helper functions for working with +’REDCap’ data + Balise + 2023 + Balise, R., Odom, G., Calderon, A., +Bouzoubaa, L., DeFreitas, W., & Grealis, K. (2023). 
tidyREDCap: +Helper functions for working with ’REDCap’ data. +https://raymondbalise.github.io/tidyREDCap/index.html + + + labelled: Manipulating labelled +data + Larmarange + 2023 + Larmarange, J. (2023). labelled: +Manipulating labelled data. +https://larmarange.github.io/labelled/ + + + openxlsx2: Read, write and edit ’xlsx’ +files + Barbone + 2023 + Barbone, J. M., & Garbuszus, J. +M. (2023). openxlsx2: Read, write and edit ’xlsx’ files. +https://janmarvin.github.io/openxlsx2/ + + + skimr: Compact and flexible summaries of +data + Waring + 2023 + Waring, E., Quinn, M., McNamara, A., +Arino de la Rubia, E., Zhu, H., & Ellis, S. (2023). skimr: Compact +and flexible summaries of data. +https://docs.ropensci.org/skimr/ + + + tibble: Simple data frames + Müller + 2023 + Müller, K., & Wickham, H. (2023). +tibble: Simple data frames. +https://tibble.tidyverse.org/ + + + OpenSSF Best Practices badge +program + Open Source Security Foundation + 2023 + Open Source Security Foundation. +(2023). OpenSSF Best Practices badge program. The Linux Foundation. +https://www.bestpractices.dev/ + + + Writing to a REDCap project + Beasley + 2023 + Beasley, W., & Balise, R. (2023). +Writing to a REDCap project. +https://ouhscbbmc.github.io/REDCapR/articles/workflow-write.html + + + REDCapTidieR + Hanna + 2023 + Hanna, R., Porter, E., & Kadauke, +S. (2023). REDCapTidieR. +https://chop-cgtinformatics.github.io/REDCapTidieR/index.html + + + Superhero database + ter Lingen + 2023 + ter Lingen, J. (2023). Superhero +database. https://www.superherodb.com/ + + + + + + diff --git a/joss.06277/10.21105.joss.06277.jats b/joss.06277/10.21105.joss.06277.jats new file mode 100644 index 0000000000..274330e0d4 --- /dev/null +++ b/joss.06277/10.21105.joss.06277.jats @@ -0,0 +1,608 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +6277 +10.21105/joss.06277 + +REDCapTidieR: Extracting complex REDCap databases into +tidy tables + + + +https://orcid.org/0009-0005-6496-8154 + +Hanna +Richard + + + + +https://orcid.org/0000-0002-4690-8343 + +Porter +Ezra + + + + + +Romero +Stephany + + + + + +Wildenhain +Paul + + + + +https://orcid.org/0000-0002-5613-5006 + +Beasley +William + + + + +https://orcid.org/0000-0003-2996-8034 + +Kadauke +Stephan + + + + + + + + +Division of Oncology, Children’s Hospital of Philadelphia, +Philadelphia, Pennsylvania + + + + +Department of Biomedical and Health Informatics, Children’s +Hospital of Philadelphia, Philadelphia, Pennsylvania + + + + +Department of Pathology and Laboratory Medicine, Perelman +School of Medicine at the University of Pennsylvania, Philadelphia, +Pennsylvania + + + + +Division of Transfusion Medicine, Children’s Hospital of +Philadelphia, Pennsylvania + + + + +Division of Pathology Informatics, Children’s Hospital of +Philadelphia, Pennsylvania + + + + +Division of Pediatrics, Children’s Hospital of +Philadelphia, Philadelphia, Pennsylvania + + + + +Department of Pediatrics, The University of Oklahoma Health +Sciences Center, College of Medicine, Oklahoma City, Oklahoma, +USA + + + +9 +94 +6277 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +R +REDCap +data management + + + + + + Summary +

Capturing and storing electronic data is integral to research.
  REDCap
  (Harris
  et al., 2009,
  2019)
  is a secure web application that lets users build databases and
  surveys through a robust front-end interface. It can support data of any
  type, including data that must comply with standards for protected
  information.

+

Many REDCap users rely on the R programming language
  (R Core
  Team, 2020) to extract and analyze their data. The
  REDCapR
  (Beasley,
  2023) and
  redcapAPI
  (Garbett
  et al., 2024) packages allow R users to extract data directly
  into their programming environment. This works well for simple
  REDCap databases but becomes cumbersome for complex ones, because
  the REDCap API outputs a “block matrix”: a single table that mixes
  data at different levels of granularity. This structure conflicts with
  the “tidy data” framework
  (Wickham,
  2014), in which each variable forms a column, each observation
  forms a row, and each type of observational unit forms a table.

+

To address this, we introduce REDCapTidieR,
  an open-source package that streamlines data extraction and
  restructures the extracted data into an intuitive format consistent
  with tidy data principles. This simplifies data analysis in R,
  especially for complex longitudinal studies.

+

While several tools are available for REDCap data management,
  REDCapTidieR takes a distinct approach: it transforms the
  unwieldy block matrix into a standardized tidy data structure that
  we call the “supertibble”. This approach aligns with good data
  science practice and works for databases of any complexity.
  With a suite of utility functions for working with the
  supertibble, REDCapTidieR offers a complete, user-friendly framework
  for extracting REDCap data.

+
+ + Statement of Need +

As of 2023, the REDCap Consortium has nearly 3 million users
  in more than 150 countries. REDCap databases range from
  single-instrument projects to complex builds that use both repeating
  instruments and repeating events. These structures are needed to
  capture multiple items related to a single visit, such as
  concomitant medications, or events that cannot be scheduled in
  advance, such as adverse events.

+

Exports from REDCap databases that contain repeating events and
  instruments require significant manual pre-processing, a major pain
  point for researchers and analysts. This is because the REDCap API
  returns a single table (Figure 1A) that mixes data from instruments
  recorded at different levels of granularity.
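To make the granularity problem concrete, here is a minimal Python sketch of the idea; it is not REDCapTidieR's implementation, which is written in R. It assumes a toy block matrix with one nonrepeating and one repeating instrument and no events or arms. The instrument names, field names, and the `split_block_matrix` helper are hypothetical, and the `record_id`, `redcap_repeat_instrument`, and `redcap_repeat_instance` columns are assumed to mirror REDCap's flat export. Splitting rows by instrument and keeping only each instrument's own fields removes the padding values that fill the block matrix.

```python
from typing import Any

Row = dict[str, Any]

# Toy block matrix (hypothetical data). The repeat columns are assumed to
# mirror REDCap's flat export; instrument and field names are made up.
# None plays the role of the "does not apply" NA padding.
BLOCK_MATRIX: list[Row] = [
    {"record_id": 1, "redcap_repeat_instrument": None,
     "redcap_repeat_instance": None, "hero_name": "A-Bomb", "power": None},
    {"record_id": 1, "redcap_repeat_instrument": "hero_powers",
     "redcap_repeat_instance": 1, "hero_name": None, "power": "Super Strength"},
    {"record_id": 1, "redcap_repeat_instrument": "hero_powers",
     "redcap_repeat_instance": 2, "hero_name": None, "power": "Accelerated Healing"},
    {"record_id": 2, "redcap_repeat_instrument": None,
     "redcap_repeat_instance": None, "hero_name": "Abe Sapien", "power": None},
]

# Hypothetical instrument metadata: each instrument's fields and whether it repeats.
INSTRUMENTS: dict[str, dict[str, Any]] = {
    "hero_information": {"fields": ["hero_name"], "repeating": False},
    "hero_powers": {"fields": ["power"], "repeating": True},
}


def split_block_matrix(rows: list[Row], instruments: dict[str, dict[str, Any]]) -> dict[str, list[Row]]:
    """Split a block matrix into one table per instrument.

    Each output table keeps only its own fields plus the columns that form
    its composite primary key: (record_id,) for a nonrepeating instrument,
    (record_id, redcap_repeat_instance) for a repeating one.
    """
    tables: dict[str, list[Row]] = {}
    for name, meta in instruments.items():
        key = ["record_id"] + (["redcap_repeat_instance"] if meta["repeating"] else [])
        # Nonrepeating data sits in rows with no repeat instrument;
        # repeating data sits in rows tagged with the instrument's name.
        wanted = name if meta["repeating"] else None
        tables[name] = [
            {col: row[col] for col in key + meta["fields"]}
            for row in rows
            if row["redcap_repeat_instrument"] == wanted
        ]
    return tables


if __name__ == "__main__":
    for name, table in split_block_matrix(BLOCK_MATRIX, INSTRUMENTS).items():
        print(name)
        for row in table:
            print("  ", row)
```

Each resulting table has a single, consistent granularity, so none of its cells are padding; a real longitudinal project would also need event and arm columns in the keys.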

+

Several R packages for REDCap already exist (Table 1), but
  REDCapTidieR distinguishes itself by offering
  analysts a framework that returns a tidy data structure
  regardless of the size or complexity of the extracted database.
  Packages such as
  tidyREDCap
  (Balise
  et al., 2023) and
  REDCapDM
  (Carmezim
  et al., 2023) also offer tools for data processing, and
  redcapAPI provides a wealth of data export
  options as well as features that break apart the block matrix using
  base R. However, only REDCapTidieR
  deconstructs the block matrix into easily joinable tidy tables, each
  carrying a composite primary key that reflects its granularity and
  preserves its relationships to the other tables.

+

REDCapTidieR is built with production
  readiness in mind. In addition to a test suite with 98% code
  coverage, REDCapTidieR is evaluated
  against 15 test databases that cover many complex configuration
  scenarios. Documentation is provided through a
  pkgdown
  site
  (Hanna
  et al., 2023). REDCapTidieR is built on top of
  REDCapR, which has its own extensive test
  suite and is evaluated against an additional 26 test databases.
  REDCapTidieR meets the requirements of
  the
  OpenSSF
  Best Practices Badge
  (Open
  Source Security Foundation, 2023), which certifies open-source
  projects that satisfy criteria for delivering high-quality, robust,
  and secure software.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PackageExports from REDCapImports into REDCapTidy ReformattingExtensive Test Suite
redcapAPIxxx
REDCapRxxx
tidyREDCapx
REDCapDMx
REDCapTidieRxxx
+
+

Table 1: Comparison of R packages for working with
  REDCap.

+
+ + Design +

The REDCapTidieR::read_redcap() function
  uses REDCapR to query the data and metadata of
  a REDCap project through the API and returns a supertibble
  (Figure 1B). The supertibble, named after the
  tibble
  package
  (Müller
  & Wickham, 2023), is an alternative representation of the
  data that links multiple tables together in a single object, consistent
  with tidy data principles. Each data tibble within the supertibble
  holds the data of one REDCap instrument, and these tibbles can be
  joined using their composite primary keys.
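The sketch below illustrates, in plain Python rather than R, how per-instrument tables held in a supertibble-like container can be retrieved and joined on their composite primary keys. The `redcap_form_name` and `redcap_data` column names come from Figure 1; the toy data, the `get_table` helper (loosely analogous to `extract_tibble()`, but not its implementation), the `left_join` helper, and the `redcap_repeat_instance` key name are hypothetical. Because the nonrepeating table's key, `record_id`, is a prefix of the repeating table's key (`record_id`, `redcap_repeat_instance`), joining on `record_id` attaches each hero's information to every one of their repeating rows.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical per-instrument tables (toy data, made-up field names).
hero_information: list[Row] = [
    {"record_id": 1, "hero_name": "A-Bomb"},
    {"record_id": 2, "hero_name": "Abe Sapien"},
]
hero_powers: list[Row] = [
    {"record_id": 1, "redcap_repeat_instance": 1, "power": "Super Strength"},
    {"record_id": 1, "redcap_repeat_instance": 2, "power": "Accelerated Healing"},
    {"record_id": 2, "redcap_repeat_instance": 1, "power": "Marine Communication"},
]

# Supertibble-like container: one row per instrument, with that instrument's
# table stored in redcap_data (a plain-Python stand-in for a list column).
supertable: list[Row] = [
    {"redcap_form_name": "hero_information", "redcap_data": hero_information},
    {"redcap_form_name": "hero_powers", "redcap_data": hero_powers},
]


def get_table(container: list[Row], form_name: str) -> list[Row]:
    """Return the data table for one instrument (hypothetical helper)."""
    for row in container:
        if row["redcap_form_name"] == form_name:
            return row["redcap_data"]
    raise KeyError(form_name)


def left_join(left: list[Row], right: list[Row], on: list[str]) -> list[Row]:
    """Left-join two tables on the given key columns.

    Left rows without a match are kept unchanged (no right-hand columns added).
    """
    index: dict[tuple, list[Row]] = {}
    for right_row in right:
        index.setdefault(tuple(right_row[c] for c in on), []).append(right_row)
    joined: list[Row] = []
    for left_row in left:
        matches = index.get(tuple(left_row[c] for c in on), [])
        if not matches:
            joined.append(dict(left_row))
        for match in matches:
            joined.append({**left_row, **match})
    return joined


if __name__ == "__main__":
    powers = get_table(supertable, "hero_powers")
    info = get_table(supertable, "hero_information")
    for row in left_join(powers, info, on=["record_id"]):
        print(row)
```

Joining from the finer-grained table toward the coarser one keeps one row per power, so the result retains the granularity of the repeating instrument.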

+ +

The REDCapTidieR Supertibble

+ +
+

Figure 1: The REDCapTidieR supertibble shown in the Data Viewer of
  the RStudio IDE. The “Superhero database”
  (ter
  Lingen, 2023) contains two instruments, one nonrepeating and
  one repeating. A. The REDCap API outputs a “Block Matrix”. Note the
  abundance of NA values, which do not represent
  missing values but rather fields that do not apply to a given row
  because of the data structure. B. The read_redcap() function
  returns a “Supertibble”. Each row represents one instrument,
  identified by the redcap_form_name column. The
  redcap_data column is a list column whose entries
  are tibbles containing the data for each instrument. Users can drill
  down into individual tibbles by clicking the table icon in the Data
  Viewer, enabling rapid and intuitive data exploration without
  any preprocessing. Because each instrument has a consistent
  granularity, its tibble can be tidy. Two data tibbles are shown, one
  from a nonrepeating and one from a repeating instrument. Note the
  difference in granularity between the two instruments.

+

REDCapTidieR provides utility functions for
  working with the supertibble, all designed to be used with the native
  R pipe operator |>. The
  extract_tibble() function takes a supertibble
  object and returns a specific data tibble. The
  make_labelled() function uses the
  labelled package
  (Larmarange,
  2023) to apply variable labels to the supertibble. The
  add_skimr_metadata() function uses the
  skimr package
  (Waring
  et al., 2023) to add summary statistics. The
  write_redcap_xlsx() function, which uses
  the openxlsx2
  (Barbone
  & Garbuszus, 2023) package, lets users easily export the
  supertibble to a collaborator-friendly Excel workbook in which each
  sheet contains the data for one instrument.

+

REDCapTidieR cannot be used to write data to
  a REDCap project. We refer the reader to an excellent guide on how to
  do this with REDCapR
  (Beasley
  & Balise, 2023).

+
+ + Installation +

REDCapTidieR is available on + GitHub + and + CRAN + and works on all major operating systems.

+
+ + Acknowledgements +

We would like to thank Jan Marvin and Raymond Balise for their
  feedback and support during development.

+

This package was developed by the + Cell + and Gene Therapy Informatics Team of the + Children’s + Hospital of Philadelphia.

+
+ + Conflict of interest +

The authors declare no financial conflicts of interest.

+
+ + + + + + + HarrisPaul A. + TaylorRobert + MinorBrenda L. + ElliottVeida + FernandezMichelle + O’NealLindsay + McLeodLaura + DelacquaGiovanni + DelacquaFrancesco + KirbyJacqueline + DudaStephany N. + + The REDCap consortium: Building an international community of software platform partners + Journal of Biomedical Informatics + 2019 + 95 + 1532-0464 + https://www.sciencedirect.com/science/article/pii/S1532046419301261 + 10.1016/j.jbi.2019.103208 + 103208 + + + + + + + HarrisPaul A. + TaylorRobert + ThielkeRobert + PayneJonathon + GonzalezNathaniel + CondeJose G. + + Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support + Journal of Biomedical Informatics + 2009 + 42 + 2 + 1532-0464 + https://www.sciencedirect.com/science/article/pii/S1532046408001226 + 10.1016/j.jbi.2008.08.010 + 377 + 381 + + + + + + WickhamHadley + + Tidy data + Journal of Statistical Software + 2014 + 59 + 10 + https://www.jstatsoft.org/index.php/jss/article/view/v059i10 + 10.18637/jss.v059.i10 + 1 + 23 + + + + + + R Core Team + + R: A language and environment for statistical computing + R Foundation for Statistical Computing + Vienna, Austria + 2020 + https://www.R-project.org/ + + + + + + BeasleyWill + + REDCapR: Interaction Between R and REDCap + 2023 + https://ouhscbbmc.github.io/REDCapR/ + + + + + + GarbettShawn + NutterBenjamin + LaneStephen + BeasleyWill + HornerJeffrey + StephensJeremy + LehrMarcus + BeckCole + ObregonSavannah + + redcapAPI: Accessing data from REDCap projects using the API + 2024 + https://github.com/vubiostat/redcapAPI + 10.5281/zenodo.10564837 + + + + + + CarmezimJoão + PeñafielJudith + SatorraPau + GarcíaEsther + PallarésNatàlia + TebéCristian + + REDCapDM: ’REDCap’ data management + 2023 + https://bruigtp.github.io/REDCapDM/ + + + + + + BaliseRaymond + OdomGabriel + CalderonAnna + BouzoubaaLayla + DeFreitasWayne + GrealisKyle + + tidyREDCap: Helper functions for working 
with ’REDCap’ data + 2023 + https://raymondbalise.github.io/tidyREDCap/index.html + + + + + + LarmarangeJoseph + + labelled: Manipulating labelled data + 2023 + https://larmarange.github.io/labelled/ + + + + + + BarboneJordan Mark + GarbuszusJan Marvin + + openxlsx2: Read, write and edit ’xlsx’ files + 2023 + https://janmarvin.github.io/openxlsx2/ + + + + + + WaringElin + QuinnMichael + McNamaraAmelia + Arino de la RubiaEduardo + ZhuHao + EllisShannon + + skimr: Compact and flexible summaries of data + 2023 + https://docs.ropensci.org/skimr/ + + + + + + MüllerKirill + WickhamHadley + + tibble: Simple data frames + 2023 + https://tibble.tidyverse.org/ + + + + + + Open Source Security Foundation + + OpenSSF Best Practices badge program + The Linux Foundation + 202310 + https://www.bestpractices.dev/ + + + + + + BeasleyWill + BaliseRaymond + + Writing to a REDCap project + 2023 + https://ouhscbbmc.github.io/REDCapR/articles/workflow-write.html + + + + + + HannaRichard + PorterEzra + KadaukeStephan + + REDCapTidieR + 2023 + https://chop-cgtinformatics.github.io/REDCapTidieR/index.html + + + + + + ter LingenJeroen + + Superhero database + 2023 + https://www.superherodb.com/ + + + + +
diff --git a/joss.06277/10.21105.joss.06277.pdf b/joss.06277/10.21105.joss.06277.pdf new file mode 100644 index 0000000000..7b7026ee71 Binary files /dev/null and b/joss.06277/10.21105.joss.06277.pdf differ diff --git a/joss.06277/media/images/Figure1.png b/joss.06277/media/images/Figure1.png new file mode 100644 index 0000000000..cee97ef3ca Binary files /dev/null and b/joss.06277/media/images/Figure1.png differ