From ea9d850331f27810764130d0b40a11bb3c9d94d0 Mon Sep 17 00:00:00 2001 From: Florian Kohrt Date: Wed, 27 Nov 2024 18:50:42 +0100 Subject: [PATCH] Add a background on research compendia to introduction --- intro.qmd | 24 ++++++++++- literature.bib | 111 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 134 insertions(+), 1 deletion(-) diff --git a/intro.qmd b/intro.qmd index 62165e0..7ca1c20 100644 --- a/intro.qmd +++ b/intro.qmd @@ -3,7 +3,27 @@ title: "Introduction" lang: en --- -Together, we will create a research compendium whilst taking the following ten things into account [by @Arguillas2022, licensed under [CC\ BY\ 4.0](https://creativecommons.org/licenses/by/4.0/)]: +In the following, we will provide a brief introduction to the concept of _research compendia_. + +## The importance of sharing + +Suppose you are reading an article about a new imaging method to turn seismological data into subsurface images. The article describes the ideas that went into developing this method and presents a few examples to illustrate its superiority over previous approaches. You become interested and would like to apply this method to your own data. However, with only the article available, it could take months to come up with a working solution, if possible at all. This situation has been put aptly by @Buckheit1995, distilling an idea by Jon Claerbout: + +> "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." + +Even for articles that use computers to apply existing methods (rather than reporting on a new method), sharing the source code and being transparent about the computational environment is imperative to making research reproducible [@Ince2012]. 
By reproducibility, we mean "obtaining consistent results using the same input data; computational steps, methods, and code; and conditions of analysis" [@NASEM2019, p. 46]. + +## Linking results and computations + +This tutorial not only covers sharing the source code, but also connecting it to the results through the creation of dynamic documents. Rather than manually copying numerical results, figures, or tables, they are inserted automatically upon rendering of the article. Dynamic documents bundled together with any necessary data and auxiliary software are called a [research compendium](https://research-compendium.science/) [@Gentleman2007]. + +The practice of interleaving narrative text with code has its roots in the paradigm of literate programming, where documentation and source code are treated as equals and are arranged in a way to maximize understanding [@Knuth1984]. Alternating text and code can also be found in notebook interfaces for exploratory programming, such as those provided by Wolfram Mathematica or Jupyter Notebooks [@Kluyver2016], with the added capability of executing the code and embedding its output. With _Sweave_ [@Leisch2002], ideas from both worlds -- literate programming and embedding program output -- were combined into one tool for rendering dynamic documents using the R programming language. It is the predecessor of the R package `knitr` [@Xie2015] which is being used under the hood in this tutorial.^[Specifically, Quarto employs `knitr` to execute chunks of R code.] + +Linking results with their computations has benefits for authors and readers. For the author, articles always contain the most recent version of figures, as they are updated automatically when the computation changes. For the readers, it enables understanding exactly how a particular result was obtained if they get access to the underlying research compendium. 
+ +## Best practices + +When creating the research compendium, there are a few things to consider [by @Arguillas2022, licensed under [CC\ BY\ 4.0](https://creativecommons.org/licenses/by/4.0/)]: > __Does the research compendium contain everything needed to reproduce a predefined outcome in an organized and parsimonious way?__ > @@ -29,3 +49,5 @@ Together, we will create a research compendium whilst taking the following ten t > __Is there a plan for reviewing the research compendium for FAIR and computational reproducibility standards over time?__ > > 10. __Review:__ A series of managed activities are needed to ensure continued access to and functionality of the research compendium and its components for as long as necessary. + +Although this tutorial guides you through the creation of a research compendium, you are invited to revisit these questions after completion and check whether and how each point was addressed (or not). Further, you can consult them as a checklist for future projects. diff --git a/literature.bib b/literature.bib index 41b8017..c93fd9e 100644 --- a/literature.bib +++ b/literature.bib @@ -548,4 +548,115 @@ @online{UKGovernment2020 urldate = {2024-11-26}, author = {{UK Government Analytical Community}}, year = {2020}, +} +@incollection{Buckheit1995, + location = {New York, {NY}}, + title = {{WaveLab} and Reproducible Research}, + volume = {103}, + isbn = {978-0-387-94564-4 978-1-4612-2544-7}, + url = {http://link.springer.com/10.1007/978-1-4612-2544-7_5}, + pages = {55--81}, + booktitle = {Wavelets and Statistics}, + publisher = {Springer New York}, + author = {Buckheit, Jonathan B. and Donoho, David L.}, + editor = {Antoniadis, Anestis and Oppenheim, Georges}, + editorb = {Bickel, P. and Diggle, P. and Fienberg, S. and Krickeberg, K. and Olkin, I. and Wermuth, N. 
and Zeger, S.}, + editorbtype = {redactor}, + urldate = {2024-11-27}, + date = {1995}, + doi = {10.1007/978-1-4612-2544-7_5}, + note = {Series Title: Lecture Notes in Statistics}, +} +@article{Gentleman2007, + title = {Statistical Analyses and Reproducible Research}, + volume = {16}, + issn = {1061-8600, 1537-2715}, + doi = {10.1198/106186007X178663}, + pages = {1--23}, + number = {1}, + journaltitle = {Journal of Computational and Graphical Statistics}, + shortjournal = {Journal of Computational and Graphical Statistics}, + author = {Gentleman, Robert and Temple Lang, Duncan}, + urldate = {2024-11-27}, + date = {2007-03}, + langid = {english}, +} +@article{Ince2012, + title = {The case for open computer programs}, + volume = {482}, + issn = {0028-0836, 1476-4687}, + url = {https://www.nature.com/articles/nature10836}, + doi = {10.1038/nature10836}, + pages = {485--488}, + number = {7386}, + journaltitle = {Nature}, + shortjournal = {Nature}, + author = {Ince, Darrel C. and Hatton, Leslie and Graham-Cumming, John}, + urldate = {2023-10-24}, + date = {2012-02}, + langid = {english}, +} +@book{NASEM2019, + location = {Washington, D.C.}, + title = {Reproducibility and Replicability in Science}, + isbn = {978-0-309-48616-3}, + publisher = {National Academies Press}, + author = {{National Academies of Sciences, Engineering, and Medicine}}, + urldate = {2024-11-27}, + date = {2019-09-20}, + doi = {10.17226/25303}, +} +@article{Knuth1984, + title = {Literate Programming}, + volume = {27}, + issn = {0010-4620, 1460-2067}, + doi = {10.1093/comjnl/27.2.97}, + pages = {97--111}, + number = {2}, + journaltitle = {The Computer Journal}, + shortjournal = {The Computer Journal}, + author = {Knuth, D. 
E.}, + urldate = {2024-11-27}, + date = {1984-02-01}, + langid = {english}, +} +@incollection{Leisch2002, + location = {Heidelberg}, + title = {Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis}, + isbn = {978-3-7908-1517-7 978-3-642-57489-4}, + url = {http://link.springer.com/10.1007/978-3-642-57489-4_89}, + shorttitle = {Sweave}, + pages = {575--580}, + booktitle = {Compstat}, + publisher = {Physica-Verlag {HD}}, + author = {Leisch, Friedrich}, + editor = {Härdle, Wolfgang and Rönz, Bernd}, + urldate = {2024-11-27}, + date = {2002}, + langid = {english}, + doi = {10.1007/978-3-642-57489-4_89}, +} +@book{Xie2015, + location = {Boca Raton, {FL}}, + edition = {Second edition}, + title = {Dynamic documents with {R} and knitr}, + isbn = {978-1-4987-1697-0 978-1-315-36070-6 978-1-4987-8739-0 978-1-315-38248-7}, + series = {The R series}, + abstract = {Quickly and Easily Write Dynamic Documents Suitable for both beginners and advanced users, Dynamic Documents with R and knitr, Second Edition makes writing statistical reports easier by integrating computing directly with reporting. Reports range from homework, projects, exams, books, blogs, and web pages to virtually any documents related to statistical graphics, computing, and data analysis. The book covers basic applications for beginners while guiding power users in understanding the extensibility of the knitr package. New to the Second Edition A new chapter that introduces R Markdown v2 Changes that reflect improvements in the knitr package New sections on generating tables, defining custom printing methods for objects in code chunks, the C/Fortran engines, the Stan engine, running engines in a persistent session, and starting a local server to serve dynamic documents Boost Your Productivity in Statistical Report Writing and Make Your Scientific Computing with R Reproducible Like its highly praised predecessor, this edition shows you how to improve your efficiency in writing reports. 
The book takes you from program output to publication-quality reports, helping you fine-tune every aspect of your report}, + pagetotal = {1}, + publisher = {{CRC} Press}, + author = {Xie, Yihui}, + date = {2015}, +} +@inproceedings{Kluyver2016, + booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas}, + editor = {Fernando Loizides and Birgit Schmidt}, + title = {Jupyter Notebooks – a publishing format for reproducible computational workflows}, + author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing and Jupyter development team}, + publisher = {IOS Press}, + address = {Netherlands}, + year = {2016}, + pages = {87--90}, + doi = {10.3233/978-1-61499-649-1-87}, + abstract = {It is increasingly necessary for researchers in all fields to write computer code, and in order to reproduce research results, it is important that this code is published. We present Jupyter notebooks, a document format for publishing code, results and explanations in a form that is both readable and executable. We discuss various tools and use cases for notebook documents.} } \ No newline at end of file