Home
⚠️ This wiki is a work in progress!⚠️
This repository contains the basic skeleton I use for my research projects. In an effort to contribute something other than papers, I share the source code and the basic folder structure, together with more verbose, informative material (i.e., this wiki). The intention is to introduce fellow researchers in Economics and Finance to a standardized, disciplined and opinionated workflow.
- I propose a research workflow for economists.
- The workflow is composed of four main complementary parts: a folder structure, an automated build system, a Version Control System (VCS) and semantic versioning guidelines.
- Auxiliary parts include Wikis, Issues and Pages.
- I explain why you want to adopt this workflow.
- This is not teaching material. It is a full-fledged contribution to research.
The replicability crisis has hit Economics and Finance, as it has other fields. In response, several journals, such as the American Economic Review and the Journal of Financial Economics, have introduced the role of Data Editor. A data editor requests and verifies replication material before a paper is published. Data editors often encounter obstacles to replicating research. In many instances, data is proprietary and cannot be shared by the authors, which can make replication expensive, if not impossible. In other cases, replication is difficult because the code is poorly documented, program dependencies are barely explained, tables are constructed manually instead of programmatically, or a mix of the above. For authors, in turn, these obstacles may hold back a paper and delay its publication, sometimes by several months.
To tackle this and other issues, I propose a research workflow that follows conventions established in software development communities. The workflow rests on four main elements. First, files are organized in a disciplined folder structure, in which the contents of certain folders are meant to be modified by users, while other folders are populated and modified exclusively by the programs they write. This yields an organized workspace where research authors (and their assistants) can easily find and track files. Second, the workflow employs an automated build system, whereby a single file runs all the code necessary to replicate a paper. The build system of choice is GNU Make, which works with any program that has a command-line interface (CLI). From downloading data to typesetting the paper in LaTeX, a build system helps ensure that a research project is fully replicable with minimal effort and no ambiguity about the order of the required steps. Third, changes to each file are versioned using a Version Control System (VCS). The VCS I chose is git. File versioning is useful because it describes changes and their motivation in convenient, bite-sized chunks: the message associated with each change should provide context and explain why that change was committed. Finally, the whole project follows semantic versioning, whereby certain points in time are flagged with a version number. Unlike file versioning, semantic versioning applies to the entire project and tracks significant changes, such as new versions of the paper and of its underlying code and data.
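GNU Make decides what to rerun by comparing file modification times. A target is rebuilt when it does not exist or when any of its prerequisites is newer than it, and prerequisites are brought up to date first. The Python class below, `ToyMake`, is hypothetical and written only for illustration. It imitates just that rule, using simulated timestamps held in memory. It is not how GNU Make is implemented, and it ignores recipes, pattern rules and everything else a real Makefile offers. The file names are also hypothetical. They mirror the split between hand-edited folders (`input/`, `src/`) and program-written ones (`output/`), not this repository's actual layout.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMake:
    """Toy imitation of Make's rebuild rule (hypothetical, for illustration only)."""

    # target -> prerequisites, i.e. the rules of an imaginary Makefile
    rules: dict[str, list[str]]
    # simulated modification times; a missing key means the file does not exist
    mtime: dict[str, int] = field(default_factory=dict)
    clock: int = 0
    log: list[str] = field(default_factory=list)  # targets rebuilt, in order

    def touch(self, name: str) -> None:
        """Simulate writing a file by giving it the newest timestamp."""
        self.clock += 1
        self.mtime[name] = self.clock

    def build(self, target: str, stack: tuple[str, ...] = ()) -> None:
        if target in stack:
            raise ValueError(f"circular dependency: {' -> '.join(stack + (target,))}")
        prereqs = self.rules.get(target)
        if prereqs is None:
            # No rule: a hand-edited source file, which must already exist.
            if target not in self.mtime:
                raise FileNotFoundError(f"no rule to make {target!r}")
            return
        for p in prereqs:  # bring every prerequisite up to date first
            self.build(p, stack + (target,))
        stale = target not in self.mtime or any(
            self.mtime[p] > self.mtime[target] for p in prereqs
        )
        if stale:
            self.log.append(target)  # stands in for running the recipe
            self.touch(target)


# Hypothetical dependency graph: raw data -> cleaned data -> table -> paper.
rules = {
    "output/paper.pdf": ["src/paper.tex", "output/table1.tex"],
    "output/table1.tex": ["src/regressions.py", "output/clean.csv"],
    "output/clean.csv": ["src/clean.py", "input/raw.csv"],
}
make = ToyMake(rules)
for source in ["input/raw.csv", "src/clean.py", "src/regressions.py", "src/paper.tex"]:
    make.touch(source)  # hand-edited files already exist

make.build("output/paper.pdf")
print(make.log)  # ['output/clean.csv', 'output/table1.tex', 'output/paper.pdf']

make.log.clear()
make.touch("src/regressions.py")  # edit the regression script
make.build("output/paper.pdf")
print(make.log)  # ['output/table1.tex', 'output/paper.pdf']
```

Editing the regression script reruns only the table and the paper and leaves the cleaned data untouched. In a real project, the same logic avoids rerunning slow early steps while keeping every output up to date, provided the dependencies are declared correctly.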
The workflow I propose can also make use of an online platform. In the case of this work, that platform is GitHub, although other options, such as GitLab and Bitbucket, are available. Using such platforms unlocks auxiliary elements for the workflow. Wikis (such as this one) allow for verbose documentation of a research project, which is useful in the development phase of research ideas. They may also contain specific instructions that cannot be written in code, such as the download procedure for proprietary data. Issues are useful development tracking devices, which allow for documenting problems, questions and, obviously, issues. Issues can be opened, commented on and closed, allowing researchers to easily track (and quantify) the work they do. Issues can be thought of as to-do lists that collaborators can comment on. Finally, Pages enable web rendering of information related to a research project, which can result either in a single page on an author's website or in a full-fledged standalone website.
This repository, and this wiki, is not the first treatment of standardized research workflows. Other authors in Economics and Finance have already proposed or discussed some essential elements. For example, Michael Stepner has discussed the use of conventions in file management, code writing and folder structures, and has proposed a basic build system implemented in Stata. Jesus Fernandez-Villaverde has written educational material about the use of git and GNU Make, predominantly with students in mind. Perhaps closest to my work are Gentzkow and Shapiro (2014), who provide a Practitioner's Guide for empiricists. They provide a set of rules for working with code and data, and illustrate the arguments supporting each rule with examples.
My work adds a repository template to existing contributions. Importantly, I also introduce the concept of semantic versioning, which is known in other fields but virtually unused in Economics. I bring this and other existing contributions together into a coherent, complete framework, and I provide a ready-to-use, fully documented repository template that makes use of every element of the workflow I propose. I additionally provide arguments and examples for using GitHub's Wikis, Issues and Pages to enhance documentation, collaboration and publicity.
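Semantic versioning is easiest to see in its software form. The Semantic Versioning 2.0.0 specification labels releases MAJOR.MINOR.PATCH. It bumps MAJOR for incompatible changes, MINOR for backward-compatible additions and PATCH for backward-compatible fixes, and resets the components to the right of the one bumped. Section 6 covers how these rules map onto the life of a paper. The Python helpers below, `version_key` and `bump`, are hypothetical and only sketch the mechanics. They assume git tags of the form `v1.4.2`, where the leading `v` is a common convention rather than part of the specification.

```python
import re

# MAJOR.MINOR.PATCH with an optional leading "v" (assumed tag convention).
# Leading zeros are rejected, as in Semantic Versioning 2.0.0; pre-release
# and build suffixes (e.g. "1.0.0-rc.1") are deliberately not handled here.
SEMVER = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def version_key(tag: str) -> tuple[int, int, int]:
    """Parse a tag such as "v1.4.2" into (1, 4, 2) so versions sort numerically."""
    match = SEMVER.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def bump(tag: str, part: str) -> str:
    """Return the next tag after `tag`; bumping a component resets those to its right."""
    major, minor, patch = version_key(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("v1.4.2", "patch"))  # v1.4.3
print(bump("v1.4.2", "minor"))  # v1.5.0
print(bump("v1.4.2", "major"))  # v2.0.0
# Numeric, not alphabetical, ordering: v1.9.0 comes before v1.10.0.
print(sorted(["v1.10.0", "v2.0.0", "v1.9.0"], key=version_key))
# ['v1.9.0', 'v1.10.0', 'v2.0.0']
```

You could then attach the computed tag to the current commit as an annotated git tag, for example `git tag -a v1.5.0 -m "Describe the release"`. That way, the project's version history lives in the same repository as its files.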
The sidebar on the right contains an ordered list of the topics covered in this wiki. Its logical organization follows that of a conventional research paper. Section 2 motivates the need for a standardized and disciplined workflow. Section 3 explains the folder structure I propose, which is clearly inspired by conventions found in software development communities. Section 4 illustrates the use of GNU Make and provides operational examples. Section 5 presents git, discusses its features and proposes ways to use it effectively in a research context. Section 6 introduces the concept of semantic versioning through the use of git's tags, and explains one way researchers can use it effectively. Section 7 explains the use of online repository platforms such as GitHub; while it refers to GitHub extensively, its contents are easily ported to other choices such as GitLab. Section 8 briefly introduces some tools I use, as well as some references on how to set them up. Finally, Section 9 summarizes and concludes.
The content in this wiki is licensed under the Creative Commons Attribution 4.0 International License. You can adapt or redistribute the text, even for commercial purposes, so long as you provide attribution.
For any feedback, open an issue or send me an email at the address you find on my website.