These guidelines establish our lab's standard practices for using GitHub repositories as lab notebooks. They grew out of our past discussions.
- README.md: Project overview and key links
- Analysis modules: Folders named `00.<name>`, `01.<name>`, etc.
  - Each module contains:
    - Notebooks/scripts: `00.<name>.ipynb`, `01.<name>.py`, etc.
    - Subfolders: `input/`, `output/`, `figures/`
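The module layout above can be scaffolded with a few lines of Python. This is a minimal sketch, not part of the lab's tooling: the function name `scaffold_module` and the example module name are hypothetical, and it assumes the two-digit numeric prefix shown in the folder names.

```python
from pathlib import Path

# Standard subfolders every analysis module is expected to contain.
SUBFOLDERS = ("input", "output", "figures")


def scaffold_module(repo_root, index, name):
    """Create `<NN>.<name>/` with input/, output/, and figures/ inside it.

    `scaffold_module` is a hypothetical helper; `index` is zero-padded to
    two digits to match names such as `00.<name>` and `01.<name>`.
    """
    module_dir = Path(repo_root) / f"{index:02d}.{name}"
    for sub in SUBFOLDERS:
        # exist_ok=True makes the call safe to repeat on an existing module.
        (module_dir / sub).mkdir(parents=True, exist_ok=True)
    return module_dir
```

For example, `scaffold_module(".", 1, "qc")` would create `01.qc/input`, `01.qc/output`, and `01.qc/figures` under the current directory.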
Issues:
- Primary method for notes and discussions
- Use deliberate titles (hypothesis/question initially, conclusion when completed)
- Paste figures and analysis snippets for discussion
Pull Requests:
- Use for implementation-specific discussions
- Keep PRs small and focused
- Use fork-and-branch approach
- Use notebooks for most analyses (rmd, jupytext, ipynb)
- Use scripts for computationally intensive tasks
- Format code with ruff (see Best Practices)
Use DVC for data versioning.
For large/dynamic datasets:
1. Create a separate `-data` repository with the same base name as the main project
2. Set up DVC in the data repo:
   - Clone the `-data` repository and navigate to it
   - Install DVC: `pip install dvc`
   - Initialize DVC: `dvc init`
   - Add remote storage: `dvc remote add -d storage s3://<YOUR S3 URI>`
   - Commit and push changes
3. Manage data:
   - Create a folder for tracked data (e.g., "data")
   - Add data to DVC: `dvc add ./data`
   - Commit and push data.dvc and .gitignore files
4. Update data:
   - When adding or updating data, run these commands in order:
     - `dvc add ./data`
     - `git commit data.dvc -m "dataset updates"`
     - `git push`
     - `dvc push`
5. Add the data repo as a submodule to the main project:
   - In the main project repo, run these commands in order:
     - `git submodule add <data submodule git repo URL>`
     - `git submodule update --init --recursive`
     - `git commit -m "added submodule"`
     - `git push`
6. Clean up: Delete the local copy of the data submodule repository created in step 2 to avoid confusion
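The "Update data" commands are easy to run out of order by hand, so one option is to wrap them in a small script. The sketch below is an illustration only: `update_commands` and `run_update` are hypothetical names, the commands are the ones listed above, and it assumes `git` and `dvc` are on the `PATH` when `dry_run` is turned off.

```python
import shlex
import subprocess
from pathlib import Path


def update_commands(data_dir="./data", message="dataset updates"):
    """Return the data-update sequence as argv lists, in execution order."""
    data_path = Path(data_dir)
    # `dvc add ./data` writes its pointer file next to the folder: data.dvc
    dvc_file = data_path.parent / f"{data_path.name}.dvc"
    return [
        ["dvc", "add", data_dir],
        ["git", "commit", str(dvc_file), "-m", message],
        ["git", "push"],
        ["dvc", "push"],
    ]


def run_update(repo_dir, dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them inside `repo_dir`."""
    commands = update_commands(**kwargs)
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first failing command.
            subprocess.run(argv, cwd=repo_dir, check=True)
    return commands
```

Calling `run_update(".")` only prints the four commands. Passing `dry_run=False` executes them, so try that on a disposable clone first.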
- Use Conda: one environment per analysis module
- Save figures as SVG for final versions
- Use `x.generate-figures.ipynb` in each module to:
  - Read data from the output folder
  - Produce figures
  - Allow figure reproduction without redoing all analysis
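A cell in a figure-generation notebook might look roughly like the sketch below. It is an assumption-laden illustration: the file name `summary.csv`, the column names `x` and `y`, and the helper names are all hypothetical, and it assumes matplotlib is installed in the module's environment. The point is that the figure is rebuilt from files already in `output/` and saved as SVG in `figures/`, without rerunning the analysis.

```python
import csv
from pathlib import Path


def load_xy(csv_path, x_col, y_col):
    """Read two numeric columns from a CSV written by an earlier analysis step."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return [float(r[x_col]) for r in rows], [float(r[y_col]) for r in rows]


def figure_path(module_dir, stem):
    """Return the SVG path inside the module's figures/ folder."""
    return Path(module_dir) / "figures" / f"{stem}.svg"


def make_figure(module_dir, table="summary.csv", x_col="x", y_col="y"):
    """Plot one table from output/ and save it as SVG (hypothetical names)."""
    # Imported here so the data helpers above work without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend; no display needed
    import matplotlib.pyplot as plt

    module_dir = Path(module_dir)
    xs, ys = load_xy(module_dir / "output" / table, x_col, y_col)
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    target = figure_path(module_dir, Path(table).stem)
    target.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(target, format="svg")
    plt.close(fig)
    return target
```

Keeping data loading separate from plotting means a change to styling only requires rerunning the plotting cell.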
- Use `.gitignore`
- Use pre-commit hooks (see webpage for local installation instructions)
  - Use `.pre-commit-config.yaml` as the starting point
  - Run `pre-commit run --all-files` locally
  - Install pre-commit CI on your repo: https://pre-commit.ci/
- Use ruff to enforce code style
  - At present, we use the default settings, so `.ruff.toml` is not required, but we have included an empty config for completeness (and `.pre-commit-config.yaml` requires it)
This section is for known issues we're working to improve. We welcome ideas for solutions.
- Analysis Iterations:
  - Consider adding optional datetime suffixes to folders/files for multiple iterations of an analysis
  - Example: `00.initial_analysis_2023-06-24`, `00.revised_analysis_2023-07-15`
- Project Management: