Workflow and best practices

Messaging

The lab primarily communicates via Slack. Let Lauren know if you need to be added. You are encouraged to use Slack in place of email where possible. Our lab workspace is https://huckley.slack.com/. Lauren checks email only occasionally, so you’re likely to get a faster response through Slack.

Workflow

We aspire to conduct open and reproducible science. This includes sharing data and code and using transparent and reproducible workflows. We primarily use open source software because it is often more flexible and transparent and evolves more quickly. Much of the software used in the lab is available for download from UW. Some resources on best practices for open and reproducible ecology and evolution:

The Openscapes initiative encourages a workflow similar to ours, described here: Our path to better science in less time using open data science tools
Friend of the lab Alison Smith on Elevating The Status of Code in Ecology

File storage

We use the UW Google G Suite resources to store and share data. You are encouraged to use your personal account, where you will have unlimited space available, as well as our lab shared Google Drive, TrEnCh. The TrEnCh drive contains folders for lab safety, protocols, and projects (where you can set up folders for your individual projects). We recommend downloading Drive File Stream to easily access your files from any device. It lets you access files on demand when you have internet access without occupying much disk space. UW now limits space in Google Drive, so we have archived some information in Microsoft OneDrive. The OneDrive app allows accessing folders from the desktop or command line. Let Lauren know if you need to be added to the lab TrEnCh Google Drive (or OneDrive, but most will likely not need access).

Version control

You are expected to use GitHub to version control your code and other files. We have GitHub organizations for the Huckley Lab, TrEnCh project, and the TrEnCh project education website. A favorite introduction to GitHub is here.

Code

Lauren primarily works in R and RStudio since R has a great ecology and evolution community, including many specialized packages. Other lab members have chosen to primarily use Python, which also has a strong community and resources. GitHub integrates easily with RStudio (described briefly here and thoroughly here). Google Drive files can be easily read and written from RStudio: e.g., setwd("/Volumes/GoogleDrive/Shared drives/TrEnCh/Projects/") (the path format depends on computer settings).

For analytics, Mathematica and Matlab are useful and available from UW. ArcGIS, including ArcMap and ArcInfo, is available for download, but much spatial analysis can now be done in R or similar programs.

Computing

Hyak

Adapted from the QERM seminar presentation of Connie Okasaki, John Best, Maria Kuruvilla, Martin Endress, and Michele Buonanduci.

Hyak is UW's shared computing cluster. You can send jobs to run in parallel on Hyak and access its GPU nodes.

Getting access for the first time

Join the mailing list and uw-rcc Slack team here
Read the Hyak Wiki
Take the Hyak quiz
Email uwrcc@uw.edu with the subject line “Hyak Account”
Set up 2 Factor Authentication
Add Hyak and Lolo as services here

Requesting nodes

srun -p stf-int -A stf --ntasks=8 --mem=20G --pty /bin/bash -l

The parts of this command are:
- Partition: -p stf-int
- Account: -A stf
- Number of processes (*): --ntasks=8
- Amount of RAM: --mem=20G
- Command: --pty /bin/bash -l

Whole Hyak Workflow

Overview

Write and debug your code locally
Set up your compute environment
Transfer data and code to Hyak
Write SLURM script
Submit job
Process and retrieve results

1. Write and debug your code

Write small test examples
Check that code works in serial locally
Check that code works in parallel locally
Debug locally!
Run as a script from command line

2. Set up your compute environment

Modules
- module load r_3.6.0
- May be out of date!
Roll-your-own
- More control
- Better BLAS library for R
- May end up compiling a lot of dependencies

3. Transfer data and code

git via Github
sftp
sshfs

4. Write a SLURM script

Use a template!
Specify:
- Partition
- Account
- Number of nodes
- Number of processors
- RAM
- Time
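
As a starting point, the items above could be laid out as in the sketch below, which writes a template to longcompute.slurm (the file name used in the next step). The account, process count, and RAM mirror the srun example earlier; everything else is an assumption for illustration: the batch partition name stf, the single node, the two-hour time limit, the job and log names, and the script analysis.R. Check the Hyak Wiki for the values that apply to your group.

```shell
# Write a template batch script to longcompute.slurm; edit values before use.
cat > longcompute.slurm <<'EOF'
#!/bin/bash
# Job name and log file are hypothetical; %j expands to the job ID.
#SBATCH --job-name=longcompute
#SBATCH --output=longcompute_%j.log
# Partition and account (assumed; substitute your group's values).
#SBATCH --partition=stf
#SBATCH --account=stf
# Number of nodes and number of processors.
#SBATCH --nodes=1
#SBATCH --ntasks=8
# RAM and time limit (hh:mm:ss); both are placeholders.
#SBATCH --mem=20G
#SBATCH --time=02:00:00

# Set up the compute environment (this module may be out of date).
module load r_3.6.0

# Run the analysis as a script; analysis.R is a hypothetical name.
Rscript analysis.R
EOF
```

Keeping the directives at the top of the file, before any command, matters because SLURM stops reading #SBATCH lines once it reaches the first command.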

5. Submit your job

Submit:
sbatch longcompute.slurm
Check progress:
squeue -u $USER
Check resource usage:
ssh n1234
htop -u $USER

6. Process and retrieve results

Can have dependent jobs
Save what you need on gscratch
Use sftp or sshfs to retrieve results
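
One way dependent jobs might be chained is sketched below, using sbatch's --parsable and --dependency=afterok options. The helper name submit_chain and the second script postprocess.slurm are hypothetical and are not part of the original presentation.

```shell
# Submit a compute job, then a post-processing job that starts only if
# the first one finishes successfully.
submit_chain() {
  local compute_script="$1" post_script="$2" job_id
  # --parsable makes sbatch print only the job ID (followed by ";cluster"
  # on multi-cluster setups, which the expansion below strips).
  job_id=$(sbatch --parsable "$compute_script") || return 1
  job_id="${job_id%%;*}"
  # afterok: run the second job only after the first exits with code 0.
  sbatch --dependency=afterok:"$job_id" "$post_script"
}

# Example call (postprocess.slurm is a hypothetical second script):
# submit_chain longcompute.slurm postprocess.slurm
```

If the first job fails, the dependent job never becomes eligible to run, so a broken computation does not trigger post-processing on incomplete results.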

Coding best practices

Use checkpointing where possible
Choose level of parallelism:
- Within-computation (e.g. BLAS)
- Among-computations
Use gscratch to save work
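
A minimal sketch of checkpointing for among-computation work follows: each task writes its result to its own file, and a rerun skips any task whose file already exists. The names CHECKPOINT_DIR, run_task, and run_all are hypothetical, and the squaring task is only a stand-in for real work. On Hyak the checkpoint directory would presumably live under gscratch.

```shell
# Directory for per-task results; defaults to ./checkpoints if unset.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-./checkpoints}"

# Stand-in for one unit of real work: prints the square of its input.
run_task() {
  echo $(( $1 * $1 ))
}

# Run tasks 1..N, skipping any task whose result file already exists.
# Prints the number of tasks computed in this call.
run_all() {
  local n="$1" id out computed=0
  mkdir -p "$CHECKPOINT_DIR"
  for id in $(seq 1 "$n"); do
    out="$CHECKPOINT_DIR/task_${id}.txt"
    if [ -s "$out" ]; then
      continue   # finished in an earlier run; do not recompute
    fi
    # Write to a temporary file and rename it, so a job killed mid-task
    # never leaves a half-written result that looks complete.
    run_task "$id" > "$out.tmp" && mv "$out.tmp" "$out" || return 1
    computed=$(( computed + 1 ))
  done
  echo "$computed"
}

# Example call:
# run_all 100
```

If a job hits its time limit partway through, resubmitting the same script picks up at the first unfinished task instead of starting over.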

File storage

home: persistent, low performance, limited
gscratch: large, fast, short-term
lolo: extra-large, long-term only

Cloud Computing

Our lab also uses cloud services for research and for sharing our work. We have a lab AWS account for storage and computation. The Trench-IR website uses the Azure cloud to host images, transform images, and serve the website. More information on our use of cloud computing in Trench-IR can be found here.

Text

Integrating coding with writing manuscripts and other products is a great option, and one we encourage you to take on and share with those of us who aren't quite there. We use R Markdown for components of the TrEnCh project and for these files. LaTeX is great for equations and is integrated with R Markdown or can be used in other applications. LaTeX is also a great way to easily reformat papers into a dissertation that conforms to school policies (see UW LaTeX resources). We also use Google Docs and Slides extensively for collaborative writing. You can use RStudio for visual markdown editing.

References

Lauren uses Zotero, an open source reference manager. Styles for many journals are available. There are extensions for citations in Word or Google Docs. There are also R packages for citations in Rmd documents (e.g., citr) that we should start using! Zotero can be configured to import citations from Google Scholar by selecting “Use Zotero for downloaded RIS/Refer files” under Zotero’s preferences menu (using the import into EndNote tab). Deselect the option if you want Zotero to stop snagging EndNote’s references. The references sync online, so it’s easy to share reference libraries with collaborators. RStudio can now link to Zotero for citations.

Visualization

We are committed to sharing our research with a broader public. One way we accomplish this goal is using RShiny visualizations of data. These visualizations allow end users to interact with scientific data more than is possible in a scientific paper. Visualizations also broaden the reach of scientific work outside of the scientific community. For example, our visualizations target a high-school audience, and we work with local teachers to use our visualizations in the classroom. Check out our visualizations on Trench-ED!

Dissemination

You are encouraged to edit our lab website via GitHub. You can edit in RStudio and push to GitHub if preferred. You are also encouraged to link to a personal website or create your own page on the lab site. Creating a GitHub repo with your user handle and putting text into the README.md is an easy way to make a simple page (see Lauren's example).

The TrEnCh project has Twitter (@TrenchGroup) and Instagram accounts that you are encouraged to use.
