Workflow and best practices

Messaging

The lab primarily communicates via Slack. Let Lauren know if you need to be added. You are encouraged to use Slack in place of email where possible. Our lab workspace is https://huckley.slack.com/. Lauren checks email only occasionally, so you’re likely to get a faster response through Slack.

Workflow

We aspire to conduct open and reproducible science. This includes sharing data and code and using transparent and reproducible workflows. We primarily use open source software because it is often more flexible and transparent and evolves more quickly. Much of the software used in the lab is available for download from UW. Some resources for best practices for open and reproducible ecology and evolution:

File storage

We use the UW Google G Suite resources to store and share data. You are encouraged to use your personal account, where you will have unlimited space available, as well as our lab shared Google Drive, TrEnCh. The TrEnCh drive contains folders for lab safety, protocols, and projects (where you can set up folders for your individual projects). We recommend downloading Drive File Stream to easily access your files from any device. It lets you access files on demand when you have internet access without occupying much disk space. UW now limits space in Google Drive, so we have archived some information in Microsoft OneDrive. The OneDrive app allows accessing folders from the desktop or command line. Let Lauren know if you need to be added to the lab TrEnCh Google Drive (or OneDrive, but most will likely not need access).

Version control

You are expected to use GitHub to version control your code and other files. We have GitHub organizations for the Huckley Lab, TrEnCh project, and the TrEnCh project education website. A favorite introduction to GitHub is here.

Code

Lauren primarily works in R and RStudio because R has a great ecology and evolution community, including many specialized packages. Other lab members have chosen to primarily use Python, which also has a strong community and resources. GitHub integrates easily with RStudio (described briefly here and thoroughly here). Google Drive files can be easily read and written from RStudio, e.g., setwd("/Volumes/GoogleDrive/Shared drives/TrEnCh/Projects/") (the path format depends on computer settings).

For analytics, Mathematica and Matlab are useful and available from UW. ArcGIS, including ArcMap and ArcInfo, is available for download, but much spatial analysis can now be done in R or similar programs.

Computing

Hyak

Adapted from the QERM seminar presentation of Connie Okasaki, John Best, Maria Kuruvilla, Martin Endress, and Michele Buonanduci.

Hyak is UW's shared computing cluster. You can send jobs to run in parallel on Hyak and access its GPU nodes.

Getting access for the first time
Requesting nodes

srun -p stf-int -A stf --ntasks=8 --mem=20G --pty /bin/bash -l

  • -p stf-int: Partition
  • -A stf: Account
  • --ntasks=8: Number of processes
  • --mem=20G: Amount of RAM
  • --pty /bin/bash -l: Command

Whole Hyak Workflow

Overview

  1. Write and debug your code locally
  2. Set up your compute environment
  3. Transfer data and code to Hyak
  4. Write SLURM script
  5. Submit job
  6. Process and retrieve results

1. Write and debug your code

  • Write small test examples
  • Check that code works in serial locally
  • Check that code works in parallel locally
  • Debug locally!
  • Run as a script from command line
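
One way to confirm that code runs as a non-interactive script before moving it to Hyak is to wrap the command so its output, exit status, and run time are captured in a log. The following is only a sketch: `run_and_log` is a hypothetical helper name, and `my_analysis.R` in the usage comment is a placeholder for your own script.

```shell
# Run a command non-interactively, saving its output, exit status,
# and elapsed time to a log file. run_and_log is a hypothetical helper.
run_and_log() {
  local log=$1
  shift
  local start end rc
  start=$(date +%s)
  # Capture both stdout and stderr, as a batch job would.
  "$@" > "$log" 2>&1
  rc=$?
  end=$(date +%s)
  echo "exit=$rc elapsed=$((end - start))s" >> "$log"
  return "$rc"
}

# Example (my_analysis.R is a placeholder script name):
#   run_and_log test_run.log Rscript my_analysis.R
```

If the log shows a non-zero exit status locally, the same script is unlikely to behave better once it is queued on the cluster.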

2. Set up your compute environment

  • Modules
    • module load r_3.6.0
    • May be out of date!
  • Roll-your-own
    • More control
    • Better BLAS library for R
    • May end up compiling a lot of dependencies

3. Transfer data and code

  • git via Github
  • sftp
  • sshfs

4. Write a SLURM script

  • Use a template!
  • Specify:
    • Partition
    • Account
    • Number of nodes
    • Number of processors
    • RAM
    • Time
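
A template covering the items listed above might look like the sketch below, which writes the script to `longcompute.slurm`. All values are placeholders to edit: the `stf` account comes from the srun example, while the `stf` batch partition, the job name, the time limit, and the script name `my_analysis.R` are assumptions for illustration.

```shell
# Write a template SLURM script. Every value is a placeholder to edit.
cat > longcompute.slurm <<'EOF'
#!/bin/bash
# Job name (hypothetical)
#SBATCH --job-name=longcompute
# Partition (assumed; the interactive srun example uses stf-int)
#SBATCH --partition=stf
# Account, as in the srun example
#SBATCH --account=stf
# Number of nodes
#SBATCH --nodes=1
# Number of processors per node
#SBATCH --ntasks-per-node=8
# RAM
#SBATCH --mem=20G
# Wall-clock time limit, hh:mm:ss (placeholder)
#SBATCH --time=02:00:00

# Module name taken from the notes above; it may be out of date.
module load r_3.6.0
# my_analysis.R is a hypothetical script name.
Rscript my_analysis.R
EOF

# List the resource requests to double-check them before submitting.
grep '^#SBATCH' longcompute.slurm
```

Keeping the comments on their own lines, rather than after each directive, avoids relying on how the scheduler parses trailing text on `#SBATCH` lines.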

5. Submit your job

Submit:

  • sbatch longcompute.slurm

Check progress:

  • squeue -u $USER

Check resource usage:

  • ssh n1234
  • htop -u $USER

6. Process and retrieve results

  • Can have dependent jobs
  • Save what you need on gscratch
  • Use sftp or sshfs to retrieve results
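
Dependent jobs can be chained with sbatch's `--dependency` option. The sketch below assumes two scripts, where the second should start only if the first finishes successfully; `submit_chain` is a hypothetical helper, `summarize.slurm` is a hypothetical follow-up script, and the `SBATCH` variable exists only so the logic can be tried off-cluster with a stand-in command.

```shell
# Command used to submit jobs; override SBATCH to test without a scheduler.
SBATCH="${SBATCH:-sbatch}"

# Submit two scripts so the second runs only after the first succeeds.
# submit_chain is a hypothetical helper name.
submit_chain() {
  local first_id second_id
  # --parsable prints "jobid" (or "jobid;cluster") instead of a sentence.
  first_id=$("$SBATCH" --parsable "$1") || return 1
  first_id=${first_id%%;*}
  # afterok: start only if the first job exits with status 0.
  second_id=$("$SBATCH" --parsable --dependency="afterok:${first_id}" "$2") || return 1
  second_id=${second_id%%;*}
  echo "$first_id $second_id"
}

# On Hyak (summarize.slurm is a hypothetical follow-up script):
#   submit_chain longcompute.slurm summarize.slurm
```

The function prints both job IDs, which can then be passed to squeue to follow the chain.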

Coding best practices

  • Use checkpointing where possible
  • Choose level of parallelism:
    • Within-computation (e.g. BLAS)
    • Among-computations
  • Use gscratch to save work
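
Checkpointing among independent computations can be as simple as leaving a marker file for each finished task, so that a resubmitted job skips completed work. This is a minimal sketch, not a lab standard: `process_task` stands in for one unit of real work, `run_with_checkpoints` is a hypothetical helper, and the output directory is left for you to choose (for example, a folder under gscratch).

```shell
# Placeholder for one unit of real work (hypothetical); replace with
# e.g. a call to Rscript that takes the task name as an argument.
process_task() {
  echo "result for $1"
}

# Run each task once. Finished tasks leave a .done marker and are
# skipped when the job is rerun after a timeout or failure.
run_with_checkpoints() {
  local outdir=$1
  shift
  mkdir -p "$outdir"
  local task
  for task in "$@"; do
    if [ -e "$outdir/$task.done" ]; then
      echo "skipping $task (already done)"
      continue
    fi
    # Write to a temporary name first so an interrupted job never
    # leaves a half-written result that looks complete.
    if process_task "$task" > "$outdir/$task.out.tmp"; then
      mv "$outdir/$task.out.tmp" "$outdir/$task.out"
      touch "$outdir/$task.done"
    else
      echo "task $task failed" >&2
      return 1
    fi
  done
}

# Example (replace the path with your own directory):
#   run_with_checkpoints /path/to/output/dir sim01 sim02 sim03
```

Rerunning the same command after an interruption then redoes only the tasks that have no marker file.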

File storage

  • home: persistent, low performance, limited
  • gscratch: large, fast, short-term
  • lolo: extra-large, long-term only

Cloud Computing

Our lab also uses newer technologies to support our research and share our work. We have a lab AWS account for storage and computation. The Trench-IR website uses the Azure cloud to host images, transform images, and serve the website. More information on our use of cloud computing in Trench-IR can be found here.

Text

Integrating coding with writing manuscripts and other products is a great option, and one we encourage you to take on and share with those of us who aren't quite there. We use R Markdown for components of the TrEnCh project and for these files. LaTeX is great for equations and is integrated with R Markdown or can be used in other applications. LaTeX is also a great way to easily reformat papers into a dissertation that conforms to school policies (see UW LaTeX resources). We also use Google Docs and Slides extensively for collaborative writing. You can use RStudio for visual markdown editing.

References

Lauren uses Zotero, an open source reference manager. Styles for many journals are available. There are extensions for citations in Word or Google Docs. There are also R packages for citations in Rmd documents (e.g., citr), which we should start using! Zotero can be configured to import citations from Google Scholar by checking “Use Zotero for downloaded RIS/Refer files” under Zotero’s preferences menu (using the import into EndNote tab). Uncheck the option if you want Zotero to stop snagging EndNote’s references. The references sync online, so it’s easy to share reference libraries with collaborators. RStudio can now link to Zotero for citations.

Visualization

We are committed to sharing our research with a broader public. One way we accomplish this goal is through RShiny visualizations of data. These visualizations allow end users to interact with scientific data more than is possible in a scientific paper. Visualizations also broaden the reach of scientific work outside of the scientific community. For example, our visualizations target a high-school audience, and we work with local teachers to use our visualizations in the classroom. Check out our visualizations on Trench-ED!

Dissemination

You are encouraged to edit our lab website via GitHub. You can edit in RStudio and push to GitHub if preferred. You are also encouraged to link to a personal website or create your own page on the lab site. Creating a GitHub repo with your user handle and putting text into the README.md is an easy way to make a simple page (see Lauren's example).

The TrEnCh project has a Twitter (@TrenchGroup) and Instagram account you are encouraged to use.