tutorial.txt

## A Simple Computational Workflow
## PSPG 245B.2 Systems Pharmacology

## W. Connell
## 2020.02.13

# open terminal

# git should be installed system wide if you used the
# download/install link provided
# check it is installed with
# if there are issues in the follow steps reference
# https://help.github.com/en/github/using-git/setting-your-username-in-git

git --version


# set your global github username and email

git config --global user.name "your_username"
git config --global user.email "your_email@gmail.com"
# check this
git config --global user.name
git config --global user.email


# create a parent directory for your GitHub repositories
# this is unrelated to the conda environment or git
# this is just part of your file tree
# which is accessible when you are in any conda env

mkdir github
cd github


# fork my GitHub repository
# go to https://github.com/wconnell/intro-comp-wrkflw
# and in the upper right corner click "Fork"
# this will create an exact copy of my repo in your
# GitHub account

# this will not track any changes I make to mine
# for collaborative software development you can set
# a fork up to track changes and update them based on the
# repo you have forked from

# go to your forked repo version
# click "clone or download" (green button)
# copy the text
# clone the forked repo

git clone git@github.com:your_username/intro-comp-wrkflw.git


# this will fail unless you have already set up a connection
# between your computer and github via ssh
# reference this webpage to setup a ssh key on github
# so the github server will recognize your computer
# Follow the directions on the site closely...
# First check if you have a ssh key on your computer! you should!
# avoid generating extra ssh keys! they clutter everything up!

# https://help.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account


# once you have completed this (fingers crossed...)
# you can go back and clone the forked repo

git clone git@github.com:your_username/intro-comp-wrkflw.git


# move into the cloned directory
# and check the version control status

cd intro-comp-wrkflw
git status


# inspect the directory and 
# the first few lines of the environment file

ls
head environment.yml


# the environment.yml file contains directions
# for conda to create a new environment
# which I have defined with a name and set of packages
# activate it following successful creation/installation

conda env create -f environment.yml
conda activate intro-comp-wrkflw


# remember that although the conda environment has the same
# name as the git repo, these are separate!
# you can activate a different conda env and still
# manipulate this repo
# things like the .ipynb file may not run because
# it has a set of dependencies (python packages) that the
# conda environment is required to have in order for proper usage

# we will go through an exercise to highlight this concept
# and introduce some more funcitonality

# run jupyter lab
# you may need to try a different port, 8888 is always a good one
# although if you are already running a notebook on your computer
# it may not be available

jupyter lab --port=4200 --no-browser &


# go to your browser and type in

localhost:4200/lab


# you will probably be prompted with a set of directions
# to use a token to set up a password
# do this, so you just have a simple password
# when starting a new jupyter lab session

# you should be able to navigate through directories on the right
# next, simply click on unsupervised.ipynb

# run the first code cell (the package imports)
# it should be successfull (no output)

# notice that you can open .yml and .txt file as well
# jupyter lab is really powerful
# you can use it as an interactive computing environment,
# development environment, and file tree GUI


### lets go back and see how conda environment dependencies
# affect your computation environment

# open a new terminal tab or window
# and create an empty conda env

conda create -n test
conda activate test


# check that it is empty, i.e. there no packages installed
# and then install jupyter lab

conda list
conda install jupyterlab


# start a jupyter session on a new port

jupyter lab --port=6200 --no-browser &

# open the session in the browser using the same
# method defined previously

localhost:6200/lab


# next click open the file below
# and execute the first cell in

unsupervised.ipynb


# this should fail due to package dependencies
# proving the function of the repo is independent of
# the environment

# go back to the terminal window and check
# where your notebooks are running

jupyter notebook list


# I have often found difficulty cleaning up unused notebooks
# there is often some bug with the "jupyter notebook stop `port`"
# so I get the process ID of the notebook and kill that
# I'll elaborate on this further

lsof -nti:6200 | xargs kill -9


# this notebook should be gone when you run

jupyter notebook list


# **I truly  never open multiple jupyter lab sessions
# because you can work on mutiple notebooks at once 
# under different environments

# you need to install nb_conda_kernels in your
# base conda environment
# then, you can run jupyter lab out of `base` env
# every time, and go to the tab on the right side
# of the jupyter session and select whichever conda env
# you would like to use

# kill and exit your jupyter lab session (at 4200)
# from the browser, under `file` select shutdown

# now go to your terminal and install this package in
# your base environment
# you don't have to explicitly switch to the base env
# to do this (just stay in the intro-comp-wrkflw env)
# heres the site for reference
# https://github.com/Anaconda-Platform/nb_conda_kernels

conda install -n base nb_conda_kernels


# start jupyter lab again

jupyter lab --port=4200 --no-browswer &


# now just go to your browser and refresh the tab at

localhost:4200/lab


# open unsupervised.ipynb go to the right side
# there is a tab that probably says `Python [...]`
# select the drop down and test running this cell 
# when you switch between your two env recently created
# intro-comp-wrkflw and test


### a couple more helpful conda commands...
# you can rollback your environment to a working status
# if you install some packages and it screws everything up

# first open a new terminal tab/window
# and activate your test env, then
# list the packages installed in your env

conda activate test
conda list


# too see the history of your environment updates
# helpfully it shows the date

conda list --revisions


# to rollback to a previous env status
# where the number is the approriate revision number

conda install --revision 0


# check your packages installed now

conda list


# this has saved me a number of times when I have
# had a very specific environment working
# and then screwed it up with some random package update

# although, conda is usually really good a figuring 
# out how to make everything work together seamlessly

# lets remove this test env since we don't want it
# hanging around taking up space
# go back to your base environment
# btw, don't use your base env for any real development

conda activate base
conda env list


# now you can delete the test environment

conda remove -n test
conda env list


### now we will do a git commit

# go to your original jupyter lab session in browser
# open README.md
# add a line to the file:

~ anything you like ~


# save the file
# go to your terminal and check what git has tracked

git status


# you will see there are changes not staged for commit

git add README.md
git status


# when you want to add all modified and untracked files at once
# a good shortcut is
# git add -A


# now the changes are staged
# and you can commit them
# you need to provide a message for the commit
# I have details about `add` vs `commit`

git commit -m 'look ma first commit!'


# now run

git status


# you will see your local branch is ahead of the
# origin (remote repository) by 1 commit
# now you can push to update the changes to remote

git push


# go and check your repo on github now
# and your changes will be there

# now imagine you do this for on any computer you work on
# and always be able to access your code (along with others)

# it isn't good practice to store signifcant amounts of data
# on github (there is a low limit)
# so you need to figure out ways to be able to still 
# develop your analysis even when you don't have access to
# all of your data


######################################################

# For an introduction to some of the basics of 
# scientific computing in python see the the
# "Python Data Science Handbook"
# and start on section 2

# you can run these notebooks interactively on google cloud
# just by clicking the 'colab' button!


https://jakevdp.github.io/PythonDataScienceHandbook/


# For those comfortable with python, explore the PCA notebook
# in your browser, working/reading through it
# you can also find this exercise in the table of contents
# in the link below above in section 5.09

# all credit for this phenomenal resource goes to 
# Jake VanderPlas