## A Simple Computational Workflow
## PSPG 245B.2 Systems Pharmacology
## W. Connell
## 2020.02.13
# open terminal
# git should be installed system wide if you used the
# download/install link provided
# if there are issues in the following steps, reference
# https://help.github.com/en/github/using-git/setting-your-username-in-git
# check that it is installed with
git --version
# set your global git username and email
git config --global user.name "your_username"
git config --global user.email "your_email@gmail.com"
# check this
git config --global user.name
git config --global user.email
# create a parent directory for your GitHub repositories
# this is unrelated to the conda environment or git
# this is just part of your file tree
# which is accessible when you are in any conda env
mkdir github
cd github
# fork my GitHub repository
# go to https://github.com/wconnell/intro-comp-wrkflw
# and in the upper right corner click "Fork"
# this will create an exact copy of my repo in your
# GitHub account
# your fork will not automatically pick up changes I make to mine
# for collaborative software development you can set
# a fork up to track the repo it was forked from
# and pull in its updates
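# a minimal sketch of what tracking the original repo can look like
# it uses two throwaway local repos in a temp directory instead of GitHub,
# and the remote name "upstream" is only a common convention (an assumption here)
# with a real fork you would point upstream at the repo you forked from

```shell
if command -v git >/dev/null 2>&1; then
    work=$(mktemp -d)
    me="-c user.name=demo -c user.email=demo@example.com"   # throwaway identity
    # stand-in for the original repo, with one commit
    git init -q "$work/original"
    git -C "$work/original" $me commit -q --allow-empty -m "first commit"
    # stand-in for your fork
    git clone -q "$work/original" "$work/fork"
    # the original moves on after you forked it
    git -C "$work/original" $me commit -q --allow-empty -m "second commit"
    # in the fork: register the original as a remote and merge its changes
    git -C "$work/fork" remote add upstream "$work/original"
    git -C "$work/fork" fetch -q upstream
    branch=$(git -C "$work/fork" rev-parse --abbrev-ref HEAD)
    git -C "$work/fork" merge -q "upstream/$branch"
    fork_commits=$(git -C "$work/fork" rev-list --count HEAD)
    echo "fork now has $fork_commits commits"
fi
```

# the fork starts with one commit and ends with two, the second one
# having arrived through the upstream remote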
# go to your forked repo
# click "clone or download" (green button)
# copy the SSH address shown there (it starts with git@github.com:)
# clone the forked repo
git clone git@github.com:your_username/intro-comp-wrkflw.git
# this will fail unless you have already set up a connection
# between your computer and github via ssh
# reference the webpage below to set up an SSH key on GitHub
# so the GitHub server will recognize your computer
# follow the directions on the site closely...
# first check whether you already have an SSH key on your computer!
# avoid generating extra SSH keys, they clutter everything up
# https://help.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account
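# one way to check for existing SSH keys, as a small sketch
# list_public_keys is a hypothetical helper name, and ~/.ssh is the usual
# default key location (an assumption; yours may differ)

```shell
list_public_keys() {
    # print any public key files found in the given directory (default: ~/.ssh)
    dir=${1:-"$HOME/.ssh"}
    found=0
    for key in "$dir"/*.pub; do
        if [ -e "$key" ]; then
            echo "$key"
            found=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no public keys found in $dir"
    fi
}

list_public_keys
# once a key is registered on GitHub, this should greet you by username:
# ssh -T git@github.com
```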
# once you have completed this (fingers crossed...)
# you can go back and clone the forked repo
git clone git@github.com:your_username/intro-comp-wrkflw.git
# move into the cloned directory
# and check the version control status
cd intro-comp-wrkflw
git status
# inspect the directory and
# the first few lines of the environment file
ls
head environment.yml
# the environment.yml file contains directions
# for conda to create a new environment
# which I have defined with a name and set of packages
# activate it following successful creation/installation
conda env create -f environment.yml
conda activate intro-comp-wrkflw
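# the env name passed to `conda activate` comes from the `name:` field
# of environment.yml; a small sketch to read it out
# (env_name_from_yml is a hypothetical helper, and the demo file
# written below is made up for illustration, not the repo's real file)

```shell
env_name_from_yml() {
    # print the value of the first top-level "name:" key in the file
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

demo_yml=$(mktemp)
printf 'name: demo-env\ndependencies:\n  - python\n' > "$demo_yml"
env_name_from_yml "$demo_yml"    # prints demo-env
```

# inside the repo you could then run
# conda activate "$(env_name_from_yml environment.yml)"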
# remember that although the conda environment has the same
# name as the git repo, these are separate!
# you can activate a different conda env and still
# manipulate this repo
# but things like the .ipynb file may not run there, because
# the notebook depends on python packages that the
# active conda environment must have installed
# we will go through an exercise to highlight this concept
# and introduce some more functionality
# run jupyter lab
# if port 4200 is taken, try a different one; 8888 is jupyter's default,
# although if you are already running a notebook on your computer
# it may not be available
jupyter lab --port=4200 --no-browser &
# go to your browser and type in
localhost:4200/lab
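# if you are not sure whether a port is free, here is a rough sketch
# that counts up from a starting port until lsof finds nothing using it
# (port_in_use and pick_port are hypothetical helper names; if lsof is
# not installed, every port will look free)

```shell
port_in_use() {
    # succeeds when lsof reports a process with an open network file on the port
    [ -n "$(lsof -nti:"$1" 2>/dev/null)" ]
}

pick_port() {
    # print the first port at or above $1 that nothing is using
    candidate=$1
    while port_in_use "$candidate"; do
        candidate=$((candidate + 1))
    done
    echo "$candidate"
}

free_port=$(pick_port 4200)
echo "try: jupyter lab --port=$free_port --no-browser"
```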
# you will probably be prompted with a set of directions
# to use a token to set up a password
# do this, so you just have a simple password
# when starting a new jupyter lab session
# you should be able to navigate through directories in the file browser on the left
# next, simply click on unsupervised.ipynb
# run the first code cell (the package imports)
# it should be successful (no output)
# notice that you can open .yml and .txt files as well
# jupyter lab is really powerful
# you can use it as an interactive computing environment,
# development environment, and file tree GUI
### let's go back and see how conda environment dependencies
# affect your computational environment
# open a new terminal tab or window
# and create an empty conda env
conda create -n test
conda activate test
# check that it is empty, i.e. there are no packages installed
# and then install jupyter lab
conda list
conda install jupyterlab
# start a jupyter session on a new port
jupyter lab --port=6200 --no-browser &
# open the session in the browser using the same
# method defined previously
localhost:6200/lab
# next open the file below
# and execute its first cell
unsupervised.ipynb
# this should fail because the packages it imports are not installed here
# showing that the repo's files are independent of the environment,
# while running its code depends on the environment
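# you can run the same kind of check from the terminal without opening a notebook
# a small sketch: can_import is a hypothetical helper, and numpy is only an
# example module name (an assumption; check the notebook's first cell for
# the packages it really imports)

```shell
can_import() {
    # succeeds when the active environment's python can import the module
    python -c "import $1" >/dev/null 2>&1
}

for module in os numpy; do
    if can_import "$module"; then
        echo "$module: available in this env"
    else
        echo "$module: missing from this env"
    fi
done
```

# run it once in each env and compare what is reported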
# go back to the terminal window and check
# where your notebooks are running
jupyter notebook list
# I have often had difficulty cleaning up unused notebooks
# "jupyter notebook stop <port>" often fails for me
# so I get the process ID of whatever is using the port and kill that
# I'll elaborate on this further
lsof -nti:6200 | xargs kill -9
# this notebook should be gone when you run
jupyter notebook list
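# kill -9 gives the process no chance to shut down cleanly
# a gentler sketch: ask politely first and only force it if needed
# (stop_pid is a hypothetical helper and the 5 second wait is arbitrary)

```shell
stop_pid() {
    # send SIGTERM, wait up to 5 seconds, then fall back to SIGKILL
    target=$1
    kill "$target" 2>/dev/null || return 0    # already gone (or not ours to signal)
    for _try in 1 2 3 4 5; do
        kill -0 "$target" 2>/dev/null || return 0
        sleep 1
    done
    kill -9 "$target" 2>/dev/null
}

# demo on a harmless background process
sleep 60 &
demo_pid=$!
stop_pid "$demo_pid"
status=0
wait "$demo_pid" || status=$?
echo "demo process exited with status $status"
```

# for a notebook on a given port you could then run
# for pid in $(lsof -nti:6200); do stop_pid "$pid"; done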
# **I truly never open multiple jupyter lab sessions
# because you can work on multiple notebooks at once
# under different environments from a single session
# to do so, install nb_conda_kernels in your
# base conda environment
# then you can run jupyter lab out of the `base` env
# every time, and use the kernel selector at the top right
# of the notebook to pick whichever conda env
# you would like to use
# shut down your jupyter lab session (at 4200):
# in the browser, under `File` select `Shut Down`
# now go to your terminal and install this package in
# your base environment
# you don't have to explicitly switch to the base env
# to do this (just stay in the intro-comp-wrkflw env)
# here's the site for reference
# https://github.com/Anaconda-Platform/nb_conda_kernels
conda install -n base nb_conda_kernels
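# start jupyter lab again
# (nb_conda_kernels only takes effect when jupyter is launched from the env
# it is installed in, so run `conda activate base` first if needed;
# this assumes jupyterlab is installed in base)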
jupyter lab --port=4200 --no-browser &
# now just go to your browser and refresh the tab at
localhost:4200/lab
# open unsupervised.ipynb and look at the top right
# there is a button that probably says `Python [...]`
# click it, switch between the two envs you recently created
# (intro-comp-wrkflw and test),
# and test running the first cell under each
### a couple more helpful conda commands...
# you can rollback your environment to a working status
# if you install some packages and it screws everything up
# first open a new terminal tab/window
# and activate your test env, then
# list the packages installed in your env
conda activate test
conda list
# to see the history of your environment updates
# (helpfully, it shows the date)
conda list --revisions
# to roll back to a previous env status
# where the number is the appropriate revision number
conda install --revision 0
# check your packages installed now
conda list
# this has saved me a number of times when I have
# had a very specific environment working
# and then screwed it up with some random package update
# although conda is usually really good at figuring
# out how to make everything work together seamlessly
# let's remove this test env since we don't want it
# hanging around taking up space
# go back to your base environment
# btw, don't use your base env for any real development
conda activate base
conda env list
# now you can delete the test environment
conda remove -n test --all
conda env list
### now we will do a git commit
# go to your original jupyter lab session in browser
# open README.md
# add a line to the file:
~ anything you like ~
# save the file
# go to your terminal and check what git has tracked
git status
# you will see there are changes not staged for commit
git add README.md
git status
# when you want to add all modified and untracked files at once
# a good shortcut is
# git add -A
# now the changes are staged
# and you can commit them
# you need to provide a message for the commit
# I have details about `add` vs `commit`
git commit -m 'look ma first commit!'
# now run
git status
# you will see your local branch is ahead of the
# origin (remote repository) by 1 commit
# now you can push to update the changes to remote
git push
# go and check your repo on github now
# and your changes will be there
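# to see the add/commit steps in isolation, here is a sketch that repeats
# them in a throwaway repo in a temp directory (nothing is pushed anywhere)
# `git status --short` marks an unstaged change as " M" and a staged one as "M "
# the demo identity and file contents are made up for illustration

```shell
if command -v git >/dev/null 2>&1; then
    demo=$(mktemp -d)
    me="-c user.name=demo -c user.email=demo@example.com"   # throwaway identity
    git init -q "$demo"
    echo "hello" > "$demo/README.md"
    git -C "$demo" add README.md
    git -C "$demo" $me commit -q -m 'add README'

    echo "~ anything you like ~" >> "$demo/README.md"
    before_add=$(git -C "$demo" status --short)     # " M README.md": modified, not staged
    git -C "$demo" add README.md
    after_add=$(git -C "$demo" status --short)      # "M  README.md": staged for commit
    git -C "$demo" $me commit -q -m 'look ma first commit!'
    after_commit=$(git -C "$demo" status --short)   # empty: working tree clean
    printf '[%s]\n[%s]\n[%s]\n' "$before_add" "$after_add" "$after_commit"
fi
```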
# now imagine you do this on every computer you work on:
# you will always be able to access your code (along with others)
# it isn't good practice to store significant amounts of data
# on github (files larger than 100 MB are rejected)
# so you need to figure out ways to still
# develop your analysis even when you don't have access to
# all of your data
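# one common way to keep data out of the repo is a .gitignore entry
# a sketch in a throwaway repo; the data/ directory and file name are
# hypothetical, and this tutorial's repo may be laid out differently

```shell
if command -v git >/dev/null 2>&1; then
    scratch=$(mktemp -d)
    git init -q "$scratch"
    mkdir "$scratch/data"
    echo "pretend this is huge" > "$scratch/data/measurements.csv"
    # tell git to ignore everything under data/
    echo "data/" > "$scratch/.gitignore"
    # check-ignore succeeds when the path is covered by an ignore rule
    if git -C "$scratch" check-ignore -q data/measurements.csv; then
        ignored=yes
    else
        ignored=no
    fi
    echo "data/measurements.csv ignored: $ignored"
    git -C "$scratch" status --short    # lists only .gitignore as untracked
fi
```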
######################################################
# For an introduction to some of the basics of
# scientific computing in python see the
# "Python Data Science Handbook"
# and start on section 2
# you can run these notebooks interactively on google cloud
# just by clicking the 'colab' button!
https://jakevdp.github.io/PythonDataScienceHandbook/
# For those comfortable with python, explore the PCA notebook
# in your browser, working/reading through it
# you can also find this exercise in the table of contents
# at the link above, in section 5.09
# all credit for this phenomenal resource goes to
# Jake VanderPlas