---
layout: reveal_markdown
title: "Course overview and introduction to computational genomics"
tags: slides
date: 2021-12-09
---
## {{ page.title }}
---
## Course overview
- Course website: https://cphg.github.io/compgen/
- Scribing assignments
- Final presentations
- Programming assignments: GitHub
---
## git and GitHub in 5 slides
---
### git/GitHub serve multiple purposes
1. Backup
2. Version control
3. Distributing/sharing code
4. Testing
5. Communication and collaboration
---
### Basics of git repos
1. A git repository is a folder with a `.git` subfolder
2. The `.git` subfolder stores an index of snapshots
3. Commits track the repository, not individual files.
4. Git uses a model of distributed *clones*
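
Point 1 can be made concrete with a small sketch: locate the repository that contains a given folder by walking upward until a `.git` entry appears. The function name `find_repo_root` and the temporary folder layout are made up for illustration; this is not how git itself is implemented.

```python
import tempfile
from pathlib import Path
from typing import Optional

def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upward from `start`; return the first folder holding a `.git` entry."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        # Simplification: in worktrees and submodules `.git` is a file,
        # so any `.git` entry is accepted, not only a directory.
        if (folder / ".git").exists():
            return folder
    return None

# Hypothetical layout: a fake repository with a nested source folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "src" / "pkg"
    nested.mkdir(parents=True)
    found = find_repo_root(nested)  # resolves to `root`
```

The lookup goes through every parent folder, which mirrors the idea that a commit tracks the whole repository rather than a single file.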
---
<div class="row">
<div class="col">
Centralized (svn)
<img src="images/introduction/centralized.png" style="margin-top:140px">
</div>
<div class="col">
Distributed (git)
<img src="images/introduction/distributed.png">
</div>
</div>
<span class="small"><a href="https://git-scm.com/">ProGit</a></span>
---
### Basics of git branching
1. Branches are pointers to commits. They are cheap.
2. Combine branches with either *merge* or *rebase*.
3. Branching workflows can be opinionated<br>(*e.g.*, gitflow, threeflow, trunk-based development)
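
A toy model may help with the idea that branches are pointers. The sketch below is not git's storage format: the commit IDs (`C0`, `C1`, ...) and the helper names are invented for illustration, and each commit has a single parent. It shows why creating a branch is cheap and what a fast-forward merge does.

```python
# Toy model (assumption: one parent per commit, made-up commit IDs).
parents = {"C0": None, "C1": "C0", "C2": "C1"}  # commit -> parent commit
branches = {"master": "C2"}                     # branch name -> commit

def is_ancestor(older, newer):
    """True if `older` is reachable from `newer` by following parent links."""
    while newer is not None:
        if newer == older:
            return True
        newer = parents[newer]
    return False

def create_branch(name, at):
    """Creating a branch only copies one pointer."""
    branches[name] = branches[at]

def commit(branch, new_id):
    """Record a new commit on top of `branch` and advance the pointer."""
    parents[new_id] = branches[branch]
    branches[branch] = new_id

def fast_forward(target, source):
    """Move `target` up to `source` when the histories have not diverged."""
    if not is_ancestor(branches[target], branches[source]):
        raise ValueError("branches diverged; a merge commit or rebase is needed")
    branches[target] = branches[source]

create_branch("hotfix", at="master")
commit("hotfix", "C4")
fast_forward("master", "hotfix")  # both names now point at C4
```

When the two branches have diverged, the pointer cannot simply move forward, which is the case that *merge* and *rebase* handle in different ways.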
---
<div class="row">
<div class="col">
<img src="images/introduction/basic-branching-4.png" width="425" style="margin-bottom:65px">
<pre><code class="small" style="line-height:1em">$ git checkout -b hotfix
Switched to a new branch 'hotfix'
$ vim index.html
$ git commit -a -m 'Fix broken email address'
[hotfix 1fb7853] Fix broken email address
1 file changed, 2 insertions(+)
</code></pre>
</div>
<div class="col fragment">
<img src="images/introduction/basic-branching-5.png" width="425">
<pre><code class="small">$ git checkout master
$ git merge hotfix
Updating f42c576..3a0874c
Fast-forward
index.html | 2 ++
1 file changed, 2 insertions(+)
</code></pre>
</div>
</div>
<span class="small"><a href="https://git-scm.com/">ProGit</a></span>
---
## Introduction to computational genomics
This intro will cover:
1. Definitions of genomics and related fields
2. Brief history of genomics
3. Core biological concepts and technology
4. Overview of sequencing technologies
5. The genomic data explosion and computation
6. How genomics links to business, ethics, and health
---
### Genomics is the study of genomes
### What is a genome?
<div class="fragment">
> A complete set of genetic information.
### What is genetic?
</div>
<div class="fragment">
> Relating to origin.
### What is genomics?
</div>
<div class="fragment">
> The study of the instructions needed to originate an organism.
</div>
---
### Early history of genomics
- Discovery of DNA (1871)
- Discovery of nucleotides (1910)
- DNA is the genetic material (1944)
- Discovery of structure of DNA (1953)
- The central dogma (1957)
<span class="small">See https://www.yourgenome.org/facts/timeline-history-of-genomics</span>
---
<!--
<div class="row">
<div class="col">
1871: Discovery of DNA
<img
src="images/introduction/avery_macleod_mccarty_experiment_02_yourgenome.png"
height="520"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Avery et al. 1944</span>
</div>
<div class="col">
1910: Nucleotides
<img
src="images/introduction/bases.png"
height="320"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Kossel 1910</span>
</div>
</div>
<span class="small">Image credits: Genome Research Limited, nobelprize.org</span> -->
<div class="row">
<div class="col">
1910: Nucleotides
<img
src="images/introduction/bases.png"
height="320"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Albrecht Kossel Nobel Prize:<br><a href="https://www.nobelprize.org/prizes/medicine/1910/summary/">The Chemical Composition of the Cell Nucleus</a></span>
</div>
<div class="col">
1944: DNA is genetic
<img
src="images/introduction/avery_macleod_mccarty_experiment_02_yourgenome.png"
height="520"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Avery et al. 1944</span>
</div>
</div>
<span class="small">Image credits: nobelprize.org, Genome Research Limited</span>
---
<div class="row">
<div class="col">
1953: Structure of DNA
<img
src="images/introduction/dna_structure.png"
height="520"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small"><a href="https://www.nature.com/articles/171737a0">Watson and Crick</a>, <i>Nature</i>, 1953</span>
</div>
<div class="col">
1957: Central dogma
<img
src="images/introduction/dna_central_dogma_yourgenome.png"
height="520"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Crick 1957</span>
</div>
</div>
<span class="small">Image credits: Genome Research Limited</span>
---
### Genomics is intertwined with sequencing technology
> DNA sequencing is the process of determining the *sequence* of nucleotides (ACTG) in a strand of DNA
DNA sequencing = measurement (like a microscope)
---
### History of genome sequencing
- Sanger sequencing (1977-1980)
- The [Human Genome Project](https://www.genome.gov/human-genome-project) (1990-2003)
- The [ENCODE project](https://www.genome.gov/Funded-Programs-Projects/ENCODE-Project-ENCyclopedia-Of-DNA-Elements) (2003-2021)
- Sequencing by synthesis (2008-2012)
- Single-molecule sequencing
- 100k genomes sequenced
---
<div class="row">
<div class="col">
1977: Sanger sequencing
<img
src="images/introduction/Radioactive_Fluorescent_Seq.jpg"
height="520"
style="background:white; padding: 10px; margin: 0px"><br>
<span class="small">Chain Termination Method</span>
</div>
<div class="col">
2001: Human genome
<img
src="images/introduction/Logo_HGP.jpg"
height="320"
style="background:white; padding: 10px; margin-top: 90px"><br>
<span class="small"></span>
</div>
</div>
<span class="small">Image credits: Wikipedia</span>
---
### 2008: Sequencing by synthesis
<iframe width="560" height="315" src="https://www.youtube.com/embed/v10bUR2aL5g?start=35" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<a href="https://www.youtube.com/embed/v10bUR2aL5g?start=35" class="small">Illumina</a>
---
### Single-molecule sequencing
<div class="row">
<div class="col">
<img src="images/introduction/pavlovic2020.png" height=620>
</div>
<div class="col">
<li>2004: Helicos</li>
<li>2009: SMRT</li>
<li>2009: Nanoball</li>
<li>2016: Nanopore</li>
<span class="small"><a href="https://doi.org/10.1016/B978-0-12-816664-2.00009-8">Pavlovic et al. 2020. Microbiomics.</a></span>
</div>
</div>
---
<div class="row">
<div class="col">
### Contrasting sequencing platforms
- Size of machine
- Length of reads
- Number of reads
- Cost of machine
- Cost of reads
- Accuracy of reads
- Technician time
</div>
<div class="col">
<img src="images/introduction/levy2016.png" height=340>
<span class="small"><a href="https://doi.org/10.1146/annurev-genom-083115-022413">Levy and Myers 2016</a></span>
<h4>Other systems</h4>
<div class="small">
<li><a href="http://www.biorxiv.org/content/early/2016/04/13/048603">Bionano: Saphyr system</a></li>
<li><a href="https://en.wikipedia.org/wiki/Pyrosequencing">Roche 454 Pyrosequencing</a></li>
<li><a href="https://www.thermofisher.com/us/en/home/brands/ion-torrent.html">ThermoFisher IonTorrent</a></li>
<li><a href="https://en.wikipedia.org/wiki/Polony_sequencing">Polony sequencing</a></li>
</div>
</div>
</div>
---
### Sequencing costs
<iframe src="https://databio.org/seqcosts/cost.html" width="1250" height="550"></iframe>
<a style="font-size:0.6em" href="https://databio.org/seqcosts">databio.org/seqcosts</a>
---
### Sequencing
- Sequencing technology doesn't just measure DNA
Genome → transcriptome → epigenome
<div class="fragment">Convert what you want to measure to DNA, then use DNA sequencing technology.</div>
---
### Transcriptomics
> The study of the RNA molecules produced by a cell.
<span class="fragment">
Complementary DNA (cDNA) is a DNA copy of a messenger RNA (mRNA) molecule produced by reverse transcriptase, a DNA polymerase that can use either DNA or RNA as a template. (Encyclopedia of Genetics)
</span>
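
As a rough sketch of the base-pairing rule behind cDNA synthesis: the first cDNA strand is the DNA reverse complement of the mRNA. The function name `first_strand_cdna` and the example sequence are made up for illustration, and the enzyme chemistry (priming, second-strand synthesis) is ignored.

```python
# Pairing rule for copying RNA into DNA (illustrative model only).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna: str) -> str:
    """Return first-strand cDNA, written 5'->3', for an mRNA written 5'->3'."""
    # The new strand is antiparallel to its template, so read the mRNA
    # backwards and pair each base with its DNA complement.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))

cdna = first_strand_cdna("AUGGCC")  # "GGCCAT"
```

Once the RNA is represented as DNA, it can go through the same sequencing workflow as any other DNA sample.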
---
### Epigenomics
> The study of the chemical modification and physical conformation of cellular DNA and bound proteins
---
<img src="images/introduction/rosa2013_chromatin.png" width="550"><br>
<span class="small">Protocols: Bisulfite-seq, ChIP-seq, ATAC-seq, Hi-C</span>
<br><span class="small">Rosa et al. 2013</span>
---
### Units
<div class="row">
<div class="col">
<b>Genomics</b>
<li class="small">sequence alignment</li>
<li class="small">genome assembly</li>
<li class="small">variant calling</li>
</div>
<div class="col">
<b>Epigenomics</b>
<li class="small">short-read reference mapping</li>
<li class="small">string models </li>
<li class="small">genomic intervals</li>
</div>
<div class="col">
<b>Transcriptomics</b>
<li class="small">dimensionality reduction</li>
<li class="small">k-mer algorithms</li>
<li class="small">differential expression</li>
</div>
</div><br><br>
### Focus
Algorithms and methods motivated by biology
<!--
- Public human genome funding comes from the [National Human Genome Research Institute](https://www.genome.gov).
-->
---
### Genomics today
- Genomics and health
- Genomics and business
- Genomics and evolution
- Genomics and privacy
- Genomics and ethics
- Genomics and computing
---
### Genomics and health: traits
![](images/introduction/polygenic.svg)
---
### Genomics and health: GWAS
![](images/introduction/GWAS_Fact-sheet2020.jpg)
---
### Genomics and health: GWAS
<video width="960" height="620" controls autoplay="true">
<source src="http://cloud.databio.org.s3.amazonaws.com/courses/GWAS_slideshow.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
---
### Genomics and health: gene/environment
![](images/introduction/gene-by-env.svg)
---
### Genomics and health:<br>personalized medicine
![](images/introduction/all_of_us.png)
---
### Genomics and health: epigenetics
- epimutations
- cancer classification
- clinical epigenetic assays
---
### Genomics and business
- Genome-stratified clinical trials
- mRNA vaccines: Pfizer, Moderna
- Clinical genetic tests
- Direct-to-consumer (DTC) genetic testing:
  - 23andMe, Ancestry.com, etc.
---
### Genomics and law
- Celera and the Human Genome Project
  - Celera intended to patent human genes in the race to finish the genome
---
### Genomics and law
- Uni. of Utah/NIEHS/Myriad patent BRCA1 (1994)
- Myriad sold diagnostic tests for BRCA genes
- Supreme Court struck down gene patents (2013)
> A naturally occurring DNA segment is a product of nature and not patent eligible merely because it has been isolated, but cDNA is patent eligible because it is not naturally occurring.
---
### Genomics and law
- Lawrence Berkeley Laboratory tested employees without consent
- Norman-Bloodsaw v. Lawrence Berkeley Laboratory (1998)
> Holds that unauthorized employer testing for sensitive medical information [specifically including genetic testing] violates employees' right to informational privacy
---
### Genomics and law
- In the US: The Genetic Information Nondiscrimination Act of 2008
> To prohibit discrimination on the basis of genetic information with respect to health insurance and employment.
See also: [Genetic Information Nondiscrimination Act](https://www.eeoc.gov/statutes/genetic-information-nondiscrimination-act-2008)
---
### Genomics and ethics
- genetic ownership (patents, family dynamics)
- genetic discrimination (employment, insurance)
- genetically modified babies
- identification and privacy
---
### Genomics and privacy: Crypt4GH
![](images/introduction/Crypt4GH_comic.png)
See also: [Genome.gov on Genomic Privacy](https://www.genome.gov/about-genomics/policy-issues/Privacy)
---
### Computing is everywhere in genomics
<div class="row">
<div class="col small">
<li>Genome assembly</li>
<li>Genome alignment</li>
<li>Variant identification</li>
<li>Variant associations</li>
<li>Phylogenetics</li>
<li>Viral evolution</li>
<li>Comparative genomics</li>
<li>Microbiome</li>
</div>
<div class="col small">
<li>Data privacy and encryption</li>
<li>Data models</li>
<li>Data integration</li>
<li>Data sharing</li>
<li>Data compression</li>
<li>Data storage</li>
<li>Computing reproducibility</li>
<li>Single-cell sequencing</li>
</div>
<div class="col small">
<li>Genomic data structures</li>
<li>Outcome prediction</li>
<li>Disease subtyping</li>
<li>Gene prediction</li>
<li>Gene network regulation</li>
<li>Cell-type definition</li>
<li>Epigenetics</li>
<li>Metagenomics</li>
</div>
</div>
---
### Conclusion
It's an interesting time for computational genomics.<br>
<div class="fragment">
The world is starving for people who can apply advanced computation to growing genomic data.
</div><br>
<div class="fragment">
Looking forward to a fun and enlightening semester.
</div>