---
title: "Just A Little More Python"
subtitle: "than what I could cover in Just Enough Python"
date: last-modified
engine: knitr
---
::: {.callout-warning}
This tutorial is under construction.
:::
I wish I had time to cover everything in-depth, but then I'd have to change the
title from "[Just Enough Python](index.qmd)" to "Way More Python
Than We Could Possibly Cover In An Hour".
So as a compromise, here's just a little more Python.
<!--
# topics i wish i had time to cover
## mutability
## iterables, iterators, generators
### range returns something like a generator. lazy evaluation.
## reading & writing files
## tuples, named tuples. for snakemake wildcards
## snakemake params - use lambda with wildcards & output args
TODO
-->
### more comprehensions
#### dictionary comprehensions
```{python}
#| filename: "Python"
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]
movie_dict = {}
for movie, year in zip(movies, years):
    movie_dict[movie] = year
movie_dict
```
```{python}
#| filename: "Python"
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]
# same result as the loop version, written as a dictionary comprehension
movie_dict = {movie: year for movie, year in zip(movies, years)}
movie_dict
```
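A comprehension can also filter items as it builds the dictionary. The sketch below reuses the `movies` and `years` lists; the cutoff year of 2000 and the name `older_movies` are illustrative choices, not part of the original example.

```python
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]

# keep only the movies released before 2000
older_movies = {movie: year for movie, year in zip(movies, years) if year < 2000}
print(older_movies)

# when nothing needs filtering or transforming, dict() on the zipped pairs
# builds the same mapping as the plain comprehension
movie_dict = dict(zip(movies, years))
print(movie_dict)
```

The `if` clause at the end is optional; without it, every pair from `zip()` ends up in the dictionary.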
#### set comprehensions
Use a set instead of a list when order doesn't matter and you want every item
to be unique.
```{python}
#| filename: "Python"
myset = set([1,2,2,2,3,3,4])
print(myset)
```
```{python}
#| filename: "Python"
presidents = ["George Washington", "John Adams", "Thomas Jefferson",
              "James Madison Jr", "James Monroe", "John Quincy Adams"]
first_names = {name.split()[0] for name in presidents}
print(first_names)
```
Checking whether a set contains an item is also faster than checking a list: a set lookup takes roughly constant time on average, while a list has to be scanned item by item.
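Here is a rough way to see the difference for yourself, using `timeit` from the standard library. The collection size, the target value, and the number of repetitions are arbitrary choices, and the exact timings will vary from machine to machine.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)
target = 99_999  # last item, so the list has to be scanned to the end

# time 100 membership checks against each collection
list_time = timeit.timeit(lambda: target in items_list, number=100)
set_time = timeit.timeit(lambda: target in items_set, number=100)

print(f"list: {list_time:.5f} seconds")
print(f"set:  {set_time:.5f} seconds")
```

Both checks give the same answer; only the time it takes to get there differs.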
## packages
Using packages works a little differently between R and Python.
In R, you can refer to a function from any package you have installed with double colons:
```{R pkg-syntax}
#| filename: R script
#| eval: false
dplyr::tibble(a = 1:3, b = 4:6)
```
Alternatively, you can first load the package at the top of your script so you
can call the functions you want to use without specifying the package name every
time you use them.
```{R library}
#| filename: R script
#| eval: false
library(dplyr)
tibble(a = 1:5, b = 6:10) %>%
  filter(a > 3)
```
In Python, you must `import` a package before you can use anything from it,
and then prefix each function with the package name like this:
```{python import-pkg}
#| filename: Python script
#| eval: false
import pandas
pandas.DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})
```
You can give a package a nickname to cut down on the characters you have to type.
`pandas` is often renamed to `pd`:
```{python import-pkg-pd}
#| filename: Python script
#| eval: false
import pandas as pd
pd.DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})
```
Alternatively, you can choose to import only the functions or classes you want to use from a package:
```{python import-pkg-dataframe}
#| filename: Python script
#| eval: false
from pandas import DataFrame
DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})
```
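The three import styles are interchangeable. The sketch below puts them side by side using `statistics` from the standard library instead of `pandas`, so it runs without installing anything; the nickname `stats` is an arbitrary choice.

```python
import statistics            # style 1: use the full package name
import statistics as stats   # style 2: give the package a nickname
from statistics import mean  # style 3: import just the name you need

values = [1, 2, 3]

# all three lines call the same function
print(statistics.mean(values))
print(stats.mean(values))
print(mean(values))
```

Whichever style you pick, using it consistently within a script makes it easier to see where each function comes from.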
## object-oriented programming
I wish I had enough time to cover object-oriented programming (OOP) in depth,
but we're only going to scratch the surface.
Everything in Python is an object. This is also true in R.
You can find out what class an object belongs to in Python with `type()`:
```{python}
#| filename: "Python"
numbers = [1,2,3,4]
type(numbers)
name = "Kelly"
type(name)
type(name) == str
```
or in R with `class()`:
```{r}
#| filename: "R"
class(1:3)
class(list(1, 2, 3))
name <- "Kelly"
class(name)
class(name) == 'character'
```
#### Objects of different classes have different methods and attributes.
- methods: functions that operate on an object.
- attributes: metadata associated with an object.
Lists have methods that can modify them in place:
```{python}
#| filename: "Python"
print(numbers)
numbers.append(5)
numbers.reverse()
print(numbers)
```
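One consequence of modifying in place is that these methods return `None` rather than a new list. A minimal sketch of the difference, using the built-in `sorted()` for comparison (the variable names are illustrative):

```python
numbers = [3, 1, 2]

# .sort() changes the list itself and returns None
result = numbers.sort()
print(result)
print(numbers)

# sorted() leaves the original alone and returns a new list
original = [3, 1, 2]
new_list = sorted(original)
print(original)
print(new_list)
```

So writing `numbers = numbers.sort()` would replace your list with `None`, which is an easy mistake to make when coming from R.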
You can split a string into a list based on a separator character with `split()`:
```{python}
#| filename: "Python"
filename = "path/to/file.txt"
filename.split('.')
filename.split('.')[0]
```
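Splitting on `'.'` works for simple filenames but gives surprising results when a directory name also contains a dot. As a hedged alternative, the standard library's `pathlib` has dedicated attributes for this; the second path below is a made-up example.

```python
from pathlib import PurePosixPath

path = PurePosixPath("path/to/file.txt")
print(path.suffix)               # the file extension, including the dot
print(path.stem)                 # the file name without the extension
print(str(path.with_suffix(''))) # the full path without the extension

# a hypothetical path with a dot in a directory name
dotted = "results.v2/file.txt"
print(dotted.split('.')[0])                       # cuts off at the first dot
print(str(PurePosixPath(dotted).with_suffix(''))) # only drops the extension
```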
Use `join()` to do the opposite of `split()`:
```{python}
#| filename: "Python"
column_names = ['patient_id', 'sample_id', 'collection_date']
','.join(column_names)
```
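Because `join()` and `split()` are opposites, a round trip gets you back where you started. A small sketch, where the variable name `header` is just for illustration:

```python
column_names = ['patient_id', 'sample_id', 'collection_date']

# join the list into one comma-separated string
header = ','.join(column_names)
print(header)

# splitting on the same separator recovers the original list
print(header.split(',') == column_names)

# the separator can be any string, such as a tab
print('\t'.join(column_names))
```

Note that `join()` is a method of the separator string, not of the list.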
You can even define your own classes! Use the `class` keyword and define `__init__`, the method that _initializes_ a new instance of the class.
```{python}
#| filename: "Python"
class Dog:
    def __init__(self, name):
        self.name = name

    def is_good(self):
        return True
```
`name` is an attribute of each `Dog` instance. `is_good()` is a method of the `Dog` class.
Create a new instance of the `Dog` class and try using it:
```{python}
fido = Dog('fido')
print(fido.name)
print(fido.is_good())
```
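Instances of your own class behave like any other object, so `type()` works on them too. The sketch below repeats the `Dog` definition so it runs on its own; the second dog, `rex`, is an added example.

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def is_good(self):
        return True

fido = Dog('fido')
rex = Dog('rex')

# each instance stores its own value for the name attribute
print(fido.name, rex.name)

# both instances belong to the same class
print(type(fido) == Dog)
print(isinstance(rex, Dog))
```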
TODO explain how this works in Snakemake input/output/params etc
Take advantage of object-oriented programming to DRY your code by referencing
output files from other rules.
<!--
## reading & writing files
TODO
-->