-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathday2-1-intro.Rmd
417 lines (270 loc) · 10.8 KB
/
day2-1-intro.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
---
title: "Introduction, Day 2"
date: "Point count data analysis workshop, BIOS2 2021, March 16-25"
author: "Peter Solymos"
fontsize: 11pt
urlcolor: blue
output:
beamer_presentation:
theme: "Singapore"
incremental: false
includes:
in_header: preamble.tex
---
```{r setup,include=FALSE}
options(width=53, scipen=999)
library(knitr)
```
# Outline
Day 1
- ~~Introduction~~
- ~~We need to talk about data~~
- ~~A primer in regression techniques~~
Day 2
- Behavioral complexities
- Removal models and assumptions
Day 3
- The detection process
- Dealing with recordings
Day 4
- Putting it all together
- Roadside surveys
***
# Get course materials
1. Visit [https://github.com/psolymos/qpad-workshop/releases](https://github.com/psolymos/qpad-workshop/releases)
2. Download the latest release into a __NEW__ folder
3. Extract the zip/tar.gz archive
4. Open the `workshop.Rproj` file in RStudio (or open any other R GUI/console and `setwd()` to the directory where you downloaded the file)
5. Move your __LOCAL__ files into the new folder to keep things together
***
# Local copy
Avoid conflicts as we update the workshop materials: work in a __LOCAL__ copy of the R markdown files
```{r eval=FALSE}
source("src/functions.R")
qpad_local(day=2)
```
LOCAL copies will not be tracked and overwritten by git. You can copy new files over and it will not impact your local copies.
***
# Update bSims
Patched the Shiny apps, please update!
```{r eval=FALSE}
remotes::install_github("psolymos/bSims")
```
***
# What is detectability?
In the most colloquial terms, $\delta$ is the probability
that a species is detected given it is present:
$$P(Y>0 \mid N>0)$$
***
# Occupancy
In an occupancy framework, we can have:
- A detection: true positives (false positive rate is 0)
- $P(Y>0) = P(Y>0 \mid N>0) P(N>0) + P(Y>0 \mid N=0) P(N=0)$
- $P(Y>0) = \delta \varphi + 0 (1-\varphi) = \delta \varphi$
- A non-detection: false negatives + true negatives
- $P(Y=0) = P(Y=0 \mid N>0) P(N>0) + P(Y=0 \mid N=0) P(N=0)$
- $P(Y=0) = (1-\delta) \varphi + 1 (1-\varphi) = 1- \delta \varphi$,
(These are the marginal probabilities used to estimate the parameters using maximum likelihood.)
***
# Side note on occupancy
People often confuse these two conditional probabilities:
1. Observing 0 given that the species is present:
- $P(Y=0 \mid N>0) = 1-\delta$
2. Presence of a species given that we observe 0:
- $P(N> 0 \mid Y=0) = \frac{P(Y=0 \mid N>0) P(N>0)}{P(Y=0)}$
- $\frac{(1-\delta)\varphi}{\varphi (1-\delta) + (1-\varphi)} = \frac{\varphi (1-\delta)}{1-\delta \varphi}$
***
# Abundance
A lot more combinations of true abundance and observed counts:
| | $Y$ = 0 | 1 | 2 | ... |
|---|---|---|---|---|
| $N$ = 0 | x | | | |
| 1 | x | x | | |
| 2 | x | x | x | |
| ... | x | x | x | x |
***
# Estimating detectability
To estimate $\delta$, we need:
- ancillary information (multiple visits, distance bands, time intervals, multiple observers),
- parametric model assumptions (i.e. $\delta$ varies across locations).
***
# The myth of constant detectability
Detectability zealots often view a method that cannot estimate
constant detection probability $\delta$
(e.g. single-visit occupancy and N-mixture models) as inferior.
Fortunately for the rest of us: $\delta$ can only be constant
in very narrow situations, e.g. when surveys are conducted:
- in the same region,
- in similar habitat,
- in the same year,
- on the same day,
- at the same time,
- by the same observer,
- using the same protocol.
***
# Constant detectability is rare
Often a consequence of small sample size (i.e.
not a lot of detection for a species)[^1][^2]:
\centering
```{r echo=FALSE,out.width="400px",fig.show = 'hold',out.width='48%'}
include_graphics("images/qpad_tab1.jpg")
include_graphics("images/solymos-2018-condor-table-1.png")
```
[^1]: Solymos et al. 2013, [Methods. Ecol. Evol. 4:1047--1058](http://dx.doi.org/10.1111/2041-210X.12106).
[^2]: Solymos et al. 2018, [Condor 120:765--786](http://dx.doi.org/10.1650/CONDOR-18-32.1).
***
# Availability varies
\centering
```{r echo=FALSE,out.width="300px"}
include_graphics("images/qpad_fig3.jpg")
```
***
# Availability really varies
\centering
```{r echo=FALSE,out.width="200px"}
include_graphics("images/solymos-2018-condor-fig-5.png")
```
***
# Biological mechanisms
- Migration timing drives phenology for many species, e.g. ordinal day of year (`DAY`),
- when study spans across biomes, use time since local spring (multi-year average),
- or time since spring green up, last snow day, etc. based on actual survey year,
- time of day,
- time since local sunrise (`TSSR`).
***
# Perceptibility varies too
\centering
```{r echo=FALSE,out.width="300px"}
include_graphics("images/qpad_fig4.jpg")
```
***
# Physical mechanisms
- Trees block the transmission of sound
- Broad leaves rustle more
- Louder sounds travel farther
- Low frequency sounds travel farther
***
# Let's unwrap $\delta$
1. Once the species/individual is present ($N>0$)
2. It needs to signal its presence: make itself heard/visible, make itself available ($p$),
3. Then the signal needs to be received by a sensor: human ear or a microphone (and then the human ear in the lab listening to a recording), perceptibility $q$
These imply a pre-defined total time duration and maximum counting radius.
***
# QPAD
Now we can expand our equation:
$E[Y]=NC=(AD)(pq)=qpAD$
The expected value of the observed count becomes a function of the:
- population density ($D$),
- area sampled ($A$),
- availability ($p$),
- and perceptibility ($q$).
***
# Space and time to the rescue
\centering
```{r echo=FALSE,out.width="190px"}
include_graphics("images/qpad_fig2.jpg")
```
$p$ is a monotonic function of _time_, while $q$ is monotonic function of _area_ (space).
***
# Simulate QPAD
```{r sim-pois}
D <- 2.0 # inds / unit area
A <- 1.0 # area
p <- 0.8 # availability
q <- 0.5 # perceptibility
n <- 100 # sample size
N <- rpois(n, lambda = A * D)
Y <- rbinom(n, size = N, prob = p * q)
```
***
# Output
```{r sim-pois2}
table(N=N, Y=Y)
```
***
# The model
- $(N \mid D,A) \sim Poisson(DA)$
- $(Y \mid N,p,q) \sim Binomial(N, pq)$.
Incorporates key components of reality, but also ignores a lot of details.
***
# Assumptions
- Observations are independent
- $Y$ involves no double counting
- Area is known and measured without error
- Detectability ($pq$) is independent of $N$
And a lot more that we'll cover later
***
# Why do we need simulation
Statistical techniques are expected to provide unbiased estimates when the data generation process follows the assumptions of the model. __Testing this is standard practice for any new technique.__
But testing if the __model assumptions are realistic__ is rarely evaluated with the same rigor.
For that we need to link to biological mechanisms to do sensitivity analysis. This is a lot more complex than a coin toss.
***
# Capturing complexity
\centering
_"I've yet to see any problem, however complicated, which when you looked at it the right way didn’t become still more complicated."_ --- Poul Anderson, Call Me Joe
***
# It is popular
\centering
```{r echo=FALSE,out.width="250px"}
include_graphics("images/physical.png")
```
_"Love the simulation we’re dreaming in"_ --- Dua Lipa, Physical
***
# Some history
Movement trajectories of bird flocks to mitigate mortality near airports and wind farms.
Beth Rigby ([2016](https://conservancy.umn.edu/handle/11299/193439)): _Simulating Effects of Imperfect Detectability in Bird Surveys_
But not a lot more...
***
# bSims goals
- Allow easy __testing of statistical assumptions__ and explore effects of violating these assumptions
- __Aid survey design__ by comparing different options
- And most importantly, to __have fun__ while doing it via an intuitive and interactive user interface
***
# bSims design
- __Isolation__: the spatial scale is small (local point count scale) so that we can treat individual landscapes as more or less homogeneous units (but see below how certain stratified designs and edge effects can be incorporated) and independent in space and time
- __Realism__: the implementation of biological mechanisms and observation processes are realistic, defaults are chosen to reflect common practice and assumptions
- __Efficiency__: implementation is computationally efficient utilizing parallel computing backends when available
- __Extensibility__: the package functionality is well documented and easily extensible
***
# bSims verbs
- __Initialize__ (`bsims_init`): the landscape is defined by the extent and possible habitat stratification
- __Populate__ (`bsims_populate`): the population of finite number of individuals within the extent of the landscape
- __Animate__ (`bsims_animate`): individual behaviours described by movement and vocalization events, i.e. the frequency of sending various types of signals
- __Detect__ (`bsims_detect`): the physical side of the observation process, i.e. transmitting and receiving the signal
- __Transcribe__ (`bsims_transcribe`): the "human" aspect of the observation process, i.e. the perception of the received signal
***
# Behavioral events
Event time ($T$) is a continuous random variable
In the simplest case, its probability density function is the Exponential
distribution: $f(t)=\phi e^{-t\phi}$
The corresponding cumulative distribution function is:
$F(t)=\int_{0}^{t} f(t)dt=1-e^{-t\phi}=p_t$, the probability that the event has occurred by duration $t$
The parameter $\phi$ is the rate of the Exponential distribution
with mean $1/\phi$ and variance $1/\phi^2$.
***
# Survival and hazard function
The complement of $F(t)$ CDF is called the _survival function_ ($S(t)=1-F(t)$, $S(0)=1$), which gives the probability that the event has _not_ occurred by duration $t$.
The _hazard function_ ($\lambda(t)=f(t)/S(t)$) defines the instantaneous rate of occurrence of the event (_risk_, the density of events at $t$ divided by the probability of surviving).
The cumulative hazard (cumulative risk) is the sum of the risks between duration 0 and $t$ ($\Lambda(t)=\int_{0}^{t} \lambda(t)dt$).
***
# Exponential model
The simplest survival distribution assumes constant risk over time ($\lambda(t)=\phi$), which corresponds to the Exponential distribution.
The Exponential distribution also happens to describe the lengths of the
inter-event times in a homogeneous Poisson process
(events are independent, it is a 'memory-less' process).
***
# Exponential model visualized
\centering
```{r beh-events,message=FALSE,warning=FALSE,echo=FALSE,fig.width=6,fig.height=4.5}
library(bSims)
set.seed(321)
l <- bsims_init()
p <- bsims_populate(l, density = 0.5)
e1 <- bsims_animate(p, vocal_rate = 1)
plot(get_events(e1))
curve((1-exp(-1*x)) * get_abundance(e1), col=2, add=TRUE)
```
***
# Zoom in on mechanisms
Today we are focusing on availability, $p \in (0, 1)$
We assume that peceptibility is $q=1$