-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path03_acoustic-space-use.Rmd
414 lines (322 loc) · 19.4 KB
/
03_acoustic-space-use.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
---
editor_options:
chunk_output_type: console
---
# Acoustic space use
In this study, acoustic space use reflects the amount and pattern of vocalizations within each frequency bin for a given time period.
Previous papers calculated acoustic space use by ascertaining the proportion of recordings in a particular time and frequency bin that is above a certain amplitude. However, this proportion was estimated by counting the number of frequency peaks within a given recording (see _seewave::fpeaks()_) and then scaling it to go from 0 to 1. In our analysis, we want to obtain an understanding of the overall acoustic activity for a given time and frequency bin. In other words, we want the area under the curve for a particular frequency contour.
*Note* From Sound Analysis and Synthesis with R by Jerome Sueur
"The basic premise in calculating ASU is that we compute a Short-time discrete fourier transform for a given frequency bin size (obtained by dividing the sampling frequency/window length over which the fourier transform is computed, for example 48000Hz/256 = 187.5 is the bin size). However, the size of the frequency bin is inversely proportional to time (Uncertainty principle; Pg.312) which would mean a bin size of 187.5 corresponds to a time duration of 0.005s (1/187.5=0.005s). Ultimately what matters is a compromise between frequency resolution and temporal resolution (Fig 11.3; Pg. 313)."
## Install necessary libraries
```{r}
library(seewave)
library(warbleR)
library(tuneR)
library(stringi)
library(tidyverse)
library(ggplot2)
library(RColorBrewer)
library(foreach)
library(doSNOW)
library(rlist)
library(tictoc)
library(patchwork)
# Source any custom/other internal functions necessary for analysis
source("code/01_internal-functions.R")
```
Below, we have summarized approaches taken by previous studies:
Aide et al. 2017 aggregated recordings at time scale of hour of day and used a frequency bin size of 86.13 Hz and an amplitude filtering threshold of 0.02. So if the sampling rate is 22000 Hz, that would mean - 22000/86.13 ~ 256 frequency bins to divide up the frequency space. In this paper, there would be 24hr*256 bins = 6144 time/frequency bins.
Campos-Cerqueira et al. 2019 aggregated recordings at the time scale of hour of day (24 h), used a frequency bin size of 172 Hz, and an amplitude filtering threshold of 0.003. So if the sampling rate is 22000 Hz, that would mean - 22000/172 ~ 128 frequency bins. This resulted in a three‐dimensional (x = hour, y = acoustic frequency, z = proportion of all recordings in each time/frequency bin with a frequency peak value > 0.003 amplitude) matrix of acoustic activity with a total of 3,072 time/frequency bins (24 h × 128 frequency bins).
Campos-Cerqueira and Aide 2017 used the meanspec (f = 44,100, wl = 256, wn = “hanning”) and fpeaks (threshold = 0.1, freq = 172) function from the seewave package in R (Sueur et al., 2008a). The value of each peak was normalized using the maximum amplitude value within all recordings in the soundscape (i.e., site), and thus values ranged from 0 to 1. The number of frequency peaks was determined by counting the number of recordings with a peak within each of the 128 frequency bins that were equal or greater than the amplitude threshold. To control for the different number of recordings in each site and each time interval (i.e., hour), we divided the number of recordings with a peak in each time/frequency class by the total number of recordings collected during each hourly interval.
## Load recordings across multiple days and sites
First, we examined all unique sites and dates for which we had a continuous 24 hours of recordings.
```{r}
# List the path that contains all folders, which contain the audiomoth data
path <- "D:\\2020-summer\\" # change the path depending on where the raw data is
# Listing the folders within which .WAV files are stored
folders <- dir(path, recursive=F,full.names=T)
###### Please note ######
# The below set of lines need to be run only once for a given set of sites and days.
# Let's first rename the files by name of each site (as prefix)
for(i in 1:length(folders)){
setwd(folders[i])
# List the files within each folder and renaming the files with the prefix - SITE_ID
a <- list.files(paste0(path,basename(folders)[i],"\\"), full.names = T)
file.rename(from = a, to=paste0(basename(folders)[i],"_",basename(a)))
}
###### Ending note here ######
# Now get only those files for a full 24 hours across every unique site
files <- list()
for(i in 1:length(folders)){
a <- list.files(paste0(path,basename(folders)[i],"\\"), full.names = T)
site_date <- str_extract(basename(a),'\\w+_\\d+_')
# Choosing all 24 hours of data across every unique site (288 corresponds to 12 files every 1 hour)
for(j in 1:length(unique(site_date))){
dat <- a[str_detect(a,unique(site_date)[j])]
if((length(dat)<288)==TRUE){
next
} else {
files <- c(files, dat)
}
}
}
files <- unlist(files)
```
## Short-time discrete fourier transform (STDFT)
To compute an STDFT, we shall begin with an audio recording that has a sampling rate of s = 48,000 Hz and a window length (for the STDFT) wl = 256 samples, which results in a frequency bin size of z = 187.5 Hz (where z = s/wl). Given the sampling rate s, our Nyquist frequency f = 24,000 Hz (where f = s/2), and therefore, the total number of frequency bins across which an STDFT was run is n = 128 frequency bins (n = f/z). The final output of an STDFT corresponds to a matrix of N * m fourier coefficients (N = length of the audio recording and m = number of frequency bins). In this study, we computed STDFT across 24 hours of audio recordings, which translates to a matrix of 3072 (24 hours * 128 bins) fourier coefficients (Campos-Cerqueira et al. 2019). This matrix of coefficients essentially corresponds to ASU (or a measure of space ‘used, for every hour of a day across the 128 bins in this study). Prior to computing STDFTs and ASU values, we set a threshold amplitude of 0.003 dB for audio recordings (following Campos-Cerqueira et al. 2019 and examination of our data).
To calculate ASU, we selected five consecutive days of acoustic data (24 continuous hours of data) across each site. Five days was the minimum number of days of consecutive sampling without disruptions across sites (eg. memory card switching/battery replacements/failure of equipment etc.).
```{r}
# Select all unique site_date combinations for each unique site
site_date <- str_extract(basename(files),'\\w+_\\d+_')
unique(site_date)
# The only step that involves sub-selection (Selecting 5 days across each site; Note: IPTO06R has 3 days sampled and OLV110R has only 2 days sampled)
site_date <- unique(site_date)[c(1:5,10:14,19:23,28:47,52:61,71:75,
86:90,97:101,107:111,118:122,130:134,
141:143,150:154,161:165,173:177,188:189,
193:197,202:206,211:215,219:223,225:229,
231:235,237:241,249:253,261:265,269:273,
281:285,289:293,300:304,309:313,319:323,
329:333,340:344,351:355,361:365,370:374,
378:382,387:391,396:400)]
# Create a sequence of numbers to combine files by 1 hour
hour_seq <- seq(from=0,to=288, by=12)
# To name files with a suffix for each hour
time_of_day <- c("00:00-01:00","01:00-02:00","02:00-03:00","03:00-04:00",
"04:00-05:00","05:00-06:00","06:00-07:00","07:00-08:00",
"08:00-09:00","09:00-10:00","10:00-11:00","11:00-12:00",
"12:00-13:00","13:00-14:00","14:00-15:00","15:00-16:00",
"16:00-17:00","17:00-18:00","18:00-19:00","19:00-20:00",
"20:00-21:00","21:00-22:00","22:00-23:00","23:00-24:00")
# Loading parameters necessary for the Short-term fourier transform to be performed on hourly aggregates of data for each site_date combination
f <- 48000
wl <- 256 # This window length should be changed as a function of the frequency resolution (ie. bin size) and temporal resolution (ie. time)
ovlp <- 50
wn <- "hanning"
# Store the 24 hour acoustic space use data in a list and name it by a unique site and date
site_date_asu <- list()
# Add a progress bar for the loop
pb <- txtProgressBar(
min = 0,
max = length(unique(site_date)),
style = 3
)
# Select only 24 hours of data (00:00:00 to 23:55:00) for every unique site-date
for(i in 1:length(unique(site_date))){
tic("Total time for a single site-day combination")
# Store the acoustic space use data in a data.frame for plotting and analysis
space_use <- data.frame()
# Extract the strings first by site
dat <- files[stringr::str_detect(files,unique(site_date)[i])]
# Parallelize the runs
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
# Store the each hour of data for an entire day as a list here (raw audio files read by tuneR::readWave())
tic("Reading in .WAV files")
dailydata <- foreach(k=1:length(dat), .combine = 'c', .inorder = T, .packages = 'tuneR') %dopar% {
r <- readWave(dat[k])
}
toc()
gc()
stopCluster(cl)
# renaming the files to ensure that data is read in hour by hour and stored in a separate object called hourlydata
# Below we read the first 12 files in and then save it after computing the STDFT and then repeating it for the next 12 files and so on
for(t in 1:(length(hour_seq)-1)) { # Every 12 files correspond to one hour
tic("Computing Short-term fourier transforms for each hour")
if (t == 1) {
hourlydata <- dailydata[hour_seq[t]:hour_seq[t+1]]
} else {
hourlydata <- dailydata[(hour_seq[t]+1):hour_seq[t+1]]
}
gc()
# Parallelize the runs
cl <- makeCluster(6, type = "SOCK")
registerDoSNOW(cl)
# Store every hour's ASU data here
data_per_hour <- foreach(z = 1:length(hourlydata), .combine = 'rbind',
.inorder=T, .packages = 'seewave') %dopar% {
wave <- hourlydata[[z]]
n <- length(wave)
## Short-term Fourier transform (using a seewave internal function)
m <- sspectro(wave, f = f, wl = wl, ovlp = ovlp, wn = wn)
# Frequency selection and frequency axis
# Here, want only a sequence of numbers that correspond to the length of rows # of the short-time fourier transform and we divide it by 1000 to get values # in kHz
freq <- seq(0, (f/2) - (f/wl), length.out = nrow(m))/1000
# Calculate acoustic space use per frequency bin
f.cont <- apply(m, MARGIN = 1, FUN = sum)
# Store the space use data in a dataframe for plotting later
a <- data.frame(freq, f.cont)
}
rm(hourlydata); gc()
data_per_hour <- data_per_hour %>%
group_by(freq) %>%
summarise(f.cont=sum(f.cont))
data_per_hour$f.cont <- (data_per_hour$f.cont)/12
data_per_hour$time_of_day <- time_of_day[t]
space_use <- rbind(data_per_hour, space_use)
stopCluster(cl); gc()
toc()
}
space_use <- as.data.frame(space_use)
site_date_asu <- c(site_date_asu,list(space_use))
names(site_date_asu)[i] <- unique(site_date)[i]
rm(space_use, dailydata, data_per_hour); gc()
setTxtProgressBar(pb, i)
toc()
}
close(pb)
# store the list object for later
list.save(site_date_asu, "data/site_date_asu.rdata")
# Save the list of dataframes for future analysis
siteByDayAsu <- bind_rows(site_date_asu, .id = "Site_Day")
# Write to .csv
write.csv(siteByDayAsu, "data/site-by-day-asu.csv", row.names = F)
```
## Average acoustic space use
Now that we have calculated acoustic space use values for a given frequency bin size and time resolution for every unique site-date combination, we would like to obtain a single set of space use values for every unique site. In other words, we would like to average space use values across all frequency bins for five days.
```{r}
# All unique sites
site <- str_extract(basename(files),'^([[:alnum:]])+')
unique(site)
# Store the site-wise ASU values
site_asu <- list()
# Loop through site-wise data and average data for each site
for(i in 1:length(unique(site))) {
# Extract data needed for every unique site
dat <- site_date_asu[stringr::str_detect(names(site_date_asu),unique(site)[i])]
# Sum up values of acoustic space use across X no. of days for every unique site
asu_sum <- dat %>%
bind_rows(.id="data") %>%
group_by(freq, time_of_day) %>%
summarize(f.cont.sum=sum(f.cont))
# Averaging data as a function of number of days for a given site over which data was summed
asu_sum$f.cont.sum <- (asu_sum$f.cont.sum)/length(dat)
# Scaling the data between 0 to 1 for the sake of comparison across sites
asu_sum$f.cont.sum <- range01(asu_sum$f.cont.sum)
asu_sum <- as.data.frame(asu_sum)
site_asu <- c(site_asu, list(asu_sum))
names(site_asu)[i] <- unique(site)[i]
rm(dat,asu_sum)
}
# store the list object for later
list.save(site_asu, "data/site_asu.rdata")
# Save the list of dataframes for future analysis
siteWiseAsu <- bind_rows(site_asu, .id = "Site")
# Write to .csv
write.csv(siteWiseAsu, "data/site-wise-asu.csv", row.names = F)
```
## Visual examination of ASU
We will use ggplot2 to plot and save individual ASU plots for each unique site-date combination. This will then be repeated for each unique site (calculated in the previous chunk of code)
```{r}
# Saving the ggplots to a folder for every unique site-date combination
for(i in 1:length(site_date_asu)){
# Get the data out from a list to a dataframe (else ggplot won't accept it)
dat <- data.frame(site_date_asu[i])
names(dat) <- c("freq","f.cont","time_of_day")
# Plot the data
g1 <- ggplot(dat, aes(x=time_of_day, y=freq)) +
geom_tile(aes(fill = f.cont)) +
scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
# scale_fill_scico(palette = "vikO") +
theme_bw() +
labs(x="Time of Day (in hours)",
y="Frequency (in kHz) ") +
theme(axis.title = element_text(size = 16, face = "bold"),
axis.ticks.length.x = unit(.5, "cm"),
axis.text = element_text(size = 14),
axis.text.x = element_text(angle=90, vjust=0.5, hjust=0.5),
legend.title = element_blank(),
legend.key.size = unit(1,"cm"),
legend.text = element_text(size = 12))
# Save the data to a folder
ggsave(filename="",width=12, height=7, units="in", dpi = 300,
plot=g1, device="png", path = paste("figs/siteByDayAsu/", paste(names(site_date_asu)[i], ".png", sep=""), sep=""))
}
# Saving the ggplots to a folder for every unique site
for(j in 1:length(site_asu)){
# Get the data out from a list to a .dataframe (else ggplot won't accept it)
dat <- data.frame(site_asu[j])
names(dat) <- c("freq","time_of_day","f.cont.sum")
# Plot the data
g1 <- ggplot(dat, aes(x=time_of_day, y=freq)) +
geom_tile(aes(fill = f.cont.sum)) +
scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
# scale_fill_scico(palette = "vikO") +
theme_bw() +
labs(x="Time of Day (in hours)",
y="Frequency (in kHz) ") +
theme(axis.title = element_text(size = 16, face = "bold"),
axis.ticks.length.x = unit(.5, "cm"),
axis.text = element_text(size = 14),
axis.text.x = element_text(angle=90, vjust=0.5, hjust=0.5),
legend.title = element_blank(),
legend.key.size = unit(1,"cm"),
legend.text = element_text(size = 12))
# Save the data to a folder
ggsave(filename="",width=12, height=7, units="in", dpi = 300,
plot=g1, device="png", path = paste("figs/siteWiseAsu/", paste(names(site_asu)[j], ".png", sep=""), sep=""))
}
```
## Representative figure for publication
```{r}
# load .rdata to plot select figures for publication
site_asu <- load("data/site_asu.rdata")
site_asu <- bind_rows(site_asu, .id="Site")
# Selecting three representative sites (PANMP1B, INBS04R, INBS04U)
bm <- site_asu %>%
filter(Site=="PANMP1B")
ar <- site_asu %>%
filter(Site=="INBS04R")
nr <- site_asu %>%
filter(Site=="INBS04U")
# create separate figures for benchmark, active and passive sites and then patchwork them
# benchmark site
fig_bm <- ggplot(bm, aes(x=time_of_day, y=freq)) +
geom_tile(aes(fill = f.cont.sum)) +
scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
theme_bw() +
labs(fill = "Acoustic Space Use\n") +
theme(axis.title = element_blank(),
axis.ticks.length.x = unit(.5, "cm"),
axis.text = element_text(family="Century Gothic",size = 14),
axis.text.x = element_text(angle=90, vjust=0.5, hjust=0.5),
legend.title = element_text(family="Century Gothic",
size = 14, face = "bold"),
legend.key.size = unit(1,"cm"),
legend.text = element_text(family="Century Gothic",size = 14)) +
annotate(geom = "text", x = max(bm$time_of_day), y = 23, label = "Benchmark site", hjust = 1, family = "Century Gothic", size=5, fontface="bold")
# actively restored site
fig_ar <- ggplot(ar, aes(x=time_of_day, y=freq)) +
geom_tile(aes(fill = f.cont.sum)) +
scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
theme_bw() +
labs(x="\nTime of Day (in hours)",
fill = "Acoustic Space Use\n") +
theme(axis.title = element_text(family="Century Gothic",
size = 14, face = "bold"),
axis.ticks.length.x = unit(.5, "cm"),
axis.text = element_text(family="Century Gothic",size = 14),
axis.text.x = element_text(angle=90, vjust=0.5, hjust=0.5),
axis.title.y = element_blank(),
legend.position = "none") +
annotate(geom = "text", x = max(ar$time_of_day), y = 23, label = "Actively restored site", hjust = 1, family = "Century Gothic", size=5, fontface="bold")
# naturally regenerating site
fig_nr <- ggplot(nr, aes(x=time_of_day, y=freq)) +
geom_tile(aes(fill = f.cont.sum)) +
scale_fill_gradientn(colours = brewer.pal(9,"Reds"))+
theme_bw() +
labs(y="Frequency (in kHz)\n",
fill = "Acoustic Space Use\n") +
theme(axis.title = element_text(family="Century Gothic",
size = 14, face = "bold"),
axis.ticks.length.x = unit(.5, "cm"),
axis.text = element_text(family="Century Gothic",size = 14),
axis.title.x = element_blank(),
axis.text.x = element_text(angle=90, vjust=0.5, hjust=0.5),
legend.position = "none") +
annotate(geom = "text", x = max(nr$time_of_day), y = 23, label = "Naturally regenerating site", hjust = 1, family = "Century Gothic", size=5, fontface="bold")
# patchwork the above three for the Fig. 3
fig_asu <- fig_nr + fig_ar + fig_bm
ggsave(fig_asu, filename = "figs/fig3.png", width=52, height=17,device = png(), units="cm", dpi = 300); dev.off()
```
![**Visual examination of acoustic space use across a naturally regenerating, actively restored and benchmark site**
In this figure, we visually examined ASU for each of the 128 frequency bins and 24 hours across each site and treatment type. Shown here are representative figures for an NR, AR, and BM site. In this visualization, we estimated the proportion of frequency space (values between 0 to 1) that was occupied by vocalizations/sounds above 0.003 dB for every single hour of recording across 24 hours in a day. We observed largely empty frequency bins between 12 kHz to 24 kHz for the majority of AR and NR sites. For the sake of this representative figure, we show the average ASU calculated across five days for each site. However, the patterns described here are broadly consistent across days and sites (Supporting Information). In the above figure, BM = undisturbed benchmark rainforest sites, AR = Actively restored forest sites, and NR = Naturally regenerating forest sites.](figs/fig04.png)