-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path07_species-richness-birds.Rmd
230 lines (177 loc) · 10.3 KB
/
07_species-richness-birds.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
---
editor_options:
chunk_output_type: console
---
# Bird Species Richness
In this script, we will calculate:
a) Site-wise species richness to understand if species composition across treatment types are distinctly different.
b) Repeat the above calculations, but using species traits - If a species is a rainforest specialist or an open-country generalist.
## Install required libraries
```{r}
library(tidyverse)
library(dplyr)
library(stringr)
library(vegan)
library(ggplot2)
library(scico)
library(data.table)
library(extrafont)
# Source any custom/other internal functions necessary for analysis
source("code/01_internal-functions.R")
```
## Load manual annotations data
To start with, we will compute species richness from 3 non-consecutive visits for each site for each season. Across two seasons, a total of 6 non-consecutive visits are chosen for each site.
We will use an excel sheet of manual annotations, which contains 10s-clips from each site (A random 16-min was chosen between 6am and 10am and divided into 10s chunks for the ease of annotations).
```{r}
# Attach the annotation data for summer and winter
summer_data <- read.csv("data/2020-summer-annotation-working-document.csv")
winter_data <- read.csv("data/2020-winter-annotation-working-document.csv")
## Please write to the lead author if you would like to access this data (Vijay Ramesh; vr2352@columbia.edu)
# combine the datasets for a single dataframe for the present analysis
data <- bind_rows(summer_data,winter_data)
names(data)
# reorder columns
data <- data %>%
relocate(c("BFO","SBEO", "JN", "AK", "HSWP"), .after = "CR") # certain species codes are mentioned here and these can be found in the supplementary data.
# Site-wise sorting of the 16-min of data
# Split the file names into 4 columns : Site, Date, Time and Splits
data <- separate(data, col = Filename, into = c("Site", "Date", "Time", "Splits"), sep = "_")
data
# Load the species-trait-data
trait_dat <- read.csv("data/species-trait-dat.csv")
```
## Subset data
How many visits (counted as number of unique days) to a site has been annotated? Please note that some sites have more than 6 visits to a site, and hence we ensure that we choose only 6 non-consecutive visits to a given site.
NOTE: You need to run this only once and if you have already run it, move to the next chunk of code and load datSubset.
```{r}
# First we remove OLCAP5B - a site for which only 3 visits were made in summer and not sampled as a result of logistic reasons in winter
data <- data %>%
filter(!str_detect(Site, 'OLCAP5B'))
# Number of visits to a particular site
# Only INBS04U has 5 visits because another visit was not possible due to rain
nSites_Days <- data %>%
dplyr::select(Site, Date)%>%
distinct() %>% arrange(Site) %>% count(Site)
# Unique date site combination to give you a sense of sampling
uniqueSiteDate <- data %>%
group_by(Site) %>%
distinct(Date)
# Convert date column to YMD format using lubridate::ymd()
uniqueSiteDate$Date <- lubridate::ymd(uniqueSiteDate$Date)
# The below lines of code were written following a query on stackOverflow to select six non-consecutive visits to any site
# Link: https://stackoverflow.com/questions/67212152/select-non-consecutive-dates-for-every-grouped-element-in-r
nonConVisits <- uniqueSiteDate%>%
ungroup() %>%
group_split(Site) %>%
map_df(., ~ .x %>% ungroup() %>%
arrange(Date) %>%
mutate(n = 1) %>%
complete(Date = seq.Date(first(Date), last(Date), by = 'days'))%>%
group_by(n = cumsum(is.na(n))) %>%
filter(!is.na(Site)) %>%
filter(row_number() %% 2 == 1) %>%
ungroup() %>%
sample_n(min(n(), 6))
) %>%
dplyr::select(-n)
# Change the structure of the date column back to character for using one of the join functions
nonConVisits$Date <- str_remove_all(as.character(nonConVisits$Date),"-")
# Left-join with the original dataframe to subset the data for analysis
datSubset <- left_join(nonConVisits,data)
names(datSubset)
# renaming columns
datSubset <- rename(datSubset, Restoration.type = Restoration.Type..Benchmark.Active.Passive.)
# Save this data as a .csv for future analysis (later scripts)
write.csv(datSubset,"results/datSubset.csv", row.names = F)
```
## Total number of detections
Now group the data by site and restoration type and sum the number of detections across sites. We will calculate the overall number of detections for each 10s clip, which will be used to estimate species richness below.
```{r}
# If you have already processed subsetting of data, read it in directly
datSubset <- read.csv("results/datSubset.csv")
# Calculate the overall number of detections for each site across 6 days of data (translates to ~ 96 min of data per site)
nDetections_Site <- datSubset %>%
group_by(Site, Restoration.type) %>%
transform() %>% replace(is.na(.), 0) %>%
summarise_at(.vars = vars(c("IP":"HSWP")),.funs = sum)
```
## Calculate species richness
Convert the detections to 1, since we are interested in calculating richness per site by converting values >1 to 1 for multiple visits to a site. In other words, we want to establish overall species richness for a 16-min to 48-min window.
```{r}
richness <- nDetections_Site %>%
mutate_at(vars(c("IP":"HSWP")),~ replace(., . > 0, 1)) %>%
rowwise() %>%
mutate(richness = sum(c_across(IP:HSWP))) %>%
dplyr::select(Site, Restoration.type, richness)
# Test if there are significant differences in richness across treatment types
anovaAll <- aov(richness~Restoration.type, data = richness)
# Tukey test to study each pair of treatment - reveals no signficant difference across treatment types
tukeyAll <- TukeyHSD(anovaAll)
# The above result suggests that there are no differences in overall species richness across treatment types
# Create a boxplot of species richness by Restoration Type
# reordering factors for plotting
richness$Restoration.type <- factor(richness$Restoration.type, levels = c("Benchmark", "Active", "Passive"))
# Add a custom set of colors
mycolors <- c(brewer.pal(name="Dark2", n = 3), brewer.pal(name="Paired", n = 3))
fig_richness <- ggplot(richness, aes(x=Restoration.type, y=richness, fill=Restoration.type)) + geom_boxplot(alpha=0.7) +
scale_fill_manual("Treatment type",values=mycolors, labels=c("BM","AR","NR")) +
theme_bw() +
labs(x="\nTreatment Type",
y="Species Richness\n") +
scale_x_discrete(labels = c('BM','AR','NR')) +
theme(axis.title = element_text(family = "Century Gothic",
size = 14, face = "bold"),
axis.text = element_text(family="Century Gothic",size = 14),
legend.position = "none")
ggsave(fig_richness, filename = "figs/fig_richness.png", width=12, height=7,
device = png(), units="in", dpi = 300);dev.off()
# We observe that the overall species richness is relatively higher in passively restored forest patches, followed by benchmark (protected area) forest patches and actively restored forest patches (But these differences are not significant)
```
![When we examined the overall species richness across treatment types, we did not observe any significant differences.](figs/fig_richness.png)
## Species richness by trait
Using species trait data to check if species richness varies by treatment type as a function of whether a species is a rainforest specialist vs. open-country specialist. To do so: Add an additional column of species-trait data and group data based on the same.
```{r}
# First, we pivot the species-codes and then match the codes with trait_data and reformat the data to keep all detections>0 as 1 else they are 0
richness_trait <- nDetections_Site %>%
pivot_longer(cols=IP:HSWP, names_to="Species_Code", values_to="count") %>%
left_join(.,trait_dat, by=c("Species_Code"="species_annotation_codes")) %>%
mutate(forRichness = case_when(count>0 ~ 1,
count==0 ~ 0))
# Calculate overall richness for each site as a function of rainforest species and open-country species
richness_trait <- richness_trait %>%
dplyr::select(Site, Restoration.type, Species_Code, habitat, forRichness)%>%
group_by(Site, Restoration.type, habitat) %>%
summarise(richness = sum(forRichness)) %>%
drop_na()
# Let's subset data for richness as a function of rainforest specialists and open-country generalists and test for significant differences (if any)
richness_rainforest <- richness_trait %>%
filter(habitat=="RF")
richness_opencountry <- richness_trait %>%
filter(habitat=="OC")
# Test if there are significant differences in richness across treatment types as a function of species trait
anova_rainforest <- aov(richness~Restoration.type, data = richness_rainforest)
anova_opencountry <- aov(richness~Restoration.type, data = richness_opencountry)
# Tukey test to study each pair of treatment - reveals no signficant difference across treatment types for rainforest birds, but a significant difference between benchmark-active and benchmark-passive sites for open country birds
tukey_rainforest <- TukeyHSD(anova_rainforest)
tukey_opencountry <- TukeyHSD(anova_opencountry)
# reordering factors for plotting
richness_trait$Restoration.type <- factor(richness_trait$Restoration.type, levels = c("Benchmark", "Active", "Passive"))
# Plot the above result
fig_richness_trait <- ggplot(richness_trait, aes(x=Restoration.type, y=richness, fill=habitat)) +
geom_boxplot(alpha=0.7) +
scale_fill_scico_d(palette = "roma",
labels=c("Open country","Rainforest")) +
theme_bw() +
labs(x="\nTreatment Type",
y="Species Richness\n") +
scale_x_discrete(labels = c('BM','AR','NR')) +
theme(axis.title = element_text(family="Century Gothic",
size = 14, face = "bold"),
axis.text = element_text(family="Century Gothic",size = 14),
legend.title = element_text(family="Century Gothic",
size = 14, face = "bold"),
legend.key.size = unit(1,"cm"),
legend.text = element_text(family="Century Gothic",size = 14))
ggsave(fig_richness_trait, filename = "figs/fig_richness_trait.png", width=12, height=7, device =png(), units="in", dpi = 300); dev.off()
```
![No significant differences were observed for rainforest species across treatment types, but we noticed a significant difference between BM-NR and BM-AR for open-country birds.](figs/fig_richness_trait.png)