---
title: "An introduction to the rnrfa package"
author: "Claudia Vitolo"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: references.bib
vignette: >
%\VignetteIndexEntry{rnrfa}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```
## Introduction
The UK National River Flow Archive (NRFA) serves daily streamflow data, spatial rainfall averages and information regarding elevation, geology, land cover and FEH-related catchment descriptors.
An API is currently under development and should, in future, provide access to the following services: a metadata catalogue, catalogue filters based on a geographical bounding box, catalogue filters based on metadata entries, and gauged daily data for about 400 stations in WaterML2 format (the OGC standard used to describe hydrological time series).
The first three services return information in JSON format, while the last one returns an XML variant.
The rnrfa package aims to make access to these data simpler and more efficient by providing wrapper functions that send HTTP requests and interpret the XML/JSON responses.
### Dependencies
The rnrfa package depends on the **gdal** library. Make sure it is installed on your system before attempting to install this package.
**R package dependencies** can be installed running the following code:
```{r}
install.packages(c("cowplot", "httr", "xts", "ggmap", "ggplot2", "sp", "rgdal", "parallel", "tibble"))
```
This demo also makes use of additional packages. To install and load them, run the following commands:
```{r}
packs <- c("devtools", "DT", "leaflet", "dygraphs")
install.packages(packs)
lapply(packs, require, character.only = TRUE)
```
### Installation
The stable version of the **rnrfa** package is available from CRAN:
```{r}
install.packages("rnrfa")
```
Or you can install the development version from GitHub with [devtools](https://github.com/r-lib/devtools):
```{r}
devtools::install_github("cvitolo/rnrfa")
```
Now, load the rnrfa package:
```{r}
library(rnrfa)
```
## Functions
### List of station identification numbers
The function `station_ids()` returns a vector of all NRFA station identifiers.
```{r}
# Retrieve station identifiers:
allIDs <- station_ids()
head(allIDs)
```
### List of monitoring stations
The function catalogue() retrieves information for monitoring stations. The function, used with no inputs, requests the full list of gauging stations with associated metadata. The output is a tibble containing one record for each station and as many columns as the number of metadata entries available.
```{r}
# Retrieve information for all the stations in the catalogue:
allStations <- catalogue()
head(allStations)
```
The columns are briefly described below (see also [API documentation](https://nrfaapps.ceh.ac.uk/nrfa/nrfa-api.html#ws-station-info)):
* `id` The station identifier.
* `name` The station name.
* `catchment-area` The catchment area (in km²).
* `grid-reference` The station grid reference. For JSON output the grid-reference is represented as an object with the following properties:
    - `ngr` (String) The grid reference in string form (e.g. "SS9360201602").
    - `easting` (Number) The grid reference easting (in metres).
    - `northing` (Number) The grid reference northing (in metres).
* `lat-long` The station latitude/longitude. For JSON output the lat-long is represented as an object with the following properties:
    - `string` (String) The textual representation of the lat/long (e.g. "50°48'15.0265"N 3°30'40.7121"W").
    - `latitude` (Number) The latitude (expressed in decimal degrees).
    - `longitude` (Number) The longitude (expressed in decimal degrees).
* `river` The name of the river.
* `location` The name of the location on the river.
* `station-level` The altitude of the station, in metres, above Ordnance Datum or, in Northern Ireland, Malin Head.
* `easting` The grid reference easting.
* `northing` The grid reference northing.
* `station-information` Basic station information: id, name, catchment-area, grid-reference, lat-long, river, location, station-level, measuring-authority-id, measuring-authority-station-id, hydrometric-area, opened, closed, station-type, bankfull-flow, structurefull-flow, sensitivity.
* `category` Information about the main station categories: nrfa-mean-flow, nrfa-peak-flow, feh-pooling, feh-qmed, feh-neither, nhmp, benchmark, live-data.
* `catchment-information` Basic catchment information: factors-affecting-runoff.
* `gdf-statistics` Gauged daily flow statistics: gdf-start-date, gdf-end-date, gdf-mean-flow, gdf-min-flow, gdf-first-date-of-min, gdf-last-date-of-min, gdf-max-flow, gdf-first-date-of-max, gdf-last-date-of-max, gdf-q95-flow, gdf-q70-flow, gdf-q50-flow, gdf-q10-flow, gdf-q05-flow, gdf-base-flow-index, gdf-day-count, gdf-flow-count.
* `peak-flow-statistics` Basic peak-flow statistics: peak-flow-start-date, peak-flow-end-date, qmed.
* `elevation` Catchment elevation percentile data: minimum-altitude, 10-percentile-altitude, 50-percentile-altitude, 90-percentile-altitude, maximum-altitude.
* `catchment-rainfall` Catchment rainfall standard period data: saar-1941-1970, saar-1961-1990.
* `lcm2000` Land cover map data (2000): lcm2000-woodland, lcm2000-arable-horticultural, lcm2000-grassland, lcm2000-mountain-heath-bog, lcm2000-urban.
* `lcm2007` Land cover map data (2007): lcm2007-woodland, lcm2007-arable-horticultural, lcm2007-grassland, lcm2007-mountain-heath-bog, lcm2007-urban.
* `geology` Catchment geology data: high-perm-bedrock, moderate-perm-bedrock, low-perm-bedrock, mixed-perm-bedrock, high-perm-superficial, low-perm-superficial, mixed-perm-superficial.
* `feh-descriptors` FEH catchment descriptors: propwet, bfihost, farl, dpsbar.
* `urban-extent` Urban extent data: urbext-1990, urbext-2000.
* `spatial-location` The grid reference and lat/long as individual fields: easting, northing, latitude, longitude.
### Station filtering
The same function catalogue() can be used to filter stations based on a bounding box or any of the metadata entries.
```{r}
# Define a bounding box:
bbox <- list(lon_min = -3.82, lon_max = -3.63, lat_min = 52.43, lat_max = 52.52)
# Filter stations based on bounding box
catalogue(bbox)
```
```{r}
# Filter based on minimum recording years
catalogue(min_rec = 100)
# Filter stations located on a given river
catalogue(column_name = "river", column_value = "Wye")
# Filter based on bounding box & metadata strings
catalogue(bbox, column_name = "river", column_value = "Wye")
# Filter stations based on threshold
catalogue(bbox, column_name = "catchment-area", column_value = ">1")
# Filter based on bounding box, threshold and minimum recording years
catalogue(bbox, column_name = "catchment-area",
          column_value = ">1",
          min_rec = 30)
# Filter stations based on identification number
catalogue(column_name = "id", column_value = c(3001, 3002, 3003))
```
```{r}
# Other combined filtering
someStations <- catalogue(bbox,
                          column_name = "id",
                          column_value = c(54022, 54090, 54091, 54092, 54097),
                          min_rec = 35)
```
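To make the filtering logic concrete, here is a minimal base R sketch of what a bounding-box filter combined with a threshold amounts to. It does not call rnrfa: the `toy` data frame, its column names and its values are made up for illustration, and the bounds are assumed to be inclusive, which may differ from what `catalogue()` does internally.

```r
# Toy catalogue with made-up values (these are NOT real NRFA stations)
toy <- data.frame(
  id        = c(1, 2, 3, 4),
  longitude = c(-3.70, -3.50, -3.80, -3.65),
  latitude  = c(52.45, 52.50, 52.60, 52.50),
  area      = c(12, 0.5, 30, 0.8)
)

# Same bounding box as in the examples above
bbox <- list(lon_min = -3.82, lon_max = -3.63, lat_min = 52.43, lat_max = 52.52)

# A station is kept when both coordinates fall inside the box
# (inclusive bounds are an assumption of this sketch)
in_box <- toy$longitude >= bbox$lon_min & toy$longitude <= bbox$lon_max &
  toy$latitude >= bbox$lat_min & toy$latitude <= bbox$lat_max

# Bounding box only
box_only <- toy[in_box, ]

# Bounding box combined with a threshold on the (hypothetical) area column
box_and_area <- toy[in_box & toy$area > 1, ]

box_only$id
box_and_area$id
```

Combining conditions with `&` mirrors the way the `catalogue()` arguments narrow the selection when they are used together.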
### Conversions
The rnrfa package allows convenient conversion between UK grid references and more standard coordinate systems. The function `osg_parse()`, for example, converts a grid reference string to easting and northing in the BNG coordinate system (EPSG code: 27700), as in the example below:
```{r}
# Where is the first catchment located?
someStations$`grid-reference`$ngr[1]
# Convert OS Grid reference to BNG
osg_parse("SN853872")
```
The same function can also convert from BNG to latitude and longitude in the WGS84 coordinate system (EPSG code: 4326), as in the example below.
```{r}
# Convert BNG to WGS84
osg_parse(grid_refs = "SN853872", coord_system = "WGS84")
```
osg_parse() also works with multiple references:
```{r}
osg_parse(grid_refs = someStations$`grid-reference`$ngr)
```
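For readers curious about what the grid reference to easting/northing step involves, the sketch below decodes a British National Grid reference by hand in base R. `parse_ngr()` is a hypothetical helper written for this illustration, not the implementation used by `osg_parse()`. It assumes a two-letter prefix followed by an even number of digits, so Irish grid references (single letter) are not handled.

```r
# Hypothetical helper (not part of rnrfa): decode one two-letter OS grid reference
parse_ngr <- function(ngr) {
  ngr <- toupper(gsub(" ", "", ngr))

  # Alphabet positions of the two letters (A = 0); the grid skips the letter "I"
  idx <- match(strsplit(substr(ngr, 1, 2), "")[[1]], LETTERS) - 1
  idx <- ifelse(idx > 7, idx - 1, idx)

  # Origin of the 100 km square, counted in 100 km units from the grid origin
  e100 <- ((idx[1] - 2) %% 5) * 5 + (idx[2] %% 5)
  n100 <- (19 - (idx[1] %/% 5) * 5) - (idx[2] %/% 5)

  # The digits split evenly into easting and northing within the square;
  # fewer digits mean a coarser resolution, so scale them up to metres
  digits <- substr(ngr, 3, nchar(ngr))
  half <- nchar(digits) / 2
  scale <- 10^(5 - half)
  e <- as.numeric(substr(digits, 1, half)) * scale
  n <- as.numeric(substr(digits, half + 1, nchar(digits))) * scale

  list(easting = e100 * 1e5 + e, northing = n100 * 1e5 + n)
}

first <- parse_ngr("SN853872")
second <- parse_ngr("SS9360201602")
```

With the reference used above, `first` holds easting 285300 and northing 287200. The returned point is the south-west corner of the square identified by the reference.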
### Get time series data
The first column of the `someStations` table contains the station id. It can be used to retrieve time series data, which are converted from WaterML2 files to time series objects (of class zoo).
The National River Flow Archive serves two types of time series data: gauged daily flow and catchment mean rainfall.
These time series can be obtained using the functions `gdf()` and `cmr()`, respectively. Both functions accept three inputs:
* `id`, the station identification numbers (single string or character vector).
* `metadata`, a logical variable (FALSE by default). If TRUE, the result for a single station is a list with two elements: data (the time series) and meta (metadata).
* `cl`, a cluster object created by the parallel package. It is NULL by default, in which case calls are sent to the server sequentially.
Here is how to retrieve mean rainfall (monthly) data for the _Shin at Lairg (id = 3001)_ catchment.
```{r, fig.width=7}
# Fetch only time series data from the waterml2 service
info <- cmr(id = "3001")
plot(info)
# Fetch time series data and metadata from the waterml2 service
info <- cmr(id = "3001", metadata = TRUE)
plot(info$data, main = paste("Monthly rainfall data for the",
                             info$meta$station.name, "catchment"),
     xlab = "", ylab = info$meta$data.type.units)
```
Here is how to retrieve (daily) flow data for the _Shin at Lairg (id = 3001)_ catchment.
```{r, fig.width=7}
# Fetch only time series data
info <- gdf(id = "3001")
plot(info)
# Fetch time series data and metadata from the waterml2 service
info <- gdf(id = "3001", metadata = TRUE)
plot(info$data, main = paste0("Daily flow data for the ",
                              info$meta$station.name,
                              " catchment (",
                              info$meta$data.type.units, ")"))
```
### Multiple sites
By default, the functions `cmr()` and `gdf()` fetch time series data from multiple sites sequentially (using 1 core):
```{r, fig.width=7}
# Search data/metadata
s <- cmr(c(3002,3003), metadata = TRUE)
# s is a list of 2 objects (one object for each site)
plot(s[[1]]$data,
main = paste(s[[1]]$meta$station.name, "and", s[[2]]$meta$station.name))
lines(s[[2]]$data, col="green")
```
## Interoperability
Display the catalogue as an interactive table using DT:
```{r}
library(DT)
datatable(catalogue())
```
Create interactive maps using leaflet:
```{r}
library(leaflet)
leaflet(data = someStations) %>% addTiles() %>%
addMarkers(~longitude, ~latitude, popup = ~as.character(paste(id,name)))
```
Interactive plots using dygraphs:
```{r}
library(dygraphs)
dygraph(info$data) %>% dyRangeSelector()
```
Sequential vs concurrent requests, a simple benchmark test:
```{r}
library(parallel)
# Use detectCores() to find out how many cores are available on your machine
cl <- makeCluster(getOption("cl.cores", detectCores()))
# Filter all the stations within the above bounding box
someStations <- catalogue(bbox)
# Get flow data with a sequential approach
system.time(s1 <- gdf(someStations$id, cl = NULL))
# Get flow data with a concurrent approach (using `parLapply()`)
system.time(s2 <- gdf(id = someStations$id, cl = cl))
stopCluster(cl)
```
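The effect of concurrency can also be explored offline. The sketch below replaces the web request with a hypothetical `fake_request()` function that simply waits for a fixed time, then compares `lapply()` with `parallel::parLapply()`. It assumes at least two worker processes can be started on your machine. The delay and the returned values are arbitrary and say nothing about actual NRFA response times.

```r
library(parallel)

# Hypothetical stand-in for a single web request: wait, then return a value
fake_request <- function(id) {
  Sys.sleep(0.2)  # pretend network latency (arbitrary)
  id * 2
}

ids <- 1:4

# Sequential: each call starts only after the previous one has finished
t_seq <- system.time(res_seq <- lapply(ids, fake_request))

# Concurrent: the calls are spread across 2 worker processes
cl <- makeCluster(2)
t_par <- system.time(res_par <- parLapply(cl, ids, fake_request))
stopCluster(cl)

# Both approaches return the same results, in the same order
same_results <- identical(res_seq, res_par)

# Compare elapsed times (in seconds)
c(sequential = t_seq[["elapsed"]], concurrent = t_par[["elapsed"]])
```

Starting a cluster has a fixed overhead, so concurrency tends to pay off only when there are enough requests, or each request is slow enough, to outweigh it.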
The measured flows are expected to increase with the catchment area. Let's show this simple regression on a plot:
```{r}
# Calculate the mean flow for each catchment
someStations$meangdf <- unlist(lapply(s2, mean))
# Linear model
library(ggplot2)
ggplot(someStations, aes(x = as.numeric(`catchment-area`), y = meangdf)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  xlab(expression(paste("Catchment area [", km^2, "]"))) +
  ylab(expression(paste("Mean flow [", m^3, "/s]")))
```