---
title: "R and GIS, or R as GIS: handling spatial data: GIS and R: bridges or R as GIS?"
author: "Roger Bivand"
date: "Tuesday, 27 August 2019, 11:10-12:50; Wolfson Medical School building Gannochy room"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
    theme: united
bibliography: Geomed19.bib
link-citations: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
### Copyright
All the material presented here, to the extent it is original, is available under [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/).
### Required current contributed CRAN packages:
I am running R 3.6.1, with recent `update.packages()`.
```{r, echo=TRUE}
needed <- c("sf", "mapview", "sp", "raster", "classInt", "RColorBrewer", "tmap", "rgdal", "rgrass7", "units", "gdistance", "rgeos", "lwgeom", "maptools", "RSQLite")
```
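The `needed` vector is not used further in this document. One possible way to use it, sketched here with base R only, is to list which of the packages cannot be loaded; the helper name `missing_packages()` is made up for this illustration.

```r
# Hypothetical helper: return the names in pkgs that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}
# Base packages are always present, so nothing should be reported here
missing_packages(c("stats", "utils"))
```

With the vector defined above, `missing_packages(needed)` would give the packages still to be installed, for example with `install.packages()`.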
I also have GRASS 7.6.1 (https://grass.osgeo.org/download/software/), but this is not essential (learning GRASS in 30 minutes is not easy).
### Script
The script is at https://github.com/rsbivand/geomed19-workshop/raw/master/Geomed19_II.zip. Download it to a suitable location and use it as a basis.
## Session II
- 11:10-11:40 (20+10) Ongoing changes in external software (GEOS, GDAL), including software and standards used for representing spatial reference systems (PROJ)
- 11:40-12:10 (20+10) GIS bridges (description and using GRASS and **rgrass7**)
- 12:10-12:50 (20+20) Using R as a GIS (topological operations)
## Ongoing changes in external software (GEOS, GDAL, PROJ)
```{r echo=FALSE}
knitr::include_graphics('sf_deps.png')
```
### PROJ
Because so much open source (and other) software uses the PROJ library and framework, many projects are affected when PROJ is upgraded. Until very recently, PROJ has been seen as very reliable, and the changes taking place now are intended to confirm and reinforce this reliability. Before PROJ 5 (PROJ 6 is out now, PROJ 7 is coming early in 2020), the `+datum=` tag was used, perhaps with `+towgs84=` with three or seven coefficients, and possibly `+nadgrids=` where datum transformation grids were available. However, in a transformation from one projection to another, coordinates were first inverse-projected to longitude-latitude in WGS84, then projected to the target projection.
### Big bump coming:
'Fast-forward 35 years and PROJ.4 is everywhere: It provides coordinate handling for almost every geospatial program, open or closed source. Today, we see a drastical increase in the need for high accuracy GNSS coordinate handling, especially in the agricultural and construction engineering sectors. This need for geodetic-accuracy transformations is not satisfied by "classic PROJ.4". But with the ubiquity of PROJ.4, we can provide these transformations "everywhere", just by implementing them as part of PROJ.4' [@evers+knudsen17].
### Escaping the WGS84 hub/pivot: PROJ and OGC WKT2
Following the introduction of geodetic modules and pipelines in PROJ 5 [@knudsen+evers17; @evers+knudsen17], PROJ 6 moves further. Changes in the legacy PROJ representation and WGS84 transformation hub have been coordinated through the [GDAL barn raising](https://gdalbarn.com/) initiative. Crucially WGS84 often ceases to be the pivot for moving between datums. A new OGC WKT is coming, and an SQLite EPSG file database has replaced CSV files. SRS will begin to support 3D by default, adding time too as SRS change. See also [PROJ migration notes](https://proj.org/development/migration.html).
There are very useful postings on the PROJ mailing list from Martin Desruisseaux, first [proposing clarifications](https://lists.osgeo.org/pipermail/proj/2019-July/008748.html) and a [follow-up](https://lists.osgeo.org/pipermail/proj/2019-August/008750.html) including a summary:
> * "Early binding" ≈ hub transformation technique.
> * "Late binding" ≈ hub transformation technique NOT used, replaced by
a more complex technique consisting in searching parameters in the
EPSG database after the transformation context (source, target,
epoch, area of interest) is known.
> * The problem of hub transformation technique is independent of WGS84.
It is caused by the fact that transformations to/from the hub are
approximate. Any other hub we could invent in replacement of WGS84
will have the same problem, unless we can invent a hub for which
transformations are exact (I think that if such hub existed, we
would have already heard about it).
> The solution proposed by ISO 19111 (in my understanding) is:
> * Forget about hub (WGS84 or other), unless the simplicity of
early-binding is considered more important than accuracy.
> * Associating a CRS to a coordinate set (geometry or raster) is no
longer sufficient. A {CRS, epoch} tuple must be associated. ISO
19111 calls this tuple "Coordinate metadata". From a programmatic
API point of view, this means that getCoordinateReferenceSystem()
method in Geometry objects (for instance) needs to be replaced by a
getCoordinateMetadata() method.
In QGIS built on current PROJ 6 with the `proj.h` API (and GDAL built on current PROJ 6 with the `proj.h` API), we see the following sequence of GUI windows when trying to open the olinda.gpkg file.
```{r echo=FALSE}
knitr::include_graphics('images/A_Screenshot.png')
```
Instead of using the declared coordinate reference system of the added layer to provide a transformation/conversion relationship to possible WGS84 geographical coordinate or web mapping backgrounds, the user of the most recent QGIS version with PROJ 6 faces a choice of three alternatives with varying availabilities and precisions:
```{r echo=FALSE}
knitr::include_graphics('images/B_Screenshot.png')
```
```{r echo=FALSE}
knitr::include_graphics('images/C_Screenshot.png')
```
The third alternative has better precision, but depends on finding and installing an NTv2 grid file in the PROJ `share/proj` metadata folder:
```{r echo=FALSE}
knitr::include_graphics('images/D_Screenshot.png')
```
If we install the file, the choices change to promote the more precise NTv2-based path to the first position:
```{r echo=FALSE}
knitr::include_graphics('images/E_Screenshot.png')
```
```{r, echo=TRUE}
library(sf)
packageVersion("sf")
```
The final element reported by `sf::sf_extSoftVersion()` shows whether **sf** was built with the `proj.h` interface to PROJ, or the legacy `proj_api.h` interface. However, GDAL also has to be built with the `proj.h` interface for everything to line up:
```{r, echo=TRUE}
sf_extSoftVersion()
```
```{r, echo=TRUE}
st_crs(22525)
```
The OGC WKT2 definition now contains a usage/scope term showing where the definition may be used; there may also be a temporal frame for a definition.
```{r, echo=TRUE}
cat(system("projinfo EPSG:22525", intern=TRUE), sep="\n")
```
If we ask about possible transformations/conversions, we see some of the choices that were presented in QGIS (I work on two apparently identical systems, which may give different choice counts):
```{r, echo=TRUE}
cat(system("projinfo -s EPSG:22525 -t EPSG:31985", intern=TRUE), sep="\n")
```
The input data use the Corrego Alegre 1970-1972 setting, and still provide a `+towgs84=` key representation for pivoting through WGS84:
```{r, echo=TRUE}
olinda <- st_read("data/olinda.gpkg", quiet=TRUE)
st_crs(olinda)
```
We'll just use one point to check things out:
```{r, echo=TRUE}
xy_c <- st_centroid(st_geometry(olinda[ 1,]))
st_coordinates(xy_c)
```
If we manually pivot through WGS84 on the way back to SIRGAS2000 UTM, we get:
```{r, echo=TRUE}
st_coordinates(st_transform(st_transform(xy_c, 4326), 31985))
```
Without the NTv2 grid file `CA7072_003.gsb` we seem to get the same:
```{r, echo=TRUE}
# without CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
```
but we also get the same with the grid file if we leave the `+towgs84=` key in the PROJ string:
```{r, echo=TRUE, eval=FALSE}
# with CA7072_003.gsb
st_coordinates(st_transform(xy_c, 31985))
# X Y
# 1 295489.3 9120352
```
If however we manipulate the PROJ string to specify the grid file instead of the `+towgs84=` key, we can get the improved precision:
```{r, echo=TRUE, eval=FALSE}
# with CA7072_003.gsb
xy_c1 <- xy_c
st_crs(xy_c1) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +nadgrids=CA7072_003.gsb"
print(st_coordinates(st_transform(xy_c1, 31985)), digits=9)
# X Y
# 1 295486.396 9120350.62
```
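The manipulation above amounts to swapping one key for another in the PROJ string. A minimal sketch in base R, assuming the input string carries a single `+towgs84=` key; the helper name `use_grid()` is hypothetical, and the example string is the `+towgs84=` form used elsewhere in this section:

```r
# Hypothetical helper: replace the +towgs84= key with a +nadgrids= key
use_grid <- function(proj_string, grid_file) {
  sub("\\+towgs84=[^ ]+", paste0("+nadgrids=", grid_file), proj_string)
}
p4 <- "+proj=utm +zone=25 +south +ellps=intl +units=m +towgs84=-206.05,168.28,-3.82,0,0,0,0"
p4_grid <- use_grid(p4, "CA7072_003.gsb")
print(p4_grid)
```

The result could then be assigned with `st_crs(xy_c1) <- p4_grid`, as in the chunk above; the grid file still has to be installed where PROJ can find it.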
Let's try to use the PROJ utility program `cs2cs` in its PROJ 6 version. The `cs2cs` result when the grid file is present matches that of `sf::st_transform()` when the input CRS is modified to point to the grid file:
```{r, echo=TRUE, eval=FALSE}
# with CA7072_003.gsb
# define xy here so that this chunk does not depend on a later one
xy <- st_coordinates(xy_c)
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
# 295486.40 9120350.62 0.00
```
`cs2cs` without the grid file gives:
```{r, echo=TRUE}
xy <- st_coordinates(xy_c)
# without CA7072_003.gsb
cat(system(paste0("echo ", paste(xy, collapse=" "), " | cs2cs EPSG:22525 EPSG:31985"), intern=TRUE))
```
This matches the second set of `+towgs84=` coefficients:
```{r, echo=TRUE, warning=FALSE}
# without CA7072_003.gsb
xy_c2 <- xy_c
st_crs(xy_c2) <- "+proj=utm +zone=25 +south +ellps=intl +units=m +towgs84=-206.05,168.28,-3.82,0,0,0,0"
st_coordinates(st_transform(xy_c2, 31985))
```
For now, `lwgeom::st_transform_proj()` uses the legacy `proj_api.h` interface:
```{r, echo=TRUE}
# without CA7072_003.gsb
# -DACCEPT_USE_OF_DEPRECATED_PROJ_API_H
st_coordinates(lwgeom::st_transform_proj(xy_c, 31985))
```
Our reprojected objects in SIRGAS2000 used the WGS84 pivot with one of two possible sets of `+towgs84=` coefficients:
```{r, echo=TRUE}
olinda <- st_read("output/olinda_sirgas2000.gpkg", quiet=TRUE)
xy_c <- st_centroid(st_geometry(olinda[ 1,]))
st_coordinates(xy_c)
```
This is the EPSG description of the grid file: https://epsg.io/5541
It was retrieved from: https://www.eye4software.com/files/ntv2/ca70.zip
This [page](https://ww2.ibge.gov.br/home/geociencias/geodesia/default_sirgas_int.shtm?c=11) gives a picture of why the changes in PROJ matter: the arrows show displacement in cm per year.
Some grid files are available from https://proj.org/download.html, but because many others are not as freely available (yet), they may need to be downloaded from national mapping agencies. Most are relatively large, and also need to be versioned. Do read the README files in the zip archives!
### GEOS
A recent upgrade of GEOS from 3.7.1 to 3.7.2 on a CRAN test server led to failures in three packages using **rgeos** for topological operations. In **rgeos** 0.4-3, the `checkValidity=` argument to functions such as `gIntersection()` defaulted to FALSE (TRUE threw an error if either geometry was invalid). An [issue](https://github.com/r-spatial/sf/issues/1121) was opened on the **sf** GitHub repository (**rgeos** is developed on R-Forge). The test objects (from an example in **inlmisc**) will be used here:
```{r, echo=TRUE}
rgeos::version_GEOS0()
```
For **rgeos** <= 0.4-3, the default was not to check input geometries for validity before trying topological operations; for **rgeos** >= 0.5-1, the default changes to checking for validity when GEOS > 3.7.1. The mode of the argument also changes from logical to integer:
```{r, echo=TRUE, warning=FALSE}
cV_old_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 0L, FALSE)
yy <- rgeos::readWKT(readLines("data/invalid.wkt"))
rgeos::gIsValid(yy, byid=TRUE, reason=TRUE)
```
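Comparing version strings with `>=` is a character comparison, which orders `"3.10.0"` before `"3.7.2"`. A hedged alternative, assuming `rgeos::version_GEOS0()` returns a plain dotted version string such as `"3.7.2"`, is to compare with base R's `numeric_version()`; `geos_at_least()` is a hypothetical helper:

```r
# Hypothetical helper: numeric rather than character comparison of versions
geos_at_least <- function(version_string, minimum) {
  numeric_version(version_string) >= numeric_version(minimum)
}
geos_at_least("3.7.2", "3.7.2")   # same version
geos_at_least("3.10.0", "3.7.2")  # later minor version with two digits
"3.10.0" >= "3.7.2"               # character comparison, for contrast
```

For the versions discussed here (3.7.1 and 3.7.2) both kinds of comparison agree; the difference only matters once a version component reaches two digits.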
```{r, echo=TRUE}
sf::sf_extSoftVersion()
```
The same underlying GEOS code is used in **sf**:
```{r, echo=TRUE}
sf::st_is_valid(sf::st_as_sf(yy), reason=TRUE)
```
The geometries were also invalid in GEOS 3.7.1, but the operations succeeded:
```{r, echo=TRUE, warning=FALSE}
ply <- rgeos::readWKT(readLines("data/ply.wkt"))
oo <- try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_old_default), silent=TRUE)
print(attr(oo, "condition")$message)
```
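The pattern used here, wrapping a call in `try()` and reading the message from the `"condition"` attribute, can be seen in isolation in the following base R sketch; `risky()` is a made-up stand-in for a topological operation that fails, and its message is illustrative only:

```r
# Made-up function standing in for an operation that throws an error
risky <- function() stop("TopologyException: illustrative message only")
res <- try(risky(), silent = TRUE)
# On failure try() returns an object of class "try-error" ...
failed <- inherits(res, "try-error")
# ... which carries the original condition object as an attribute
msg <- if (failed) conditionMessage(attr(res, "condition")) else NA_character_
print(msg)
```

When the call succeeds there is no `"condition"` attribute, so `attr(res, "condition")$message` is `NULL`; testing with `inherits()` first makes the two cases explicit.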
```{r, echo=TRUE}
ooo <- try(sf::st_intersection(sf::st_as_sf(yy), sf::st_as_sf(ply)), silent=TRUE)
print(attr(ooo, "condition")$message)
```
In **rgeos** 0.5-1 with GEOS 3.7.2, new warnings are issued, along with advice to check validity.
```{r, echo=TRUE}
cV_new_default <- ifelse(rgeos::version_GEOS0() >= "3.7.2", 1L, TRUE)
try(rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=cV_new_default), silent=TRUE)
```
New options are provided, `get_RGEOS_CheckValidity()` and `set_RGEOS_CheckValidity()`, because in some packages the use of topological operations may happen through other packages, such as `raster::crop()` calling `rgeos::gIntersection()` without access to the arguments of the latter function.
If we follow the advice, zero-width buffering is used to try to rectify the invalidity:
```{r, echo=TRUE}
oo <- rgeos::gIntersection(yy, ply, byid=TRUE, checkValidity=2L)
rgeos::gIsValid(oo)
```
equivalently:
```{r, echo=TRUE}
oo <- rgeos::gIntersection(rgeos::gBuffer(yy, byid=TRUE, width=0), ply, byid=TRUE, checkValidity=1L)
rgeos::gIsValid(oo)
```
and by extension to **sf** until GEOS 3.7.2 is accommodated:
```{r, echo=TRUE}
ooo <- sf::st_intersection(sf::st_buffer(sf::st_as_sf(yy), dist=0), sf::st_as_sf(ply))
all(sf::st_is_valid(ooo))
```
The actual cause was the use of an ESRI/shapefile style/understanding of the self-touching exterior ring. In OGC style, an interior ring is required, but not in shapefile style. Martin Davis responded in the issue:
> The problem turned out to be a noding robustness issue, which caused the valid input linework to have a self-touch after noding. This caused the output to be invalid. The fix was to tighten up the internal overlay noding validation check to catch this situation. This has the side-effect of detecting (and failing) all self-touches in input geometry. Previously, vertex-vertex self-touches were not detected, and in many cases they would simply propagate through the overlay algorithm. (This made the output invalid as well, but since the inputs were already invalid this behaviour was considered acceptable).
The change in GEOS behaviour was not planned as such, but it has consequences, which were fortunately detected because CRAN checks much more by default than, say, Travis does. Zero-width buffering will not repair all cases of invalidity, but does work here.
## Exercise and review
For a later exercise, we'll be using the Soho cholera data set; I converted the shapefiles from https://asdar-book.org/bundles2ed/die_bundle.zip to GPKG to be more modern (using `ogr2ogr` in GDAL 3 built against PROJ 6). **sf** is installed using the `proj.h` interface in PROJ 6:
```{r, echo=TRUE}
buildings <- sf::st_read("data/snow/buildings.gpkg", quiet=TRUE)
st_crs(buildings)
```
To make an interactive display in `mapview()`, conversion/transformation to "Web Mercator" is needed - this uses a WGS84 datum. But PROJ 6 has dropped the `+datum=` tag, so the display is not correctly registered.
```{r, echo=TRUE}
library(mapview)
mapview(buildings)
```
The CRS/SRS values in the GPKG file (it is a multi-table SQLite database) include the datum definition:
```{r, echo=TRUE}
library(RSQLite)
db = dbConnect(SQLite(), dbname="data/snow/buildings.gpkg")
dbReadTable(db, "gpkg_spatial_ref_sys")$definition[4]
dbDisconnect(db)
```
Maybe using **rgdal** which is built using PROJ 6 but the legacy `proj_api.h` interface, and the shapefile as shipped with ASDAR reproduction materials will help?
```{r, echo=TRUE}
buildings1 <- rgdal::readOGR("data/snow/buildings.shp", verbose=FALSE)
sp::proj4string(buildings1)
```
No, same problem:
```{r, echo=TRUE}
mapview(buildings1)
```
But the shapefile has the datum definition:
```{r, echo=TRUE, warning=FALSE}
readLines("data/snow/buildings.prj")
```
So in both cases with PROJ 6, we need to manipulate the CRS read in with the file to insert our choice of how to make the transformation, because the definition as read no longer contains it:
```{r, echo=TRUE, warning=FALSE}
fixed <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +nadgrids=OSTN15_NTv2_OSGBtoETRS.gsb +units=m +no_defs"
st_crs(buildings) <- fixed
sp::proj4string(buildings1) <- sp::CRS(fixed)
```
```{r, echo=TRUE}
mapview(buildings)
```
```{r, echo=TRUE}
mapview(buildings1)
```
## GIS bridges (description and using GRASS and **rgrass7**)
### GIS interfaces
Because GIS can be used as databases, and their tools can be better suited to some analyses and operations, it may be sensible to use one in addition to data analysis software. There is an extra effort required when using linked software systems, because they may not integrate easily. Since R is open source, and R spatial packages use open source components, staying with open source GIS means that many of the underlying software components are shared. This certainly applies to R and GRASS, and through GRASS, also to R and QGIS --- QGIS is more file-based than GRASS, which has an underlying data storage specification.
GIS interfaces can be as simple as just reading and writing files using loose coupling, once the file formats have been worked out, that is. The GRASS 7 interface **rgrass7** on CRAN is the current, stable interface. In addition to the GRASS interface, which is actively maintained, there are several others: **link2GI** bundles interfaces to several GI systems; **RQGIS** is for QGIS but links through to GRASS and SAGA [@muenchowetal17] using **reticulate**; **RSAGA** links to SAGA, scripting and running it from R; **rpostgis** is for PostGIS [@bucklin+basille18]. The **arcgisbinding** package is published and distributed by [ESRI using GitHub](https://github.com/R-ArcGIS/r-bridge), and provides some file exchange facilities for vector and attribute data (newer versions may have raster too).
### Layering of shells
The interface between R and GRASS uses the fact that GRASS modules can be run as command line programs at the shell prompt. The shell has certain environment variables for GRASS set, for example saying where the data is stored, but is otherwise a regular shell, from which R can be started. This instance of R inherits the environment variables set by GRASS.
```{r echo=FALSE}
knitr::include_graphics('gc009_04a.png')
```
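One way to see this inheritance from the R side is to look for an environment variable that GRASS sets. A minimal sketch, assuming only that a GRASS session sets `GISRC` (the variable that `initGRASS()` also looks for); the helper name is hypothetical:

```r
# Hypothetical helper: TRUE if the GISRC environment variable is set and non-empty
in_grass_session <- function() {
  nzchar(Sys.getenv("GISRC"))
}
in_grass_session()
```

A session started with `initGRASS()` also sets this variable and may leave it set, so the result is an indication rather than a guarantee that R was started from a GRASS shell.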
Finally, although for research purposes it may be preferred to have data analysis software, such as R, facing the user, it is possible to try to embed this component in the workflow, so that the end user does not need so much training, but then an "expert" has to codify the steps needed in advance.
```{r echo=FALSE}
knitr::include_graphics('links0.png')
```
### Two sides of the R/GRASS interface
The R/GRASS interface came into being in 1998/1999, and is covered in Bivand [-@bivand:00] and [a conference paper by Bivand and Neteler](http://www.geocomputation.org/2000/GC009/Gc009.htm); and Bivand [-@bivand:14]. R was started in a GRASS LOCATION, and spatial data was exchanged between GRASS and R, running as it were in tandem; the workflows were not integrated. **spgrass6** and its use discussed in Neteler and Mitasova [-@neteler+mitasova:08] continued this approach, but about that time steps were taken to permit scripting GRASS from R in existing LOCATIONs, like **RSAGA**. Shortly afterwards, **spgrass6** and now **rgrass7** introduced the possibility of creating a temporary GRASS LOCATION permitting GIS operations on data from the R side.
### GRASS sessions
The package may be used in two ways, either in an R session started from within a GRASS session from the command line, or with the `initGRASS()` function. The function may be used with an existing GRASS location and mapset, or with a one-time throw-away location, and takes the GRASS installation directory as its first argument. It then starts a GRASS session within the R session, and is convenient for scripting GRASS in R, rather than Python, which is the GRASS scripting language in GRASS 7. Other arguments to `initGRASS()` may be used to set up the default region using standard tools like `Sys.setenv`; resolution and projection may be set or reset subsequently.
### Running GRASS from R
Each GRASS command takes an `--interface-description` flag, which when run returns an XML description of its flags and parameters. These descriptions are used by the GRASS GUI to populate its menus, and are also used in **rgrass7** to check that GRASS commands are used correctly. This also means that the `parseGRASS` function can set up an object in a searchable list on the R side of the interface, to avoid re-parsing interface descriptions that have already been encountered in a session.
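The caching idea can be sketched in base R as follows. This is an illustration of the technique only, not the **rgrass7** implementation: `fake_parse()` stands in for running a command with `--interface-description` and parsing the XML, and all object names are hypothetical.

```r
desc_cache <- new.env()      # parsed descriptions, keyed by command name
counter <- new.env()         # counts how often the expensive step runs
counter$n <- 0L

# Stand-in for the expensive step (running the command and parsing XML)
fake_parse <- function(cmd) {
  counter$n <- counter$n + 1L
  list(cmd = cmd, flags = character(0), parameters = list())
}

# Look up the cache first; parse and store only on a miss
cached_parse <- function(cmd) {
  if (!exists(cmd, envir = desc_cache, inherits = FALSE)) {
    assign(cmd, fake_parse(cmd), envir = desc_cache)
  }
  get(cmd, envir = desc_cache, inherits = FALSE)
}

first <- cached_parse("r.watershed")
second <- cached_parse("r.watershed")  # served from the cache
counter$n                              # number of times fake_parse() ran
```

Keying the cache by command name means that a loop calling the same GRASS command many times pays the parsing cost only once per session.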
The middle function is `doGRASS`, which takes the flags and parameters chosen, checks their validity --- especially type (real, integer, string), and constructs a command string. Note that multiple parameter values should be a vector of values of the correct type. Finally, `execGRASS` uses the `system` or `system2` function to execute the GRASS command with the chosen flag and parameter values; the `intern=` argument asks that what GRASS returns be placed in an R object.
In general use, `execGRASS` calls `doGRASS`, which in turn calls `parseGRASS`. Use of `execGRASS` has been simplified to permit parameters to be passed through the R ellipsis ($\ldots$) argument structure. Consequently, the scripter can readily compare [the help page of any GRASS command](https://grass.osgeo.org/grass76/manuals/index.html) with the version of the value returned by `parseGRASS` showing the parameters and flags expected. GRASS add-ons are also accommodated in the same `parseGRASS` procedure of parsing and caching. We will not need more complex setups here, but it is easy to see that for example `execGRASS` may be run in an R loop with varying parameter values.
### Initialize temporary GRASS session
Here we need three objects to be created, and also set `override=` to `TRUE`, as this document may be run many times. `initGRASS()` looks for an environment variable that GRASS sessions set (`GISRC`) pointing to a file of GRASS environment variables. Real GRASS sessions remove it on exit, but this interface does not (yet) provide for its removal, hence the need here to override.
```{r, echo=TRUE}
library(sf)
```
```{r, echo=TRUE}
olinda_sirgas2000 <- st_read("output/olinda_sirgas2000.gpkg", quiet=TRUE)
bounds <- st_sf(st_union(olinda_sirgas2000))
SG <- maptools::Sobj_SpatialGrid(as(bounds, "Spatial"), n=1000000)$SG
```
From **rgrass7** 0.2-1, the user needs to flag whether **sf**/**stars** or **sp**/**rgdal** object representations are being used, with `use_sp()` or `use_sf()`. This is only needed when objects rather than commands move across the interface; because no **stars** support is yet present, we need to use **sp** and **rgdal** support to set the location resolution.
```{r, echo=TRUE}
library(rgrass7)
packageVersion("rgrass7")
use_sp()
myGRASS <- "/home/rsb/topics/grass/g761/grass76"
myPROJSHARE <- "/usr/local/share/proj"
if (Sys.getenv("GRASS_PROJSHARE") == "") Sys.setenv(GRASS_PROJSHARE=myPROJSHARE)
loc <- initGRASS(myGRASS, tempdir(), SG=SG, override=TRUE)
```
### Setting the projection correctly
As yet `initGRASS` does not set the projection from the input `"SpatialGrid"` object, so we have to do it ourselves, showing how to pass R objects to GRASS parameters:
```{r, echo=TRUE}
execGRASS("g.mapset", mapset="PERMANENT", flag="quiet")
execGRASS("g.proj", flag="c", proj4=st_crs(bounds)$proj4string)
execGRASS("g.mapset", mapset=loc$MAPSET, flag="quiet")
execGRASS("g.region", flag="d")
```
We read the elevation data downloaded before into the GRASS location directly:
```{r, echo=TRUE}
execGRASS("r.in.gdal", flag=c("overwrite", "quiet"), input="output/elevation.tif", output="dem")
execGRASS("g.region", raster="dem")
```
Next, we run `r.watershed` on this high resolution digital elevation model, outputting raster stream lines, then thinned with `r.thin`:
```{r, echo=TRUE}
execGRASS("r.watershed", flag=c("overwrite", "quiet"), elevation="dem", stream="stream", threshold=2500L, convergence=5L, memory=300L)
execGRASS("r.thin", flag=c("overwrite", "quiet"), input="stream", output="stream1", iterations=200L)
```
To mask the output object we switch to the **sf** vector representation, copy `bounds` to GRASS, and set a raster mask using the bounds of the union of tracts. Then we convert the thinned stream lines within the mask to vector representation, and copy this object from GRASS to the R workspace. In both cases, we use GPKG representation for intermediate files.
```{r, echo=TRUE}
use_sf()
writeVECT(bounds, "bounds", v.in.ogr_flags=c("overwrite", "quiet"))
execGRASS("r.mask", vector="bounds", flag=c("overwrite", "quiet"))
execGRASS("r.to.vect", flag=c("overwrite", "quiet"), input="stream1", output="stream", type="line")
imputed_streams <- readVECT("stream", ignore.stderr=TRUE)
```
```{r, echo=TRUE, warning=FALSE}
library(mapview)
mapview(imputed_streams)
```
We can also calculate geomorphometric values, including the simple slope and aspect values for the masked raster using `r.slope.aspect`. If we then move the Olinda setor boundaries to GRASS, we can use `v.rast.stats` to summarize the raster values falling within each setor, here for the geomorphometric measures.
```{r, echo=TRUE}
execGRASS("r.slope.aspect", elevation="dem", slope="slope", aspect="aspect", flag=c("quiet", "overwrite"))
writeVECT(olinda_sirgas2000[, "SETOR_"], "olinda", ignore.stderr=TRUE, v.in.ogr_flags=c("overwrite", "quiet"))
execGRASS("v.rast.stats", map="olinda", raster=c("slope", "aspect"), method=c("first_quartile", "median", "third_quartile"), column_prefix=c("slope", "aspect"), flag=c("c", "quiet"))
```
We can do the same for the Landsat 7 NDVI values:
```{r, echo=TRUE}
execGRASS("r.in.gdal", flag=c("overwrite", "quiet"), input="output/L7_ndvi.tif", output="ndvi")
execGRASS("g.region", raster="ndvi")
execGRASS("v.rast.stats", map="olinda", raster="ndvi", method=c("first_quartile", "median", "third_quartile"), column_prefix="ndvi", flag=c("c", "quiet"))
```
```{r, echo=TRUE}
olinda_gmm_ndvi <- readVECT("olinda", ignore.stderr=TRUE)
head(olinda_gmm_ndvi)
```
## Exercises and review
### Broad Street Cholera Data
```{r echo=FALSE}
knitr::include_graphics('snowmap.png')
```
Even though we know that John Snow already had a working
hypothesis about cholera epidemics, his data remain interesting,
especially if we use a GIS to find the street distances from
mortality dwellings to the Broad Street pump in Soho in central
London. Brody et al. [-@brodyetal:00] point out that John Snow did not use
maps to *find* the Broad Street pump, the polluted water source
behind the 1854 cholera epidemic, because he associated cholera
with water contaminated with sewage, based on earlier experience.
### Broad Street Cholera Data
The basic data to be used here were made available by Jim Detwiler, who had collated them for David O'Sullivan for use on the cover of O'Sullivan and Unwin [-@osullivan+unwin:03], based on earlier work by Waldo Tobler and others. The files were a shapefile of counts of deaths at front doors of houses, two shapefiles of pump locations and a georeferenced copy of the Snow map as an image; the files were registered in the British National Grid CRS. These have been converted to GPKG format. In GRASS, a suitable location was set up in this CRS and the image file was imported; the building contours were then digitised as a vector layer and cleaned.
```{r echo=FALSE}
knitr::include_graphics('brodyetal00_fig1.png')
```
We would like to find the line of equal distances drawn on the extract from John Snow's map in Brody et al. [-@brodyetal:00] shown here, or equivalently find the distances from the pumps to the front doors of houses with mortalities, following the roads rather than in a straight line. We should recall that we only have the locations of counts of mortalities, not of people at risk or of survivors.
```{r, echo=TRUE}
library(sf)
bbo <- st_read("data/snow/bbo.gpkg")
fixed <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +nadgrids=OSTN15_NTv2_OSGBtoETRS.gsb +units=m +no_defs"
st_crs(bbo) <- fixed
```
```{r, echo=TRUE}
library(rgrass7)
myPROJSHARE <- "/usr/local/share/proj"
if (Sys.getenv("GRASS_PROJSHARE") == "") Sys.setenv(GRASS_PROJSHARE=myPROJSHARE)
myGRASS <- "/home/rsb/topics/grass/g761/grass76"
td <- tempdir()
SG <- maptools::Sobj_SpatialGrid(as(bbo, "Spatial"))$SG
use_sp()
soho <- initGRASS(gisBase=myGRASS, home=td, SG=SG, override=TRUE)
soho
```
```{r, echo=TRUE}
MAPSET <- execGRASS("g.mapset", flags="p", intern=TRUE)
execGRASS("g.mapset", mapset="PERMANENT", flags="quiet")
execGRASS("g.proj", flags=c("p", "quiet"))
execGRASS("g.proj", proj4=st_crs(bbo)$proj4string, flags=c("c", "quiet"))
```
```{r, echo=TRUE}
execGRASS("g.mapset", mapset=MAPSET, flags="quiet")
execGRASS("g.region", flags="p", intern=TRUE)[3:11]
execGRASS("g.region", flags="a", res="1")
execGRASS("g.region", flags="p", intern=TRUE)[3:11]
```
```{r, echo=TRUE, warning=FALSE}
buildings <- st_read("data/snow/buildings.gpkg", quiet=TRUE)
st_crs(buildings) <- fixed
deaths <- st_read("data/snow/deaths.gpkg", quiet=TRUE)
st_crs(deaths) <- fixed
sum(deaths$Num_Css)
b_pump <- st_read("data/snow/b_pump.gpkg", quiet=TRUE)
st_crs(b_pump) <- fixed
nb_pump <- st_read("data/snow/nb_pump.gpkg", quiet=TRUE)
st_crs(nb_pump) <- fixed
```
```{r, echo=TRUE, warning=FALSE}
use_sf()
fl <- c("overwrite", "quiet")
writeVECT(bbo, vname="bbo", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(buildings[,1], vname="buildings", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(b_pump, vname="b_pump", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(nb_pump, vname="nb_pump", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
writeVECT(deaths, vname="deaths", v.in.ogr_flags=c("o", fl), ignore.stderr=TRUE)
execGRASS("g.list", type="vector", intern=TRUE)
```
### GIS workflow
The buildings vector layer should be converted to its inverse (everything that is not a building), and the resulting roads should then be buffered to include the front doors (here by 4m). These operations can be done in the raster or vector representation, but the outcome here will be a raster object from which to find the cost, at 1 metre resolution, of moving from each front door to each pump. We then need to extract the distance to the Broad Street pump, and to the nearest other pump, for each front door. We could also use vector street centre lines to build a network, and use graph-based methods to find the shortest paths from each front door to the pumps.
### Create roads and convert to raster
First, we cut the buildings out of the extent polygon to leave the roads. Having set the region resolution to 1x1m squares we can convert the vector roads to raster, and can tabulate raster cell values, where asterisks are missing data cells:
```{r , echo = TRUE, mysize=TRUE, size='\\tiny'}
execGRASS("v.overlay", ainput="buildings", binput="bbo", operator="xor", output="roads", flags=fl, ignore.stderr = TRUE)
execGRASS("v.to.rast", input="roads", output="rroads", use="val", value=1, flags=fl)
execGRASS("r.stats", input="rroads", flags=c("c", "quiet"))
```
### Buffer and reclass
We also need to buffer out the roads by an amount sufficient to include the front door points within the roads; 4m was found by trial and error and may be too much, giving shorter distances than a thinner buffer would yield. The raster must also be reclassified so that every road and buffer cell has unit cost:
```{r , echo = TRUE, mysize=TRUE, size='\\tiny'}
execGRASS("r.buffer", input="rroads", output="rroads4", distances=4, flags=fl)
execGRASS("r.stats", input="rroads4", flags=c("c", "quiet"))
tf <- tempfile()
cat("1 2 = 1\n", file=tf)
execGRASS("r.reclass", input="rroads4", output="rroads4a", rules=tf, flags=fl)
execGRASS("r.stats", input="rroads4a", flags=c("c", "quiet"))
```
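As a rough illustration of what the rule `1 2 = 1` does, the following base R sketch applies the same mapping to a toy matrix. It assumes the `r.buffer` output convention of category 1 for the original road cells and 2 for the buffer zone; the matrix `toy` and the lookup vector `rule` are invented for the example and are not part of the workflow.

```r
# Toy raster: 1 = road cell, 2 = cell in the buffer zone, NA = building interior
toy <- matrix(c(NA, 2, 1, 2, NA,
                NA, 2, 1, 2, NA,
                 2, 2, 1, 2,  2), nrow = 3, byrow = TRUE)
# Lookup vector standing in for the rules file: categories 1 and 2 both map to 1
rule <- c(`1` = 1, `2` = 1)
# Index the lookup by category name; NA cells have no match and stay NA
reclassed <- matrix(unname(rule[as.character(toy)]), nrow = nrow(toy))
table(reclassed, useNA = "ifany")
```

After reclassification only two states remain, passable at unit cost or missing, which is all that the cost surface needs.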
### Generate distance maps
The `r.cost` command returns a raster with cells set as the cost of moving from the vector start point or points to each cell; we do this twice, once for the Broad Street pump, and then for the other pumps:
```{r , echo = TRUE, mysize=TRUE, size='\\tiny'}
execGRASS("r.cost", input="rroads4a", output="dist_broad", start_points="b_pump", flags=fl)
execGRASS("r.cost", input="rroads4a", output="dist_not_broad", start_points="nb_pump", flags=fl)
```
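To make the idea of a cumulative cost surface concrete, here is a simplified base R sketch on a small grid. It is not the GRASS implementation: it assumes eight-neighbour moves, charges each step the mean of the two cell costs times the step length (1 or $\sqrt{2}$), and treats `NA` cells as impassable. The function name `cost_surface` and the toy grid are hypothetical.

```r
# Simplified cumulative-cost surface (a sketch, not the r.cost algorithm).
# cost: matrix of per-cell costs, NA = impassable; start: c(row, col) of the source.
cost_surface <- function(cost, start) {
  nr <- nrow(cost); nc <- ncol(cost)
  acc <- matrix(Inf, nr, nc)           # accumulated cost, Inf = not reached yet
  done <- matrix(FALSE, nr, nc)        # cells whose accumulated cost is final
  acc[start[1], start[2]] <- 0
  moves <- expand.grid(dr = -1:1, dc = -1:1)
  moves <- moves[!(moves$dr == 0 & moves$dc == 0), ]
  repeat {
    cand <- acc
    cand[done] <- Inf
    if (all(is.infinite(cand))) break  # no reachable unfinished cell is left
    k <- which.min(cand)               # cheapest unfinished cell (Dijkstra step)
    i <- (k - 1) %% nr + 1; j <- (k - 1) %/% nr + 1
    done[i, j] <- TRUE
    for (m in seq_len(nrow(moves))) {
      ii <- i + moves$dr[m]; jj <- j + moves$dc[m]
      if (ii < 1 || ii > nr || jj < 1 || jj > nc) next
      if (is.na(cost[ii, jj]) || done[ii, jj]) next
      step <- sqrt(moves$dr[m]^2 + moves$dc[m]^2) * (cost[i, j] + cost[ii, jj]) / 2
      if (acc[i, j] + step < acc[ii, jj]) acc[ii, jj] <- acc[i, j] + step
    }
  }
  acc[is.infinite(acc)] <- NA          # unreachable and impassable cells
  acc
}

toy <- matrix(1, nrow = 3, ncol = 5)
toy[1:2, 3] <- NA                      # a wall with a gap in the bottom row
d <- cost_surface(toy, start = c(1, 1))
round(d, 2)
```

With unit costs the accumulated values are path lengths in cell units, so cells behind the wall get the length of the detour through the gap rather than the straight-line distance.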
### Pump to front door distances
Finally, we examine the values of these two distance maps at the front door points, and add these fields (columns) to the vector mortality map:
```{r , echo = TRUE, mysize=TRUE, size='\\tiny'}
execGRASS("v.db.addcolumn", map="deaths", columns="broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_broad", column="broad", flags="quiet")
execGRASS("v.db.addcolumn", map="deaths", columns="not_broad double precision", flags="quiet")
execGRASS("v.what.rast", map="deaths", raster="dist_not_broad", column="not_broad", flags="quiet")
```
### Mortality counts by pump nearness
Moving the data back to R from GRASS permits operations on the distance values. We set the logical variable `b_nearer` to TRUE if the distance to the Broad Street pump is less than the distance to the nearest other pump:
```{r , echo = TRUE, mysize=TRUE, size='\\tiny'}
deaths1 <- readVECT("deaths", ignore.stderr=TRUE)
deaths1$b_nearer <- deaths1$broad < deaths1$not_broad
by(deaths1$Num_Css, deaths1$b_nearer, sum)
```
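The comparison and tabulation can be tried on a few made-up records. The data frame `toy` below is hypothetical; only the column names follow the `deaths` layer. It also shows one thing to watch for: a front door with a missing distance would get an `NA` for the logical variable and drop out of both groups.

```r
# Hypothetical front-door records: death counts and road distances in metres
toy <- data.frame(Num_Css   = c(3, 1, 2, 5, 1),
                  broad     = c(120, 300, 80, 210, NA),
                  not_broad = c(250, 150, 400, 205, 90))
toy$b_nearer <- toy$broad < toy$not_broad
# tapply() gives the same group sums as by(), returned as a plain named vector
counts <- tapply(toy$Num_Css, toy$b_nearer, sum)
counts
# Counts lost because of missing distances (0 if every door has both values)
sum(toy$Num_Css) - sum(counts)
```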
## Using R as a GIS (topological and other operations)
There is a recently published article (https://onlinelibrary.wiley.com/doi/10.1111/ecog.04617) on the **landscapemetrics** package [@doi:10.1111/ecog.04617]; there are now many R packages for operations previously performed in GIS. GIS may well be more performant even for moderate data set sizes, and this will almost certainly be the case for larger data sets, although sensible use of proxy datasets that are not held in memory, and scaling operations so that they are only evaluated at the required output resolution, may help. Here we use the imputed streams to show topological operations, starting with the proportion of the area of each setor that is within 50m of an imputed stream:
```{r, echo=TRUE, warning=FALSE}
water_buf_50 <- st_buffer(imputed_streams, dist=50)
setor_area <- st_area(olinda_sirgas2000)
near_water0 <- st_intersection(olinda_sirgas2000[,"SETOR_"], water_buf_50[,"cat"])
near_water <- aggregate(near_water0, by=list(near_water0$SETOR_), head, n=1)
```
```{r, echo=TRUE}
area_near_water <- st_area(near_water)
olinda_sirgas2000$setor_area <- setor_area
o <- match(near_water$SETOR_, olinda_sirgas2000$SETOR_)
olinda_sirgas2000$area_near_water <- 0
olinda_sirgas2000$area_near_water[o] <- area_near_water
olinda_sirgas2000$prop_near_water <- olinda_sirgas2000$area_near_water/olinda_sirgas2000$setor_area
summary(olinda_sirgas2000$prop_near_water)
```
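The match-and-fill step above is needed because only some setors intersect the buffer, so the intersected table is shorter than the full one and in a different order. A minimal base R sketch with invented identifiers and areas (none of these values come from the Olinda data):

```r
# Hypothetical setor identifiers; only two of the four intersect the stream buffer
all_ids    <- c("A", "B", "C", "D")
total_area <- c(4000, 2500, 1000, 800)   # square metres
near_ids   <- c("C", "A")
near_area  <- c(250, 1000)               # square metres, in the order of near_ids
o <- match(near_ids, all_ids)            # positions of intersecting setors in the full table
area_near <- numeric(length(all_ids))    # start from zero for every setor
area_near[o] <- near_area
prop <- area_near / total_area
prop
```

Setors without any intersection keep the initial zero instead of becoming `NA`, which is what makes the later summary and map complete.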
```{r, echo=TRUE}
library(tmap)
tm_shape(olinda_sirgas2000) + tm_fill("prop_near_water", palette="Blues", style="fisher", n=5)
```
Next, we examine the length of imputed streams per unit area by setor, again using intersection followed by aggregation to the setors:
```{r, echo=TRUE, warning=FALSE}
streams_by_setor <- st_intersection(olinda_sirgas2000[,"SETOR_"], imputed_streams[,"cat"])
lngths0 <- aggregate(streams_by_setor, by=list(streams_by_setor$SETOR_), head, n=1)
lngths <- st_length(lngths0)
units(lngths) <- "mm"
o <- match(lngths0$SETOR_, olinda_sirgas2000$SETOR_)
olinda_sirgas2000$lngths <- 0
olinda_sirgas2000$lngths[o] <- lngths
olinda_sirgas2000$lngth_area <- olinda_sirgas2000$lngths/olinda_sirgas2000$setor_area
```
```{r, echo=TRUE}
tm_shape(olinda_sirgas2000) + tm_fill("lngth_area", palette="Blues", style="fisher", n=5)
```
These two measures are highly correlated.
```{r, echo=TRUE}
cor(olinda_sirgas2000$lngth_area, olinda_sirgas2000$prop_near_water)
```
We can also use **raster** for operations on the raster objects that we have at our disposal.
```{r, echo=TRUE}
library(raster)
r <- raster("output/elevation.tif")
r
```
We can use `raster::terrain()` to calculate geomorphometric measures:
```{r, echo=TRUE, cache=TRUE}
slope_aspect <- terrain(r, opt=c('slope','aspect'), unit='degrees', neighbors=8)
```
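With `neighbors=8`, the **raster** documentation attributes the slope calculation to Horn (1981). The sketch below shows the usual form of that finite-difference estimate for the centre cell of a single 3 by 3 window; it is an illustration under that assumption, not the package's code, and the function name `horn_slope` is made up.

```r
# Horn-style slope for the centre cell of a 3x3 elevation window.
# z: 3x3 matrix with row 1 as the northern row; res: cell size in the units of z.
horn_slope <- function(z, res = 1) {
  # weighted east-minus-west and north-minus-south differences
  dzdx <- ((z[1, 3] + 2 * z[2, 3] + z[3, 3]) - (z[1, 1] + 2 * z[2, 1] + z[3, 1])) / (8 * res)
  dzdy <- ((z[1, 1] + 2 * z[1, 2] + z[1, 3]) - (z[3, 1] + 2 * z[3, 2] + z[3, 3])) / (8 * res)
  atan(sqrt(dzdx^2 + dzdy^2)) * 180 / pi   # slope in degrees
}
# A plane rising one unit per cell towards the east
plane <- matrix(rep(0:2, each = 3), nrow = 3)
horn_slope(plane)
# A flat window has zero slope
horn_slope(matrix(5, 3, 3))
```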
and extract median values by setor polygon:
```{r, echo=TRUE, cache=TRUE}
slopes <- extract(slope_aspect, olinda_sirgas2000, fun=median)
```
as well as the median NDVI values:
```{r, echo=TRUE, cache=TRUE}
ndvi <- extract(raster("output/L7_ndvi.tif"), olinda_sirgas2000, fun=median)
```
```{r, echo=TRUE}
summary(ndvi[,1])
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$ndvi_median)
```
```{r, echo=TRUE}
summary(slopes[,1])
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$slope_median)
```
```{r, echo=TRUE}
summary(slopes[,2])
if (exists("olinda_gmm_ndvi")) summary(olinda_gmm_ndvi$aspect_median)
```
## Exercises and review
Because the CRS definitions of the layers differ slightly, the same CRS (the `fixed` string) was assigned to each of them after reading. We can then conduct an intersection operation to clip the buildings to the boundary, and buffer the buildings object inwards using a negative distance (to make the roads broader).
```{r, echo=TRUE, warning=FALSE}
library(sf)
buildings1 <- st_intersection(buildings, bbo)
buildings2 <- st_buffer(buildings1, dist=-4)
```
```{r, echo=TRUE, warning=FALSE}
library(mapview)
mapview(buildings2)
```
Next we create a dummy raster using **raster** with 1 metre resolution in the extent of the buildings object (note that `raster::extent()` works with **sf** objects, but the CRS must be given as a string):
```{r, echo=TRUE}
library(raster)
resolution <- 1
r <- raster(extent(buildings2), resolution=resolution, crs=fixed)
r[] <- resolution
summary(r)
```
One of the `buildings2` component geometries was empty (permitted in **sf**, not in **sp**), so it should be dropped before coercing to **sp** and running `raster::cellFromPolygon()`, which lists the raster cells in each geometry. Because the result is a list, we need `unlist()` to assign `NA` to the cells in the in-buffered buildings:
```{r, echo=TRUE, cache=TRUE}
buildings3 <- as(buildings2[!st_is_empty(buildings2),], "Spatial")
cfp <- cellFromPolygon(r, buildings3)
is.na(r[]) <- unlist(cfp)
summary(r)
```
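The role of `unlist()` can be seen with a plain vector in place of the raster. The list `cells_by_polygon` below is a hand-made stand-in for the per-polygon cell numbers, and the empty second element is only an assumption about how a polygon covering no cells might appear.

```r
# Stand-in for the per-polygon cell numbers: one integer vector per polygon
cells_by_polygon <- list(c(2L, 3L), integer(0), c(7L, 8L, 9L))
vals <- rep(1, 10)                        # values of a 10-cell raster
# Flatten the list to a single index vector, then set those cells to NA
is.na(vals) <- unlist(cells_by_polygon)
vals
```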
```{r, echo=TRUE, warning=FALSE}
library(mapview)
mapview(r)
```
Using **gdistance**, we create a symmetric transition object with an internal sparse matrix representation, from which shortest paths can be computed:
```{r, echo=TRUE, warning=FALSE, message=FALSE}
library(gdistance)
```
```{r, echo=TRUE, cache=TRUE}
tr1 <- transition(r, transitionFunction=function(x) 1/mean(x), directions=8, symm=TRUE)
```
We need to find shortest paths from addresses with mortalities to the Broad Street pump first:
```{r, echo=TRUE, cache=TRUE}
sp_deaths <- as(deaths, "Spatial")
d_b_pump <- st_length(st_as_sfc(shortestPath(tr1, as(b_pump, "Spatial"), sp_deaths, output="SpatialLines")))
```
and then in a loop from the same addresses to each of the other pumps in turn, finally taking the minimum:
```{r, echo=TRUE, cache=TRUE}
res <- matrix(NA, ncol=nrow(nb_pump), nrow=nrow(deaths))
sp_nb_pump <- as(nb_pump, "Spatial")
for (i in 1:nrow(nb_pump)) res[,i] <- st_length(st_as_sfc(shortestPath(tr1, sp_nb_pump[i,], sp_deaths, output="SpatialLines")))
d_nb_pump <- apply(res, 1, min)
```
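The row-wise minimum can be checked on a small made-up matrix of path lengths, with one row per front door and one column per pump. The values and the names `res_toy`, `nearest_dist` and `nearest_pump` are hypothetical.

```r
# Hypothetical path lengths in metres: rows are front doors, columns are pumps
res_toy <- matrix(c(310, 150, 420,
                    505, 640, 260), nrow = 2, byrow = TRUE)
nearest_dist <- apply(res_toy, 1, min)        # shortest distance for each front door
nearest_pump <- apply(res_toy, 1, which.min)  # column index of the pump that gives it
nearest_dist
nearest_pump
```

Keeping the index as well as the distance makes it possible to see which of the other pumps is the competitor for each address.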
Because `sf::st_length()` returns values carrying **units** metadata, which is lost when they are assigned to a matrix, we need to re-assign the units before testing whether the Broad Street pump is closer or not:
```{r, echo=TRUE}
library(units)
units(d_nb_pump) <- "m"
deaths$b_nearer <- d_b_pump < d_nb_pump
by(deaths$Num_Css, deaths$b_nearer, sum)
```