Updates to c5 #698

Merged
merged 10 commits into from
Dec 19, 2021
82 changes: 55 additions & 27 deletions 05-geometry-operations.Rmd
@@ -14,15 +14,13 @@ library(spDataLarge)

## Introduction

The previous three chapters have demonstrated how geographic datasets are structured in R (Chapter \@ref(spatial-class)) and how to manipulate them based on their non-geographic attributes (Chapter \@ref(attr)) and spatial properties (Chapter \@ref(spatial-operations)).
This chapter extends these skills.
After reading it --- and attempting the exercises at the end --- you should understand and have control over the geometry column in `sf` objects and the geographic location of pixels represented in rasters.
So far the book has explained the structure of geographic datasets (Chapter \@ref(spatial-class)), and how to manipulate them based on their non-geographic attributes (Chapter \@ref(attr)) and spatial relations (Chapter \@ref(spatial-operations)).
This chapter focusses on manipulating the geographic elements of geographic objects, for example by simplifying and converting vector geometries, cropping raster datasets, and converting vector objects into rasters and from rasters into vectors.
After reading it --- and attempting the exercises at the end --- you should understand and have control over the geometry column in `sf` objects and the extent and geographic location of pixels represented in rasters in relation to other geographic objects.

Section \@ref(geo-vec) covers transforming vector geometries with 'unary' and 'binary' operations.
Unary operations work on a single geometry in isolation.
This includes simplification (of lines and polygons), the creation of buffers and centroids, and shifting/scaling/rotating single geometries using 'affine transformations' (Sections \@ref(simplification) to \@ref(affine-transformations)).
Binary transformations modify one geometry based on the shape of another.
This includes clipping and geometry unions\index{vector!union}, covered in Sections \@ref(clipping) and \@ref(geometry-unions), respectively.
Unary operations work on a single geometry in isolation, including simplification (of lines and polygons), the creation of buffers and centroids, and shifting/scaling/rotating single geometries using 'affine transformations' (Sections \@ref(simplification) to \@ref(affine-transformations)).
Binary transformations modify one geometry based on the shape of another, including clipping and geometry unions\index{vector!union}, covered in Sections \@ref(clipping) and \@ref(geometry-unions), respectively.
Type transformations (from a polygon to a line, for example) are demonstrated in Section \@ref(type-trans).
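
The three kinds of vector transformation listed above can be sketched on a single unit circle. This is a minimal illustration rather than code from the chapter: the object names `pt`, `circle`, `centre`, `shifted`, `overlap` and `outline` are hypothetical, and the geometries have no CRS.

```r
library(sf)

pt = st_sfc(st_point(c(0, 0)))               # a single point geometry (illustrative)

# unary operations: each works on one geometry in isolation
circle = st_buffer(pt, dist = 1)             # buffer turns the point into a circle
centre = st_centroid(circle)                 # centroid of the circle
shifted = circle + c(1, 0)                   # affine shift of one unit along x

# binary operation: the result depends on the shape of a second geometry
overlap = st_intersection(circle, shifted)   # part of circle that lies inside shifted

# type transformation: the polygon boundary as a line geometry
outline = st_cast(circle, "MULTILINESTRING")
```

Printing `st_geometry_type()` for each object is a quick way to see which operations change the geometry type and which only change the coordinates.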

Section \@ref(geo-ras) covers geometric transformations on raster objects.
@@ -321,62 +319,92 @@ two overlapping circles with a center point one unit away from each other and a
```{r points, fig.cap="Overlapping circles.", fig.asp=1, out.width="50%"}
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text
plot(b, border = "grey")
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y"), cex = 3) # add text
```

Imagine you want to select not one circle or the other, but the space covered by both `x` *and* `y`.
This can be done using the function `st_intersection()`\index{vector!intersection}, illustrated using objects named `x` and `y` which represent the left- and right-hand circles (Figure \@ref(fig:circle-intersection)).

```{r circle-intersection, fig.cap="Overlapping circles with a gray color indicating intersection between them.", fig.asp=1, out.width="50%", , fig.scap="Overlapping circles showing intersection types."}
```{r circle-intersection, fig.cap="Overlapping circles with a gray color indicating intersection between them.", fig.asp=1, out.width="50%", fig.scap="Overlapping circles showing intersection types."}
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting area
plot(b, border = "grey")
plot(x_and_y, col = "lightgrey", border = "grey", add = TRUE) # color intersecting area
```

The subsequent code chunk demonstrates how this works for all combinations of the 'Venn' diagram representing `x` and `y`, inspired by [Figure 5.1](http://r4ds.had.co.nz/transform.html#logical-operators) of the book *R for Data Science* [@grolemund_r_2016].

```{r venn-clip, echo=FALSE, fig.cap="Spatial equivalents of logical operators.", warning=FALSE}
source("https://github.com/Robinlovelace/geocompr/raw/main/code/05-venn-clip.R")
# source("code/05-venn-clip.R") # for testing local version, todo: remove or change
```
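
The relationships between the panels can also be checked numerically. The sketch below assumes the same two unit circles as above; the area comparisons and the tolerance value are illustrative additions and are not part of `05-venn-clip.R`.

```r
library(sf)

b = st_buffer(st_sfc(st_point(c(0, 1)), st_point(c(1, 1))), dist = 1)
x = b[1]
y = b[2]

x_and_y = st_intersection(x, y)     # x AND y
x_or_y = st_union(x, y)             # x OR y
x_xor_y = st_sym_difference(x, y)   # x XOR y
x_not_y = st_difference(x, y)       # x AND NOT y
y_not_x = st_difference(y, x)       # y AND NOT x

# the union should cover the intersection plus the symmetric difference
union_check = all.equal(st_area(x_or_y),
                        st_area(x_and_y) + st_area(x_xor_y),
                        tolerance = 1e-6)
# the symmetric difference should equal the two one-sided differences combined
xor_check = all.equal(st_area(x_xor_y),
                      st_area(x_not_y) + st_area(y_not_x),
                      tolerance = 1e-6)
```

Comparing areas with a tolerance rather than with `identical()` allows for small floating point differences between the overlay operations.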

To illustrate the relationship between subsetting and clipping spatial data, we will subset points that cover the bounding box of the circles `x` and `y` in Figure \@ref(fig:venn-clip).
### Subsetting and clipping

Clipping objects can change their geometry but it can also subset objects, returning only features that intersect (or partly intersect) with a clipping/subsetting object.
To illustrate this point, we will subset points that cover the bounding box of the circles `x` and `y` in Figure \@ref(fig:venn-clip).
Some points will be inside just one circle, some will be inside both and some will be inside neither.
`st_sample()` is used below to generate a *simple random* distribution of points within the extent of circles `x` and `y`, resulting in output illustrated in Figure \@ref(fig:venn-subset).
`st_sample()` is used below to generate a *simple random* distribution of points within the extent of circles `x` and `y`, resulting in the output illustrated in Figure \@ref(fig:venn-subset).
This raises the question: how can we subset the points to return only the point that intersects with *both* `x` and `y`?

```{r venn-subset, fig.cap="Randomly distributed points within the bounding box enclosing circles x and y.", out.width="50%", fig.asp=1, fig.scap="Randomly distributed points within the bounding box."}
```{r venn-subset, fig.cap="Randomly distributed points within the bounding box enclosing circles x and y. The point that intersects with both objects x and y is highlighted.", fig.height=6, fig.width=9, fig.asp=0.6, fig.scap="Randomly distributed points within the bounding box. Note that only one point intersects with both x and y, highlighted with a red circle.", echo=FALSE}
bb = st_bbox(st_union(x, y))
box = st_as_sfc(bb)
set.seed(2017)
p = st_sample(x = box, size = 10)
plot(box)
plot(x, add = TRUE)
plot(y, add = TRUE)
p_xy1 = p[x_and_y]
plot(box, border = "grey", lty = 2)
plot(x, add = TRUE, border = "grey")
plot(y, add = TRUE, border = "grey")
plot(p, add = TRUE)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y"))
plot(p_xy1, cex = 3, col = "red", add = TRUE)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y"), cex = 2)
```

The logical operator way would find the points inside both `x` and `y` using a spatial predicate such as `st_intersects()`, whereas the intersection\index{vector!intersection} method simply finds the points inside the intersecting region created above as `x_and_y`.
As demonstrated below the results are identical, but the method that uses the clipped polygon is more concise:

```{r venn-subset-to-show, eval=FALSE}
bb = st_bbox(st_union(x, y))
box = st_as_sfc(bb)
set.seed(2017)
p = st_sample(x = box, size = 10)
x_and_y = st_intersection(x, y)
```

The code chunk below demonstrates three ways to achieve the same result.
We can use the intersection\index{vector!intersection} of `x` and `y` (represented by `x_and_y` in the previous code chunk) as a subsetting object directly, as shown in the first line in the code chunk below.
We can also find the *intersection* between the input points represented by `p` and the subsetting/clipping object `x_and_y`, as demonstrated in the second line in the code chunk below.
This second approach will return features that partly intersect with `x_and_y` but with modified geometries for spatially extensive features that cross the border of the subsetting object.
The third approach is to create a subsetting object using the binary spatial predicate `st_intersects()`, introduced in the previous chapter.
The results are identical (except for superficial differences in attribute names), but the implementations differ substantially:

```{r 05-geometry-operations-21}
p_xy1 = p[x_and_y]
p_xy2 = st_intersection(p, x_and_y)
sel_p_xy = st_intersects(p, x, sparse = FALSE)[, 1] &
st_intersects(p, y, sparse = FALSE)[, 1]
p_xy1 = p[sel_p_xy]
p_xy2 = p[x_and_y]
identical(p_xy1, p_xy2)
p_xy3 = p[sel_p_xy]
```


```{r 05-geometry-operations-22, echo=FALSE, eval=FALSE}
# test if objects are identical:
identical(p_xy1, p_xy2)
identical(p_xy2, p_xy3)
identical(p_xy1, p_xy3)
waldo::compare(p_xy1, p_xy2) # the same except attribute names
waldo::compare(p_xy2, p_xy3) # the same except attribute names


# An alternative way to sample from the bb
bb = st_bbox(st_union(x, y))
pmulti = st_multipoint(pmat)
box = st_convex_hull(pmulti)
```

The example above is rather contrived and provided for educational rather than applied purposes, and we encourage you to reproduce the results to deepen your understanding of handling geographic vector objects in R.
It nevertheless raises an important question: which implementation to use?
Generally, more concise implementations should be favored, meaning the first approach above.
We will return to the question of choosing between different implementations of the same technique or algorithm in Chapter \@ref(algorithms).
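
One way to confirm that the three approaches select the same points is to compare their coordinates, which sidesteps the differences in attributes mentioned above. The following is a sketch under stated assumptions: it uses a sample of 100 points rather than the 10 used in the text, so that the selection is very unlikely to be empty, and the names `same_12` and `same_13` are hypothetical.

```r
library(sf)

b = st_buffer(st_sfc(st_point(c(0, 1)), st_point(c(1, 1))), dist = 1)
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)

set.seed(2017)
box = st_as_sfc(st_bbox(st_union(x, y)))
p = st_sample(x = box, size = 100)  # larger sample than in the text (assumption)

p_xy1 = p[x_and_y]                            # 1: spatial subsetting
p_xy2 = st_intersection(p, x_and_y)           # 2: clipping
sel_p_xy = st_intersects(p, x, sparse = FALSE)[, 1] &
  st_intersects(p, y, sparse = FALSE)[, 1]    # 3: logical vector from predicates
p_xy3 = p[sel_p_xy]

# compare coordinates only, ignoring any differences in attributes
same_12 = all.equal(st_coordinates(p_xy1), st_coordinates(p_xy2),
                    check.attributes = FALSE)
same_13 = all.equal(st_coordinates(p_xy1), st_coordinates(p_xy3),
                    check.attributes = FALSE)
```

Points lying almost exactly on the border of `x_and_y` could in principle be classified differently by the three approaches, so treat this as a sanity check rather than a proof of equivalence.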

### Geometry unions

\index{vector!union}
@@ -636,7 +664,7 @@ If two rasters have different origins, their cells do not overlap completely whi
To change the origin, use `origin()`.^[
If the origins of two raster datasets are just marginally apart, it sometimes is sufficient to simply increase the `tolerance` argument of `terra::terraOptions()`.
]
Looking at Figure \@ref(fig:origin-example) reveals the effect of changing the origin.
Figure \@ref(fig:origin-example) reveals the effect of changing the origin in this way.

```{r}
# change the origin
@@ -653,7 +681,7 @@ tm_shape(elev4_poly) +
tm_polygons(col = "elev") +
tm_layout(frame = FALSE, legend.show = FALSE,
inner.margins = c(0.1, 0.12, 0, 0))
# See https://github.com/Robinlovelace/geocompr/issues/695
# # See https://github.com/Robinlovelace/geocompr/issues/695
# knitr::include_graphics("https://user-images.githubusercontent.com/1825120/146618199-786fe3ad-9718-4dd0-a640-41180fc17e63.png")
```

3 changes: 3 additions & 0 deletions README.Rmd
@@ -42,6 +42,9 @@ Since commencing work on the Second Edition in September 2021 much has changed,
- Replacement of `raster` with `terra` in Chapters 1 to 7 (see commits related to this update [here](https://github.com/Robinlovelace/geocompr/search?q=terra&type=commits))
- Update of Chapter 7 to mention alternative ways of reading-in OSM data in [#656](https://github.com/Robinlovelace/geocompr/pull/656)
- Refactor build settings so the book builds on Docker images in the [geocompr/docker](https://github.com/geocompr/docker) repo
- Improve the experience of using the book in Binder (ideal for trying out the code before installing or updating the necessary R packages), as documented in issue [#691](https://github.com/Robinlovelace/geocompr/issues/691) (thanks to [yuvipanda](https://github.com/yuvipanda))
- Improved communication of binary spatial predicates in Chapter 4 (see [#675](https://github.com/Robinlovelace/geocompr/pull/675))
- New section on the links between subsetting and clipping (see [#698](https://github.com/Robinlovelace/geocompr/pull/698)) in Chapter 5
<!-- Todo: update this bullet point (Rl 2021-11) -->
<!-- - Next issue -->

49 changes: 25 additions & 24 deletions code/05-venn-clip.R
@@ -8,33 +8,34 @@ if(!exists("b")) {
x_and_y = st_intersection(x, y)
}

old_par = par(mfrow = c(3, 3), mai = c(0.1, 0.1, 0.1, 0.1))
plot(b)
old_par = par(mfrow = c(2, 3), mai = c(0.1, 0.1, 0.1, 0.1))
plot(b, border = "grey")
plot(x, add = TRUE, col = "lightgrey", border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "x")
plot(b, add = TRUE, border = "grey")
x_not_y = st_difference(x, y)
plot(b, border = "grey")
plot(x_not_y, col = "lightgrey", add = TRUE, border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "st_difference(x, y)")

y_not_x = st_difference(y, x)
plot(y_not_x, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(y, x)")
plot(b)
plot(x, add = TRUE, col = "grey")
text(x = 0.5, y = 1, "x")
plot(b, add = TRUE)
plot(b, border = "grey")
plot(y_not_x, col = "lightgrey", add = TRUE, border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "st_difference(y, x)")
x_or_y = st_union(x, y)
plot(x_or_y, col = "grey")
text(x = 0.5, y = 1, "st_union(x, y)")
plot(x_or_y, col = "lightgrey", border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "st_union(x, y)")
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_intersection(x, y)")
plot(b, border = "grey")
plot(x_and_y, col = "lightgrey", add = TRUE, border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "st_intersection(x, y)")
# x_xor_y = st_difference(x_xor_y, x_and_y) # failing
x_not_y = st_difference(x, y)
x_xor_y = st_sym_difference(x, y)
plot(x_xor_y, col = "grey")
text(x = 0.5, y = 1, "st_sym_difference(x, y)")
plot.new()
plot(b)
plot(x_not_y, col = "grey", add = TRUE)
text(x = 0.5, y = 1, "st_difference(x, y)")
plot(b)
plot(y, col = "grey", add = TRUE)
plot(b, add = TRUE)
text(x = 0.5, y = 1, "y")
plot(x_xor_y, col = "lightgrey", border = "grey")
text(cex = 1.2, x = 0.5, y = 1, "st_sym_difference(x, y)")
# plot.new()
# plot(b, border = "grey")
# plot(y, col = "lightgrey", add = TRUE, border = "grey")
# plot(b, add = TRUE, border = "grey")
# text(cex = 1.2, x = 0.5, y = 1, "y")
par(old_par)