Update watersurfaces 2024 and watersurfaces_hab v6 #73

Merged: 6 commits, Jan 9, 2025
@@ -8,7 +8,7 @@ We will create a map with watersurfaces that contain aquatic habitat and rib. Th

* The water surfaces map of Flanders `watersurfaces`

To be sure we will use the correct version of the data sources (version 2023 for the processed habitatmap and version 1.2 for watersurfaces), we will first derive the md5 file hashes and compare them to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853)
To be sure we will use the correct version of the data sources (version 2023 for the processed habitatmap and version 2024 for watersurfaces), we will first derive the md5 file hashes and compare them to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853)

### Processed habitatmap

@@ -29,7 +29,7 @@ hashes <-
md5 = map(filepath, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("5e9a0cb2a53f88001796bd7457a343ac"),
mutate(md5_ref = c("5e9a0cb2a53f88001796bd7457a343ac"), # version 2023_v1
match = md5 == md5_ref) %>%
select(filename,
md5,
@@ -70,7 +70,7 @@ hashes <-
md5 = map(filepath, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("72f6575ae7095622cd92eb2be720c7cb"), # version 1.2
mutate(md5_ref = c("d862df5b5e9ee8a2de4c333a7dcd7645"), # version 2024
match = md5 == md5_ref) %>%
select(filename,
md5,
@@ -84,6 +84,9 @@ if (!all.equal(hashes$md5, hashes$md5_ref)) {
stop(cat("The source map is NOT up to date ! Please check the datasource. "))
}

# Official n2khab not updated yet, I work with an updated dev version installed with
# remotes::install_github(repo="inbo/n2khab", ref = "bd784a5") #0.11.0.9000
# and included in renv

# load watersurfaces with corrected geometry
# (argument fix_geom available since n2khab 0.9.0)
86 changes: 44 additions & 42 deletions src/generate_watersurfaces_hab/20_check_result.Rmd
@@ -1,8 +1,8 @@
# Check results

## The new version v5
## The new version v6

Based on habitatmap_stdized_2023_v1 and watersurfaces_v1.2
Based on habitatmap_stdized_2023_v1 and watersurfaces_2024

Checksums:

@@ -66,59 +66,59 @@ sum(!validities | is.na(validities)) == 0

## Compare with the previous version

We load the previous version of watersurfaces_hab (version 4 based on watersurfaces_v1.1 and habitatmap_stdized_2020_v1)
We load the previous version of watersurfaces_hab (version 5 based on watersurfaces_v1.2 and habitatmap_stdized_2023_v1)

```{r paged.print=FALSE, warning=FALSE}

filepath_v4 <- file.path(path,"20_processed/_versions/watersurfaces_hab/watersurfaces_hab_v4/watersurfaces_hab.gpkg")
filepath_v5 <- file.path(path,"20_processed/_versions/watersurfaces_hab/watersurfaces_hab_v5/watersurfaces_hab.gpkg")

hashes <-
tibble(filepath_v4) %>%
mutate(filename = basename(filepath_v4),
md5 = map(filepath_v4, function(x) {
tibble(filepath_v5) %>%
mutate(filename = basename(filepath_v5),
md5 = map(filepath_v5, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("96b6a7abc3d637d71052900970e70904"),
mutate(md5_ref = c("e7d9930938f5111de33de6ecaec31a66"),
match = md5 == md5_ref) %>%
select(filename,
md5,
md5_ref,
match)

if (!all.equal(hashes$md5, hashes$md5_ref)) {
stop(cat("The source map is NOT v4 ! Please check the datasource. "))
stop(cat("The source map is NOT v5 ! Please check the datasource. "))
}

(pol_v4 <- read_sf(filepath_v4,
(pol_v5 <- read_sf(filepath_v5,
layer = "watersurfaces_hab_polygons"))
(types_v4 <- read_sf(filepath_v4,
(types_v5 <- read_sf(filepath_v5,
layer = "watersurfaces_hab_types"))
```
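The checksum guard in this chunk can also be written so that a mismatch always produces a clear error message. Note that `all.equal()` returns a character description rather than `FALSE` when its arguments differ, and that `cat()` returns `NULL`, so the sketch below uses `identical()` and passes the message text directly to `stop()`. This is a minimal sketch, not the repository's code: `check_md5()` is a hypothetical helper and the temporary file only stands in for the GeoPackage.

```r
# Hypothetical helper (not part of this PR): stop with a clear message
# when a file's md5 hash differs from the reference hash.
check_md5 <- function(filepath, md5_ref) {
  # md5sum() returns a named character vector (NA for unreadable files)
  md5 <- unname(tools::md5sum(filepath))
  if (is.na(md5) || !identical(md5, md5_ref)) {
    stop("Checksum mismatch for ", basename(filepath),
         ": expected ", md5_ref, ", got ", md5,
         call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in file for illustration only
tmp <- tempfile(fileext = ".txt")
writeLines("example", tmp)
ref <- unname(tools::md5sum(tmp))

check_md5(tmp, ref)  # passes silently when the hashes match
```

With the data of this chunk, the call would take `filepath_v5` and the reference hash stored in `md5_ref`.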

- Are there differences between version v4 and version v5 and where are they located?
- Are there differences between version v5 and version v6 and where are they located?

In the table below we check the differences between both versions.
Note that for a large number of records only polygon_habitatmap_id changes, but the geometry and the type description remain the same.

```{r}
# polygons with polygon_id in v5 but not in version v4
check_polygon_id_v5_v4 <- pol %>%
anti_join(pol_v4 %>%
# polygons with polygon_id in v6 but not in version v5
check_polygon_id_v6_v5 <- pol %>%
anti_join(pol_v5 %>%
st_drop_geometry(),
by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
left_join(pol_v4 %>%
mutate(geom_text_v4 = st_as_text(geom)) %>%
left_join(pol_v5 %>%
mutate(geom_text_v5 = st_as_text(geom)) %>%
st_drop_geometry() %>%
select(polygon_id, description_orig_v4 = description_orig, geom_text_v4),
select(polygon_id, description_orig_v5 = description_orig, geom_text_v5),
by = c("polygon_id")) %>%
mutate(new_polygon_id = !(polygon_id %in% pol_v4$polygon_id),
new_polygon_id_ws = !(polygon_id_ws %in% pol_v4$polygon_id_ws),
new_polygon_id_habitatmap = !(polygon_id_habitatmap %in% pol_v4$polygon_id_habitatmap),
description_orig_update = description_orig != description_orig_v4,
geom_text_v5 = st_as_text(geom),
geom_update = geom_text_v5 != geom_text_v4)

check_polygon_id_v5_v4 %>%
mutate(new_polygon_id = !(polygon_id %in% pol_v5$polygon_id),
new_polygon_id_ws = !(polygon_id_ws %in% pol_v5$polygon_id_ws),
new_polygon_id_habitatmap = !(polygon_id_habitatmap %in% pol_v5$polygon_id_habitatmap),
description_orig_update = description_orig != description_orig_v5,
geom_text_v6 = st_as_text(geom),
geom_update = geom_text_v6 != geom_text_v5)

check_polygon_id_v6_v5 %>%
st_drop_geometry() %>%
group_by(new_polygon_id, new_polygon_id_ws, new_polygon_id_habitatmap, geom_update, description_orig_update) %>%
summarise(n_records = n()) %>%
@@ -128,26 +128,28 @@ check_polygon_id_v5_v4 %>%

```
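The logic of the comparison above can be illustrated on toy attribute tables. This is a sketch under stated assumptions: `pol_old` and `pol_new` and all id values are made up for illustration, and plain data frames replace the `sf` objects, so no `st_drop_geometry()` is needed.

```r
library(dplyr)

# Toy stand-ins for the attribute tables of two versions (hypothetical values)
pol_old <- data.frame(
  polygon_id            = c("A", "B", "C"),
  polygon_id_ws         = c("ws1", "ws2", "ws3"),
  polygon_id_habitatmap = c("h1", "h2", "h3"),
  stringsAsFactors = FALSE
)
pol_new <- data.frame(
  polygon_id            = c("A", "B", "D"),
  polygon_id_ws         = c("ws1", "ws2", "ws4"),
  polygon_id_habitatmap = c("h1", "h9", "h4"),
  stringsAsFactors = FALSE
)

# Records whose (habitatmap id, ws id) pair occurs in the new version only,
# flagged by which of the ids is actually new
only_new <- pol_new %>%
  anti_join(pol_old, by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
  mutate(
    new_polygon_id            = !(polygon_id %in% pol_old$polygon_id),
    new_polygon_id_ws         = !(polygon_id_ws %in% pol_old$polygon_id_ws),
    new_polygon_id_habitatmap = !(polygon_id_habitatmap %in% pol_old$polygon_id_habitatmap)
  )

# Number of records per combination of flags
only_new %>%
  count(new_polygon_id, new_polygon_id_ws, new_polygon_id_habitatmap,
        name = "n_records")
```

In this toy case record `B` keeps its `polygon_id` and `polygon_id_ws` and only changes `polygon_id_habitatmap`, which corresponds to the situation noted above where only the habitatmap id changes.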

We check some of the polygons for which the geometry has changed.
Changes are minor.
We check some of the polygons for which the geometry has changed.

In this case only 2 polygons have a modified geometry.
The changes are minor for Stappersven and larger in extent for Houtsaegerduinen.

```{r}
check_geom <- check_polygon_id_v5_v4 %>%
check_geom <- check_polygon_id_v6_v5 %>%
filter(geom_update & !is.na(geom_update)) %>%
slice_head(n = 5) %>%
st_transform(4326)

check_geom_v4 <- pol_v4 %>%
check_geom_v5 <- pol_v5 %>%
filter(polygon_id %in% check_geom$polygon_id) %>%
st_transform(4326)

check_geom %>%
leaflet() %>%
addTiles() %>%
addPolygons(group = "v5") %>%
addPolygons(data = check_geom_v4, color = "yellow", group = "v4") %>%
addPolygons(group = "v6 (blue)") %>%
addPolygons(data = check_geom_v5, color = "red", group = "v5 (red)") %>%
addLayersControl(
overlayGroups = c("v5", "v4"),
overlayGroups = c("v6 (blue)", "v5 (red)"),
options = layersControlOptions(collapsed = FALSE)
)
```
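The `geom_update` flag used to select these polygons relies on comparing the WKT text of the geometries in both versions. A minimal sketch of that comparison on two toy squares follows; `sq()`, `old` and `new` are hypothetical names and the coordinates are made up, with no CRS assigned.

```r
library(sf)

# Hypothetical helper: a square polygon with lower-left corner (x0, 0)
sq <- function(x0, size) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + size, 0), c(x0 + size, size), c(x0, size), c(x0, 0)
  )))
}

# Polygon A is unchanged between versions, polygon B is enlarged
old <- st_sf(polygon_id = c("A", "B"), geom = st_sfc(sq(0, 1), sq(5, 1)))
new <- st_sf(polygon_id = c("A", "B"), geom = st_sfc(sq(0, 1), sq(5, 2)))

# TRUE where the WKT representation differs between versions
geom_update <- st_as_text(st_geometry(new)) != st_as_text(st_geometry(old))
geom_update
```

A text comparison like this flags any difference in the written coordinates, including a changed vertex order, so flagged polygons still benefit from the visual check on the map.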
@@ -156,43 +158,43 @@ check_geom %>%


```{r}
check_polygon_id_v4_v5 <- pol_v4 %>%
check_polygon_id_v5_v6 <- pol_v5 %>%
anti_join(pol %>%
st_drop_geometry(),
by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
mutate(ws_removed = !(polygon_id_ws %in% pol$polygon_id_ws))

nrow(check_polygon_id_v4_v5)
nrow(check_polygon_id_v5_v6)
```

Here we show:

+ new polygons from the watersurfaces layer that are included in `watersurfaces_hab_v5` (blue polygons)
+ polygons from the watersurfaces layer that are removed in `watersurfaces_hab_v5` compared to `watersurfaces_hab_v4` (black polygons)
+ new polygons from the watersurfaces layer that are included in `watersurfaces_hab_v6` (blue polygons)
+ polygons from the watersurfaces layer that are removed in `watersurfaces_hab_v6` compared to `watersurfaces_hab_v5` (black polygons)


```{r}

ws_new <- check_polygon_id_v5_v4 %>%
ws_new <- check_polygon_id_v6_v5 %>%
filter(new_polygon_id_ws) %>%
st_transform(crs = 4326)

ws_removed <- check_polygon_id_v4_v5 %>%
ws_removed <- check_polygon_id_v5_v6 %>%
filter(ws_removed) %>%
st_transform(crs = 4326)

leaflet() %>%
addTiles(group = "OSM (default)") %>%
addPolygons(data = ws_new,
group = "in v5 and not v4 (blue)",
group = "in v6 and not v5 (blue)",
popup = paste("polygon_id_habitatmap:", ws_new$polygon_id_habitatmap, "<br>",
"polygon_id_ws:", ws_new$polygon_id_ws)) %>%
addPolygons(data = ws_removed,
color = "black", group = "in v4 and not v5 (black)",
color = "black", group = "in v5 and not v6 (black)",
popup = paste("polygon_id_habitatmap:", ws_removed$polygon_id_habitatmap, "<br>",
"polygon_id_ws:", ws_removed$polygon_id_ws)) %>%
addLayersControl(
overlayGroups = c("in v5 and not v4 (blue)", "in v4 and not v5 (black)"),
overlayGroups = c("in v6 and not v5 (blue)", "in v5 and not v6 (black)"),
options = layersControlOptions(collapsed = FALSE)
)
```
2 changes: 1 addition & 1 deletion src/generate_watersurfaces_hab/index.Rmd
@@ -50,7 +50,7 @@ library(leaflet)

# ISO8601 timestamp to set as fixed value in the GeoPackage
# (to be UPDATED to the actual creation date; at least update for each version):
Sys.setenv(OGR_CURRENT_DATE = "2024-05-15T00:00:00.000Z")
Sys.setenv(OGR_CURRENT_DATE = "2025-01-08T00:00:00.000Z")
# This is used to keep results reproducible, as the timestamp is otherwise
# updated each time.
# Above environment variable OGR_CURRENT_DATE is used by the GDAL driver.