1 Exploring an internal draft of the watersurfaces data source

Prior to publication, the watersurfaces data source is updated in an MS SQL Server database.

Here we do some checks from R in order to prevent possible problems in a released version of the data source.

Make the database connection and the queries:

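# packages used below (all are listed under 'Loaded R packages')
library(dplyr)
library(sf)
library(ggplot2)
library(tmap)
library(n2khab)
con <- inbodb::connect_inbo_dbase("G0102_00_AquaMorf")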
query <- sql("SELECT P.CodePlas AS WVLC, 
                  plasnaamalt AS WTRLICHC,
                  NP_ID AS HYLAC,
                  plasnaam AS NAAM, 
                  P.gebied AS GEBIED,
                  P.watertype AS KRWTYPE,
                  P.watertypeAlt AS KRWTYPEA,
                  P.statuswatertype AS KRWTYPES,
                  P.DiepteKlasse AS DIEPKL, 
                  P.connectiviteit11 AS CONNECT, 
                  P.peilbeheer AS PEILBEHEER,
                  shape.STArea() AS OPPWVL,
                  shape.STLength() AS OMTWVL,
                  P.DiepteGem_m AS diepte_gem, 
                  P.DiepteMax_m AS diepte_max, 
                  P.gebiedalternatief AS gebied_alt,
                  connectiviteit AS connect_old,
                  P.jaaraanleg AS aanleg,
                  P.status AS status, 
                  P.globalid AS id_plas,
                  shape.STAsBinary() AS geometry
          FROM AquaMorf.PLAS P
          WHERE GDB_TO_DATE >= '9999' AND status IS NULL")
# show all columns of the database table with:
# tbl(con, I("AquaMorf.PLAS")) |> glimpse()
query_functie <- sql("SELECT PG.gebruiksfunctie AS FUNCTIE,
                      PG.globalid AS id_functie, 
                      PG.CodePlas AS codeplas, 
                      PG.globalid_plas AS id_plas
                      FROM AquaMorf.PLAS_GEBRUIKSFUNCTIE PG
                      WHERE GDB_TO_DATE >= '9999'")

Note that the aliases in capitals are the variable names of version 1.2, mapped here to the database column names. Other available columns have been given lowercase aliases.

Import the spatial layer:

ws <- read_sf(con, query = query, crs = 31370) |> 
  mutate(across(where(is.character), factor))
glimpse(ws)
## Rows: 93,201
## Columns: 21
## $ WVLC        <fct> ANTBRM0019, ANTBRM0020, ANTBRM0022, ANTBRM0029, ANTBRM0041…
## $ WTRLICHC    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ HYLAC       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ NAAM        <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ GEBIED      <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ KRWTYPE     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ KRWTYPEA    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ KRWTYPES    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ DIEPKL      <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ CONNECT     <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ PEILBEHEER  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ OPPWVL      <dbl> 234.78843, 6103.74273, 308.72194, 165.86018, 638.52698, 12…
## $ OMTWVL      <dbl> 79.95916, 366.25567, 79.80889, 49.26253, 105.31879, 318.94…
## $ diepte_gem  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ diepte_max  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ gebied_alt  <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ connect_old <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ aanleg      <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ status      <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ id_plas     <fct> E5BBAEBB-EDF0-41C5-9637-C220BB11864B, 89FDC280-6B61-4DFB-A…
## $ geometry    <POLYGON [m]> POLYGON ((140179.8 200579.8..., POLYGON ((140011.9…

The number of rows is 93201.

Query the values of FUNCTIE that are available for specific watersurfaces:

ws_functie <- tbl(con, query_functie) |> 
  collect() |> 
  mutate(across(where(is.character), factor))

The number of rows is 58.

Sidenote: read_sf() downloads the data, but it is also possible to define a lazy object, execute database-level queries using dplyr verbs on this object (before collecting), and then convert to sf after collecting a filtered result. Example below:

ws_db <- tbl(con, query)
# without conversion to sf
ws_db |> 
  filter(row_number() <= 25) |> 
  select(WVLC, geometry) |> 
  collect()
## # A tibble: 25 × 2
##    WVLC            geometry
##    <chr>             <blob>
##  1 ANTBRM0019   <raw 189 B>
##  2 ANTBRM0020 <raw 2.19 kB>
##  3 ANTBRM0022   <raw 589 B>
##  4 ANTBRM0029   <raw 285 B>
##  5 ANTBRM0041   <raw 525 B>
##  6 ANTBRM0055 <raw 1.25 kB>
##  7 ANTBRM0060   <raw 333 B>
##  8 ANTBRM0064 <raw 1.45 kB>
##  9 ANTBRM0070   <raw 669 B>
## 10 ANTBRM0079 <raw 1.02 kB>
## # ℹ 15 more rows
# the binary geometries can still be converted to a wk_wkb vector; see
# https://hydroecology.net/reading-spatial-data-from-sql-server-without-sf/
ws_db |> 
  filter(row_number() <= 25) |> 
  select(WVLC, geometry) |> 
  collect() |> 
  mutate(geometry = wk::as_wkb(geometry))
## # A tibble: 25 × 2
##    WVLC       geometry                                                          
##    <chr>      <wk_wkb>                                                          
##  1 ANTBRM0019 <POLYGON ((140179.8 200579.8, 140180.5 200581.4, 140175.3 200612.…
##  2 ANTBRM0020 <POLYGON ((140011.9 200747.3, 140010.1 200749.8, 140007.8 200751.…
##  3 ANTBRM0022 <POLYGON ((140599.3 199202.5, 140597 199206.4, 140594.4 199208.4,…
##  4 ANTBRM0029 <POLYGON ((140459.4 200485.9, 140462.4 200486.3, 140466.7 200486.…
##  5 ANTBRM0041 <POLYGON ((139718 200640, 139725.8 200649.3, 139728.1 200652.1, 1…
##  6 ANTBRM0055 <POLYGON ((139611.1 200556, 139608.3 200557.5, 139605.7 200558.5,…
##  7 ANTBRM0060 <POLYGON ((140538.9 200396.9, 140542.1 200399, 140544 200401.3, 1…
##  8 ANTBRM0064 <POLYGON ((140269.1 200595.9, 140268.2 200596.7, 140267.7 200597,…
##  9 ANTBRM0070 <POLYGON ((139980.1 200596.5, 139978 200596.6, 139974.8 200596.7,…
## 10 ANTBRM0079 <POLYGON ((140494.9 200479.4, 140494.1 200480.6, 140492.8 200483,…
## # ℹ 15 more rows
# conversion to sf:
ws_db |> 
  filter(row_number() <= 25) |> 
  select(WVLC, geometry) |> 
  collect() |>
  st_as_sf(crs = 31370)
## Simple feature collection with 25 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 137940.3 ymin: 193632.9 xmax: 140600.1 ymax: 200825.4
## Projected CRS: BD72 / Belgian Lambert 72
## # A tibble: 25 × 2
##    WVLC                                                                 geometry
##  * <chr>                                                           <POLYGON [m]>
##  1 ANTBRM0019 ((140179.8 200579.8, 140180.5 200581.4, 140175.3 200612.1, 140174…
##  2 ANTBRM0020 ((140011.9 200747.3, 140010.1 200749.8, 140007.8 200751.9, 140006…
##  3 ANTBRM0022 ((140599.3 199202.5, 140597 199206.4, 140594.4 199208.4, 140591.1…
##  4 ANTBRM0029 ((140459.4 200485.9, 140462.4 200486.3, 140466.7 200486.6, 140469…
##  5 ANTBRM0041 ((139718 200640, 139725.8 200649.3, 139728.1 200652.1, 139729.4 2…
##  6 ANTBRM0055 ((139611.1 200556, 139608.3 200557.5, 139605.7 200558.5, 139602.4…
##  7 ANTBRM0060 ((140538.9 200396.9, 140542.1 200399, 140544 200401.3, 140543.9 2…
##  8 ANTBRM0064 ((140269.1 200595.9, 140268.2 200596.7, 140267.7 200597, 140265.4…
##  9 ANTBRM0070 ((139980.1 200596.5, 139978 200596.6, 139974.8 200596.7, 139971.4…
## 10 ANTBRM0079 ((140494.9 200479.4, 140494.1 200480.6, 140492.8 200483, 140490.9…
## # ℹ 15 more rows
DBI::dbDisconnect(con)
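What happens to the binary geometry column on conversion can be reproduced without a database connection. The following is a minimal sketch that only assumes the sf package: a made-up point is encoded as well-known binary with st_as_binary(), standing in for the result of STAsBinary() in the query, and is decoded again with st_as_sfc(). The coordinates and object names are hypothetical.

```r
library(sf)

# Toy geometry in Belgian Lambert 72 (EPSG:31370); coordinates are made up
geom <- st_sfc(st_point(c(140000, 200000)), crs = 31370)

# Encode as well-known binary: a list of raw vectors with class "WKB"
wkb <- st_as_binary(geom)

# Decode the raw bytes again; plain WKB carries no CRS,
# so it has to be supplied explicitly
decoded <- st_as_sfc(wkb, crs = 31370)

st_as_text(decoded)
```

This is why the CRS is passed explicitly to read_sf() and st_as_sf() above.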

1.1 Step-by-step exploration

1.1.1 A summary of the spatial layer

ws %>% 
  st_drop_geometry %>% 
  summary
##          WVLC            WTRLICHC         HYLAC       
##  ANTANT0001:    1   VL22_23  :   21   Min.   : 18003  
##  ANTANT0002:    1   L219_8002:    7   1st Qu.:155089  
##  ANTANT0003:    1   VL24_191 :    6   Median :221078  
##  ANTANT0004:    1   L219_6827:    3   Mean   :210363  
##  ANTANT0005:    1   L219_6910:    3   3rd Qu.:266044  
##  ANTANT0006:    1   (Other)  :  393   Max.   :424108  
##  (Other)   :93195   NA's     :92768   NA's   :88525   
##                      NAAM                              GEBIED     
##  Meerskantpoelen       :   21   Uitkerkse Polder          :  480  
##  Ijsebroeken           :    8   MD Groot Schietveld       :  378  
##  Lokkerse Dammen       :    7   Meetjeslandse Krekengebied:  344  
##  Oude Dijlearm Zennegat:    5   Het Goor-Asbroek          :  135  
##  Oude Schelde Heurne   :    4   Vloethemveld              :  129  
##  (Other)               :  205   (Other)                   : 5120  
##  NA's                  :92951   NA's                      :86615  
##     KRWTYPE         KRWTYPEA           KRWTYPES         DIEPKL     
##  Cb     :  153   -      :  192   definitief:  769   > 6 m  :   38  
##  Zm     :  140   (Zm)   :   60   voorlopig :    1   0 - 2 m:  975  
##  Zs     :   91   Czb    :   59   NA's      :92431   2 - 4 m:   78  
##  Ami-e  :   79   Cb     :   57                      4 - 6 m:   14  
##  Ad     :   59   CFe    :   45                      NA's   :92096  
##  (Other):  248   (Other):  169                                     
##  NA's   :92431   NA's   :92619                                     
##        CONNECT                        PEILBEHEER        OPPWVL         
##  geïsoleerd:  250   aan- en afvoer geregeld:   15   Min.   :      1.5  
##  periodiek :   36   afvoer geregeld        :    9   1st Qu.:     79.1  
##  permanent :   10   geen peilbeheer        :  256   Median :    267.1  
##  NA's      :92905   NA's                   :92921   Mean   :   1748.7  
##                                                     3rd Qu.:    803.1  
##                                                     Max.   :2470433.4  
##                                                                        
##      OMTWVL            diepte_gem      diepte_max                gebied_alt   
##  Min.   :    4.819   Min.   : 0.01   Min.   : 0.05   De Brand         :   43  
##  1st Qu.:   36.903   1st Qu.: 0.25   1st Qu.: 0.60   Laambeekvijvers  :   43  
##  Median :   68.971   Median : 0.45   Median : 1.00   Huttenbeekvijvers:   20  
##  Mean   :  129.790   Mean   : 0.87   Mean   : 1.60   Klotbroek        :    6  
##  3rd Qu.:  130.545   3rd Qu.: 0.85   3rd Qu.: 1.50   Kleine Homo      :    2  
##  Max.   :12930.966   Max.   :11.00   Max.   :28.82   (Other)          :    6  
##                      NA's   :92853   NA's   :92343   NA's             :93081  
##                        connect_old          aanleg       status     
##  DoorstromingMetPeilbeheer   :    9   1939-1971:   13   NA's:93201  
##  DoorstromingZonderPeilbeheer:    7   2017     :   11               
##  geïsoleerd                  :  262   2000     :    9               
##  PeriodiekeDoorstroming      :   27   2004     :    8               
##  NA's                        :92896   2003-2007:    7               
##                                       (Other)  :   74               
##                                       NA's     :93079               
##                                  id_plas     
##  0000D7CF-5C5E-4EAE-B41F-FCF6CA5E91A1:    1  
##  0000E3A6-7FFF-478F-851B-5119689766CD:    1  
##  00047C16-13B9-458D-B1B5-C661880FED89:    1  
##  0007E810-B145-476D-BD7B-84FB8FF97BF3:    1  
##  0007EAA8-3C78-42C8-BF21-79430F68C52D:    1  
##  00087152-8AD3-4E35-A482-3D47FC9EFADF:    1  
##  (Other)                             :93195

Compared to version 1.2:

  • FUNCTIE is absent from the query of the spatial layer (it is queried separately from AquaMorf.PLAS_GEBRUIKSFUNCTIE).
  • KRWTYPEA is a new variable.
  • CONNECT has new levels.
  • Note: HYLAC is going to be removed in the official version.
  • Several columns are available (lowercase names in the query) which haven’t been part of the official version yet (except for connect_old, which was CONNECT in previous versions).

Let us look for a few typical errors more systematically.

1.1.2 Are there NA values?

There are plenty of NA’s but only in fields where we expect them.

sapply(ws, function(x) sum(is.na(x)))
##        WVLC    WTRLICHC       HYLAC        NAAM      GEBIED     KRWTYPE 
##           0       92768       88525       92951       86615       92431 
##    KRWTYPEA    KRWTYPES      DIEPKL     CONNECT  PEILBEHEER      OPPWVL 
##       92619       92431       92096       92905       92921           0 
##      OMTWVL  diepte_gem  diepte_max  gebied_alt connect_old      aanleg 
##           0       92853       92343       93081       92896       93079 
##      status     id_plas    geometry 
##       93201           0           0
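The expectation that some fields must be complete can be made explicit. Below is a minimal sketch on a toy data frame; the choice of mandatory columns and all values are assumptions for illustration, not a rule taken from the metadata report.

```r
# Toy stand-in for the attribute table (values are made up)
toy <- data.frame(
  WVLC   = c("A001", "A002", "A003"),
  NAAM   = c(NA, "Pond", NA),
  OPPWVL = c(12.5, 80.1, 3.2)
)

# Columns that must be complete (an assumption for this sketch)
mandatory <- c("WVLC", "OPPWVL")

# Number of NA values per column
na_counts <- vapply(toy, function(x) sum(is.na(x)), integer(1))

# Names of mandatory columns that contain at least one NA
incomplete <- names(na_counts)[na_counts > 0 & names(na_counts) %in% mandatory]
incomplete
```

An empty result means that NA values only occur in optional fields.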

KRWTYPE / KRWTYPEA / KRWTYPES consistency:

ws |> 
  st_drop_geometry() |> 
  count(KRWTYPE, KRWTYPES) |> 
  print()
## # A tibble: 16 × 3
##    KRWTYPE KRWTYPES       n
##    <fct>   <fct>      <int>
##  1 Ad      definitief    59
##  2 Ai      definitief    42
##  3 Ami     definitief    56
##  4 Ami-e   definitief    79
##  5 Ami-om  definitief     6
##  6 Aw-e    definitief    25
##  7 Aw-om   definitief    16
##  8 Bzl     definitief     6
##  9 C       definitief    17
## 10 Cb      definitief   153
## 11 CFe     definitief    36
## 12 Czb     definitief    44
## 13 Zm      definitief   140
## 14 Zs      definitief    90
## 15 Zs      voorlopig      1
## 16 <NA>    <NA>       92431
ws |> 
  st_drop_geometry() |> 
  filter(!is.na(KRWTYPEA), KRWTYPEA != "-") |> 
  count(KRWTYPE, KRWTYPEA) |> 
  print(n = Inf)
## # A tibble: 26 × 3
##    KRWTYPE KRWTYPEA     n
##    <fct>   <fct>    <int>
##  1 Ad      Ai           8
##  2 Ai      Ami          8
##  3 Ami     Ai          21
##  4 Ami     Cb           7
##  5 Ami-om  Ai           1
##  6 Aw-om   Ami-om       1
##  7 Aw-om   Aw-e         5
##  8 Cb      Ami         30
##  9 Cb      CFe         30
## 10 Cb      Czb         18
## 11 Cb      Zm           2
## 12 CFe     Cb          28
## 13 CFe     Zm           1
## 14 Czb     Cb          18
## 15 Czb     CFe          1
## 16 Czb     Zm          13
## 17 Zm      (Cb)         1
## 18 Zm      (CFe)        2
## 19 Zm      (Czb)        4
## 20 Zm      (Zs)        10
## 21 Zm      Cb           4
## 22 Zm      CFe         14
## 23 Zm      Czb         41
## 24 Zm      Zs          37
## 25 Zs      (Zm)        60
## 26 Zs      Zm          25
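The first table shows that KRWTYPE and KRWTYPES are missing in exactly the same 92431 records. A minimal sketch of a check that would flag records where only one of both is filled in, using a toy data frame with made-up values:

```r
# Toy records; the fourth one has a status but no type
toy <- data.frame(
  KRWTYPE  = c("Zs", "Cb", NA, NA),
  KRWTYPES = c("voorlopig", "definitief", NA, "definitief")
)

# TRUE where exactly one of the two fields is missing
mismatch <- xor(is.na(toy$KRWTYPE), is.na(toy$KRWTYPES))

# Records to report to the authors of the layer
toy[mismatch, ]
```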

One record has KRWTYPES set to ‘voorlopig’:

ws |> filter(KRWTYPES == "voorlopig") |> as.matrix() |> t()
##             1                                     
## WVLC        "ANTHGS0069"                          
## WTRLICHC    "NA"                                  
## HYLAC       NA                                    
## NAAM        "NA"                                  
## GEBIED      "De Elsakker"                         
## KRWTYPE     "Zs"                                  
## KRWTYPEA    "NA"                                  
## KRWTYPES    "voorlopig"                           
## DIEPKL      "0 - 2 m"                             
## CONNECT     "geïsoleerd"                          
## PEILBEHEER  "geen peilbeheer"                     
## OPPWVL      238.6236                              
## OMTWVL      56.67507                              
## diepte_gem  NA                                    
## diepte_max  NA                                    
## gebied_alt  "NA"                                  
## connect_old "geïsoleerd"                          
## aanleg      "NA"                                  
## status      "NA"                                  
## id_plas     "4404EFE0-FE5B-4F09-803F-9F5D31BB4D47"
## geometry    POLYGON ((180500.1 242897.1...

1.1.3 Are there <Null> values?

None:

sapply(ws |> st_drop_geometry(), function(x) sum(as.character(x) == '<Null>', na.rm = TRUE))
##        WVLC    WTRLICHC       HYLAC        NAAM      GEBIED     KRWTYPE 
##           0           0           0           0           0           0 
##    KRWTYPEA    KRWTYPES      DIEPKL     CONNECT  PEILBEHEER      OPPWVL 
##           0           0           0           0           0           0 
##      OMTWVL  diepte_gem  diepte_max  gebied_alt connect_old      aanleg 
##           0           0           0           0           0           0 
##      status     id_plas 
##           0           0

1.1.4 Are there - values?

Yes, in KRWTYPEA, but it’s expected (see further):

sapply(ws |> st_drop_geometry(), function(x) sum(as.character(x) == '-', na.rm = TRUE))
##        WVLC    WTRLICHC       HYLAC        NAAM      GEBIED     KRWTYPE 
##           0           0           0           0           0           0 
##    KRWTYPEA    KRWTYPES      DIEPKL     CONNECT  PEILBEHEER      OPPWVL 
##         192           0           0           0           0           0 
##      OMTWVL  diepte_gem  diepte_max  gebied_alt connect_old      aanleg 
##           0           0           0           0           0           0 
##      status     id_plas 
##           0           0

1.1.5 Are there zero (0) values?

None:

sapply(ws |> st_drop_geometry(), function(x) sum(as.character(x) == '0', na.rm = TRUE))
##        WVLC    WTRLICHC       HYLAC        NAAM      GEBIED     KRWTYPE 
##           0           0           0           0           0           0 
##    KRWTYPEA    KRWTYPES      DIEPKL     CONNECT  PEILBEHEER      OPPWVL 
##           0           0           0           0           0           0 
##      OMTWVL  diepte_gem  diepte_max  gebied_alt connect_old      aanleg 
##           0           0           0           0           0           0 
##      status     id_plas 
##           0           0
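The three checks above follow the same pattern, so they could be combined in one pass. A minimal sketch, in which count_sentinels() is a hypothetical helper and the toy data are made up:

```r
# Count how often each sentinel string occurs in each column.
# Returns a matrix: one row per sentinel, one column per variable
# (given at least two sentinels).
count_sentinels <- function(df, sentinels) {
  vapply(
    df,
    function(x) {
      vapply(
        sentinels,
        function(s) sum(as.character(x) == s, na.rm = TRUE),
        integer(1)
      )
    },
    integer(length(sentinels))
  )
}

toy <- data.frame(
  KRWTYPEA = c("-", "Cb", "-", NA),
  HYLAC    = c(0L, 18003L, NA, NA)
)

counts <- count_sentinels(toy, c("<Null>", "-", "0"))
counts
```

For an sf object, the geometry column would first be dropped with st_drop_geometry(), as in the calls above.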

Number of unique values of numeric variable HYLAC:

n_distinct(ws$HYLAC)
## [1] 4646

How many NA values for numeric variable HYLAC?

Many, which indicates that the zeroes were replaced by NA. Note that n_distinct() above counts NA as one of the distinct values.

ws$HYLAC %>% is.na %>% sum
## [1] 88525

1.1.6 Are WVLC codes unique?

Yes: the number of distinct codes equals the number of rows.

ws$WVLC %>% unique %>% length == nrow(ws)
## [1] TRUE

A query that would list duplicated codes, if any:

ws |> 
  st_drop_geometry() |> 
  count(WVLC) |> 
  filter(n > 1)
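To inspect duplicated codes together with their attributes, all records sharing a code can be selected. A minimal base R sketch on toy data (codes and areas are made up):

```r
toy <- data.frame(
  WVLC   = c("A001", "A002", "A002", "A003"),
  OPPWVL = c(10, 20, 25, 30)
)

# duplicated() only flags the second and later occurrences,
# so scan from both ends to catch every record of a duplicated code
is_dup <- duplicated(toy$WVLC) | duplicated(toy$WVLC, fromLast = TRUE)

toy[is_dup, ]
```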

1.1.7 Levels for each factor

We can compare the levels for each factor variable with the information given in a draft metadata report.

We do not check NAAM and GEBIED since there are many possible options.

KRWTYPE: the codes are correct but there are more codes mentioned in the metadata report.

levels(ws$KRWTYPE)
##  [1] "Ad"     "Ai"     "Ami"    "Ami-e"  "Ami-om" "Aw-e"   "Aw-om"  "Bzl"   
##  [9] "C"      "Cb"     "CFe"    "Czb"    "Zm"     "Zs"

KRWTYPEA (alternative type): the codes in the spatial layer comply with the metadata report.

levels(ws$KRWTYPEA)
##  [1] "-"      "(Cb)"   "(CFe)"  "(Czb)"  "(Zm)"   "(Zs)"   "Ai"     "Ami"   
##  [9] "Ami-om" "Aw-e"   "Cb"     "CFe"    "Czb"    "Zm"     "Zs"

KRWTYPES (status): the codes in the spatial layer are the same as in the metadata report.

levels(ws$KRWTYPES)
## [1] "definitief" "voorlopig"

DIEPKL: the only codes in the dataset are “0 - 2 m”, “2 - 4 m”, “4 - 6 m” and “> 6 m”; the codes are different in the metadata report.

levels(ws$DIEPKL)
## [1] "> 6 m"   "0 - 2 m" "2 - 4 m" "4 - 6 m"

Notes:

  • in version 1.1, an R chunk in the ‘Windows encoding’ section already reduced the number of levels, which resulted in fewer extra categories. However, this was only done for exploratory purposes; the intention was never to apply such changes systematically.
  • in the future we will not rectify this by default in reading functions for raw data sources either: problems in the data will be returned as-is and should be solved in a future version of the data source. By default we only streamline column names and variable types, make sure that values referring to NA are effectively returned as NA, and try to avoid some encoding problems.

Because the ≥ character (U+2265) is not displayed well on Windows, we recode it into ‘>=’ (otherwise “≥” in ws$DIEPKL is rendered as “=” in the html output):

levels(ws$DIEPKL) <- gsub(pattern = "\u2265", ">=", levels(ws$DIEPKL))
levels(ws$DIEPKL)
## [1] "> 6 m"   "0 - 2 m" "2 - 4 m" "4 - 6 m"
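In this draft no level of DIEPKL contains the ≥ character, so the levels are the same before and after the recoding. A minimal sketch of the effect on a toy factor that does contain it (the levels are made up):

```r
# Toy factor with one level that uses the 'greater than or equal' sign
depth <- factor(c("0 - 2 m", "\u2265 6 m", "2 - 4 m"))

# Replace U+2265 by the ASCII sequence '>=' in the level labels only
levels(depth) <- gsub("\u2265", ">=", levels(depth), fixed = TRUE)

levels(depth)
```

Recoding the levels rather than the values keeps the variable a factor and leaves the underlying integer codes untouched.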

CONNECT: the codes in the spatial layer are the same as in the metadata report (and they differ from previous official versions).

levels(ws$CONNECT)
## [1] "geïsoleerd" "periodiek"  "permanent"

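FUNCTIE: the codes correspond to those in the metadata report, apart from spelling differences (e.g. hengelIntensief versus hengelintensief, tuin_park versus tuin/park), and the metadata report mentions more codes.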

levels(ws_functie$FUNCTIE)
## [1] "duik"            "geen"            "hengelIntensief" "motorrecreatie" 
## [5] "natuur"          "tuin_park"       "veedrenk"        "waterberging"   
## [9] "zachteRecreatie"

And here are the categories in the metadata report for FUNCTIE:

functie          toewijzing (assignment)
natuur           nature conservation objective
hengelintensief  intensive angling (with infrastructure, fish stocking, or used for competition angling)
hengelextensief  extensive angling (no infrastructure, fish stocking or competition angling)
jacht            hunting
tuin/park        aesthetic (residential recreation, garden and park ponds)
vogel            water feature for domesticated waterfowl
viskweek         fish rearing
zwemmen          swimming
duik             diving
zachterecreatie  non-motorized water recreation
motorrecreatie   motorized water recreation
waterberging     water storage for flood control or water level management
opslag           water reservoir (industry, agriculture, firefighting water, hydropower…)
drinkwater       drinking water abstraction
zuivering        (small-scale) water purification, infiltration
bezinking        settling of process water
veedrenk         water supply for livestock
geen             no specific function
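The comparison between the levels in the database and the codes in the metadata report can be done with setdiff(). A minimal sketch, with both code lists typed over from the output and the table above; the object names are hypothetical.

```r
# Levels of FUNCTIE found in the database table
dataset_codes <- c("duik", "geen", "hengelIntensief", "motorrecreatie",
                   "natuur", "tuin_park", "veedrenk", "waterberging",
                   "zachteRecreatie")

# Codes listed in the draft metadata report
report_codes <- c("natuur", "hengelintensief", "hengelextensief", "jacht",
                  "tuin/park", "vogel", "viskweek", "zwemmen", "duik",
                  "zachterecreatie", "motorrecreatie", "waterberging",
                  "opslag", "drinkwater", "zuivering", "bezinking",
                  "veedrenk", "geen")

# Codes in the dataset that do not literally occur in the report
literal_diff <- setdiff(dataset_codes, report_codes)
literal_diff

# The same comparison, ignoring upper and lower case
caseless_diff <- setdiff(tolower(dataset_codes), tolower(report_codes))
caseless_diff
```

The literal comparison returns hengelIntensief, tuin_park and zachteRecreatie; when case is ignored only tuin_park remains, which differs from tuin/park in the report by its separator.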

PEILBEHEER: one code less than in the report.

levels(ws$PEILBEHEER)
## [1] "aan- en afvoer geregeld" "afvoer geregeld"        
## [3] "geen peilbeheer"

1.2 Potential issues

  • WVLC: no duplicated values were found in this draft (see 1.1.6).
  • One record has KRWTYPES as ‘voorlopig’ (after discussion with the authors, this seems to be correct).
  • DIEPKL: codes in the dataset differ from those in the report.
  • KRWTYPE, PEILBEHEER, FUNCTIE: more levels in the metadata report than in the dataset. This is not necessarily a problem.

These will be discussed with the authors of the layers.

1.3 Validity of the geometries

Let’s inspect features with invalid or corrupt geometry:

ws_validity <- st_is_valid(ws)
ws_validity %>% table
## .
## FALSE  TRUE 
##     1 93200
invalid_geoms <- ws[!ws_validity | is.na(ws_validity), ]

Identifying the invalid geometries:

invalid_geoms |> 
  select(WVLC) |> 
  mutate(reason = st_is_valid(invalid_geoms, reason = TRUE)) |> 
  st_drop_geometry()
tm_shape(invalid_geoms) + tm_borders() + tm_facets(by = "WVLC")

The invalidity is caused by self-intersecting rings, which result from digitization errors.

Let’s compare with the same geoms after fixing the self-intersecting rings:

valid_geoms <- st_make_valid(invalid_geoms)
tm_shape(valid_geoms) + tm_borders() + tm_facets(by = "WVLC")

Are all geometries valid now?

all(st_is_valid(valid_geoms))
## [1] TRUE
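The same behaviour can be reproduced on a small example. Below is a minimal sketch with a made-up ‘bow-tie’ polygon, a common textbook case of a self-intersecting ring; it is not one of the watersurfaces.

```r
library(sf)

# Bow-tie polygon: the ring crosses itself at (0.5, 0.5)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie)                 # FALSE
st_is_valid(bowtie, reason = TRUE)  # reports a self-intersection

# Repair the geometry and check again
fixed <- st_make_valid(bowtie)
st_is_valid(fixed)                  # TRUE
st_geometry_type(fixed)
```

Note that st_make_valid() may return a different geometry type than it received (for example a multipolygon for a polygon), which is worth keeping in mind when fixing geometries in derived data.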

So this works well; in derived data we will fix these geometries. We might also consider an optional geometry repair step in read_watersurfaces().

We also check that no empty geometries are present:

all(!is.na(st_dimension(ws$geometry)))
## [1] TRUE
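The check above relies on st_dimension() returning NA for empty geometries. A minimal sketch on two made-up geometries, which also shows st_is_empty() as a more direct alternative:

```r
library(sf)

# One empty polygon and one ordinary point
geoms <- st_sfc(st_polygon(), st_point(c(0, 1)))

st_dimension(geoms)  # NA for the empty polygon, 0 for the point
st_is_empty(geoms)   # TRUE for the empty polygon, FALSE for the point

# Equivalent formulation of the check used above
!any(st_is_empty(geoms))
```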

See https://github.com/inbo/n2khab-preprocessing/issues/60 and https://r-spatial.org/r/2017/03/19/invalid.html for more information.

1.4 Let’s plot the watersurfaces as a map

# plot watersurfaces
p <- ggplot() +
  geom_sf(data = ws, color = "blue")

# Flanders
sf_vl <- read_admin_areas()

p <- p + 
  geom_sf(data = sf_vl, fill = NA)

print(p)

2 Tidyverse-styled, internationalized column names when using the data source in R

ws %>% colnames %>% cat(sep = "\n")
data source variable   data frame variable
WVLC                   polygon_id
WTRLICHC               wfd_code
HYLAC                  hyla_code
NAAM                   name
GEBIED                 area_name
KRWTYPE                wfd_type
KRWTYPEA               wfd_type_alternative
KRWTYPES               wfd_type_certain
DIEPKL                 depth_class
CONNECT                connectivity
FUNCTIE                usage
PEILBEHEER             water_level_management
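The mapping in this table can be applied with a named lookup vector. The following is a minimal sketch with dplyr and is not necessarily how read_watersurfaces() implements it; the object names and the toy record are hypothetical.

```r
library(dplyr)

# New name = old name, taken from the table above
lookup <- c(
  polygon_id = "WVLC",
  wfd_code = "WTRLICHC",
  hyla_code = "HYLAC",
  name = "NAAM",
  area_name = "GEBIED",
  wfd_type = "KRWTYPE",
  wfd_type_alternative = "KRWTYPEA",
  wfd_type_certain = "KRWTYPES",
  depth_class = "DIEPKL",
  connectivity = "CONNECT",
  usage = "FUNCTIE",
  water_level_management = "PEILBEHEER"
)

# Toy record with only a few of the columns
toy <- tibble(WVLC = "ANTBRM0019", KRWTYPE = NA_character_, DIEPKL = "0 - 2 m")

# any_of() renames the columns that are present and ignores the others
renamed <- rename(toy, any_of(lookup))
names(renamed)
```

Using any_of() rather than all_of() keeps the code working when a column such as FUNCTIE or HYLAC is absent from a given version of the layer.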

3 Other considerations for the R object returned by read_watersurfaces()

  • not including OPPWVL, OMTWVL, SHAPE_Length, SHAPE_Area (area and perimeter are easily calculated) – OK
  • sort by polygon_id – OK
  • add translations to long text for wfd_type and connectivity – OK but not by default
  • add translations to long text for usage? – for a later version (as more codes will be used)
  • converting null / 0 values to NA – OK
  • support new variable water_level_management – OK
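Two of these considerations, converting null or 0 values to NA and sorting by polygon_id, can be sketched in base R. This is only an illustration on made-up data; which values count as missing for which column is an assumption, and the actual implementation in read_watersurfaces() may differ.

```r
toy <- data.frame(
  polygon_id = c("B002", "A001", "C003"),
  hyla_code  = c(0L, 18003L, 155089L),
  name       = c("<Null>", "Pond", "Pool"),
  stringsAsFactors = FALSE
)

# Treat 0 and '<Null>' as missing (assumed sentinel values);
# %in% never yields NA, so the assignment is safe if NA is already present
toy$hyla_code[toy$hyla_code %in% 0L] <- NA
toy$name[toy$name %in% "<Null>"] <- NA

# Sort by polygon_id
toy <- toy[order(toy$polygon_id), ]
toy
```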

4 Used environment

  • version: R version 4.4.1 (2024-06-14)
  • os: Linux Mint 21.3
  • system: x86_64, linux-gnu
  • ui: X11
  • language: nl_BE:nl
  • collate: nl_BE.UTF-8
  • ctype: nl_BE.UTF-8
  • tz: Europe/Brussels
  • date: 2024-09-09
  • pandoc: 3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
  • sf GEOS: 3.12.1
  • sf GDAL: 3.8.4
  • sf GDAL_with_GEOS: true
  • sf USE_PROJ_H: true
  • sf PROJ: 9.3.1
  • terra GDAL: 3.8.4
  • terra PROJ: 9.3.1
  • terra GEOS: 3.12.1
Loaded R packages
package loadedversion date source
abind 1.4-5 2016-07-21 CRAN (R 4.2.0)
assertthat 0.2.1 2019-03-21 CRAN (R 4.0.1)
base64enc 0.1-3 2015-07-28 CRAN (R 4.0.2)
bit 4.0.5 2022-11-15 RSPM (R 4.2.0)
bit64 4.0.5 2020-08-30 RSPM (R 4.2.0)
blob 1.2.4 2023-03-17 RSPM (R 4.2.0)
bslib 0.8.0 2024-07-29 RSPM (R 4.4.0)
cachem 1.1.0 2024-05-16 RSPM (R 4.4.0)
class 7.3-22 2023-05-03 RSPM (R 4.2.0)
classInt 0.4-10 2023-09-05 RSPM (R 4.3.0)
cli 3.6.3 2024-06-21 RSPM (R 4.4.0)
codetools 0.2-20 2024-03-31 RSPM (R 4.3.0)
colorspace 2.1-1 2024-07-26 RSPM (R 4.4.0)
crosstalk 1.2.1 2023-11-23 RSPM (R 4.3.0)
curl 5.2.2 2024-08-26 RSPM (R 4.4.0)
DBI 1.2.3 2024-06-02 RSPM (R 4.4.0)
dbplyr 2.5.0 2024-03-19 RSPM (R 4.3.0)
dichromat 2.0-0.1 2022-05-02 RSPM (R 4.2.0)
digest 0.6.37 2024-08-19 RSPM (R 4.4.0)
dplyr 1.1.4 2023-11-17 RSPM (R 4.3.0)
e1071 1.7-14 2023-12-06 RSPM (R 4.3.0)
evaluate 0.24.0 2024-06-10 RSPM (R 4.4.0)
fansi 1.0.6 2023-12-08 RSPM (R 4.3.0)
farver 2.1.2 2024-05-13 RSPM (R 4.4.0)
fastmap 1.2.0 2024-05-15 RSPM (R 4.4.0)
forcats 1.0.0 2023-01-29 RSPM (R 4.2.0)
generics 0.1.3 2022-07-05 RSPM (R 4.2.0)
ggplot2 3.5.1 2024-04-23 RSPM (R 4.3.0)
git2r 0.33.0 2023-11-26 RSPM (R 4.3.0)
git2rdata 0.4.1 2024-09-06 RSPM (R 4.4.0)
glue 1.7.0 2024-01-09 RSPM (R 4.3.0)
gtable 0.3.5 2024-04-22 RSPM (R 4.3.0)
highr 0.11 2024-05-26 RSPM (R 4.4.0)
hms 1.1.3 2023-03-21 RSPM (R 4.2.0)
htmltools 0.5.8.1 2024-04-04 RSPM (R 4.3.0)
htmlwidgets 1.6.4 2023-12-06 RSPM (R 4.3.0)
inbodb 0.0.5 2024-08-14 local
jquerylib 0.1.4 2021-04-26 RSPM (R 4.3.2)
jsonlite 1.8.8 2023-12-04 RSPM (R 4.3.0)
KernSmooth 2.23-24 2024-05-17 RSPM (R 4.4.0)
knitr 1.48 2024-07-07 RSPM (R 4.4.0)
lattice 0.22-6 2024-03-20 RSPM (R 4.3.0)
leafem 0.2.3 2023-09-17 RSPM (R 4.3.0)
leaflet 2.2.2 2024-03-26 RSPM (R 4.3.0)
leafsync 0.1.0 2019-03-05 RSPM (R 4.2.0)
lifecycle 1.0.4 2023-11-07 RSPM (R 4.3.0)
lubridate 1.9.3 2023-09-27 RSPM (R 4.3.0)
lwgeom 0.2-14 2024-02-21 RSPM (R 4.3.3)
magrittr 2.0.3 2022-03-30 RSPM (R 4.2.0)
munsell 0.5.1 2024-04-01 RSPM (R 4.3.0)
n2khab 0.10.1 2024-05-06 https://inbo.r-universe.dev (R 4.4.0)
odbc 1.5.0 2024-06-05 RSPM (R 4.4.0)
pillar 1.9.0 2023-03-22 RSPM (R 4.2.0)
pkgconfig 2.0.3 2019-09-22 CRAN (R 4.0.1)
plyr 1.8.9 2023-10-02 RSPM (R 4.3.0)
png 0.1-8 2022-11-29 RSPM (R 4.2.0)
proxy 0.4-27 2022-06-09 RSPM (R 4.2.0)
purrr 1.0.2 2023-08-10 RSPM (R 4.2.0)
R6 2.5.1 2021-08-19 RSPM (R 4.2.0)
raster 3.6-26 2023-10-14 RSPM (R 4.3.0)
RColorBrewer 1.1-3 2022-04-03 RSPM (R 4.2.0)
Rcpp 1.0.13 2024-07-17 RSPM (R 4.4.0)
remotes 2.5.0 2024-03-17 RSPM (R 4.3.0)
rlang 1.1.4 2024-06-04 RSPM (R 4.4.0)
rmarkdown 2.28 2024-08-17 RSPM (R 4.4.0)
rprojroot 2.0.4 2023-11-05 RSPM (R 4.3.0)
rstudioapi 0.16.0 2024-03-24 RSPM (R 4.3.0)
sass 0.4.9 2024-03-15 RSPM (R 4.3.0)
scales 1.3.0 2023-11-28 RSPM (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 RSPM (R 4.2.0)
sf 1.0-17 2024-09-06 RSPM (R 4.4.1)
sp 2.1-4 2024-04-30 RSPM (R 4.3.0)
stars 0.6-6 2024-07-16 RSPM (R 4.4.0)
stringi 1.8.4 2024-05-06 RSPM (R 4.4.0)
stringr 1.5.1 2023-11-14 RSPM (R 4.3.0)
terra 1.7-78 2024-05-22 RSPM (R 4.4.0)
tibble 3.2.1 2023-03-20 RSPM (R 4.3.0)
tidyr 1.3.1 2024-01-24 RSPM (R 4.3.0)
tidyselect 1.2.1 2024-03-11 RSPM (R 4.3.0)
timechange 0.3.0 2024-01-18 RSPM (R 4.3.0)
tmap 3.3-4 2023-09-12 RSPM (R 4.3.1)
tmaptools 3.1-1 2021-01-19 RSPM (R 4.2.0)
units 0.8-5 2023-11-28 RSPM (R 4.3.0)
utf8 1.2.4 2023-10-22 RSPM (R 4.3.0)
vctrs 0.6.5 2023-12-01 RSPM (R 4.3.0)
viridisLite 0.4.2 2023-05-02 RSPM (R 4.2.0)
withr 3.0.1 2024-07-31 RSPM (R 4.4.0)
wk 0.9.3 2024-09-06 RSPM (R 4.4.0)
xfun 0.47 2024-08-17 RSPM (R 4.4.0)
XML 3.99-0.17 2024-06-25 RSPM (R 4.4.0)
yaml 2.3.10 2024-07-26 RSPM (R 4.4.0)