Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert NASIS database access from RODBC to DBI/odbc #149

Merged
merged 75 commits into from
Mar 23, 2021
Merged

Conversation

brownag
Copy link
Member

@brownag brownag commented Dec 29, 2020

This Pull Request converts all usages of {RODBC} for accessing the NASIS local database to use {DBI}/{odbc}.

There are several cases where the ODBC 11 or 13 drivers available to us on CCE machines may not work correctly for generic queries written against the NASIS database, but all functions and queries built into {soilDB} return correct values.

Why {DBI}?

Notably the conversion to {DBI} allows us to drop in other database drivers, with only minor changes to existing queries. This allows us to utilize the NASIS schema in other "containers." The currently available tools are basic e.g. create a database from your selected set via createStaticNASIS. This uses the RSQLite driver to create file-based SQLite databases. All of the standard NASIS access methods work against those databases, as well as generic queries--which are generally better behaved via the RSQLite driver than they are with ODBC 13 driver + {odbc}. Only minor syntax changes (notably removing RIGHT JOIN and T-SQL specific syntax) were required to support SQLite, and it is possible that other drivers can be implemented.

A static database may be quite convenient for managing "multiple selected sets" that may be project specific and generally facilitates construction of self-contained databases using the NASIS data model.


RE: #145 #146

  • Check if any commits can/need to be cherry-picked off nasisLITE branch.

  • results from old == new?

  • syntax for dbConnectNASIS and dbQueryNASIS

    • function names and aliases (added NASIS() alias for dbConnectNASIS)
    • vectorization over multiple queries (dbQueryNASIS now supports named vector of queries for q, returning list result)
  • investigate stability of open-ended queries e.g. SELECT * FROM foo -- are invalid descriptor errors a side effect of our older MSSQL driver?

@brownag brownag requested a review from smroecker December 29, 2020 19:45
@dylanbeaudette
Copy link
Member

Very cool, thanks for doing all of this work. I'm sure you have tried this, but I have to ask: results from old == new?

@brownag
Copy link
Member Author

brownag commented Dec 29, 2020

This is a draft PR for commits from 2 branches that hit many functions so I have not specifically tested for equivalence e.g. with {daff} of objects produced by the different branches. That certainly could be done before this is marked ready for review, but I was not planning on doing that this week. That idea fits well with the needs we have for improving testing of NASIS functionality.

Stephen asked me if/when I was going to merge this so I created the PR to inspect and harmonize the changes that cannot be merged from nasisLITE branch.

Please clone and run tests on your local if you are curious. I have actually not done that -- for this PR -- on my government machine. The remote tests pass. Results should be equivalent. I don't think anything was changed that would affect object contents. But as with all NASIS things testing is nuanced and sort of depends on your selected set.

@dylanbeaudette dylanbeaudette self-requested a review December 30, 2020 05:31
@dylanbeaudette
Copy link
Member

I can help with testing by creating some before / after datasets. Changes appear superficial, but we are switching to an entirely new DB abstraction layer. I'm particularly interested in how long text fields will be handled--most of the interpretation engine relies on extracting XML from "long" text fields.

@brownag
Copy link
Member Author

brownag commented Dec 30, 2020

Correct me if I am wrong -- but the only methods from soilDB currently used by the interpretation engine are SDA_query and uncode and parseWebReport -- Is there a new version?

I certainly will not close or merge this without verifying that all of the things identified previously in #146 are covered.

@dylanbeaudette
Copy link
Member

The interpretation engine code (https://github.com/ncss-tech/interpretation-engine) is in flux, but the extraction of rules, evaluations, and properties requires interacting with the local NASIS client. That said, I forgot that it uses its own ODBC connection and not the soilDB-specific connection details. So, I'll manually move this over to that code base after this PR is complete. In short, false alarm.

@brownag
Copy link
Member Author

brownag commented Dec 30, 2020

Thanks @dylanbeaudette for testing and the comments.

Another thing I would like you both to consider here is syntax for the new methods dbConnectNASIS and dbQueryNASIS.

These methods are optimized for "one-liners" against the local NASIS db. I considered an even shorter alias for dbConnectNASIS that was just NASIS()

dbQueryNASIS(dbConnectNASIS(), "SELECT * from foo")

By default, dbQueryNASIS closes the connection after the query. I observed that the vast majority of our connections were used for sending just a single query and that many people, myself included, forget to close connections.

To keep the connection open, you set close = FALSE. This is used in e.g. get_extended_data_from_NASIS_db where a sequence of queries can be executed using same connection. Open to feedback on function/argument names. A cool idea would be to allow for list input (i.e. multiple queries) and list output, keeping the backward-compatible result for single queries.

@brownag brownag added the NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions label Dec 30, 2020
brownag added a commit that referenced this pull request Jan 4, 2021
@brownag
Copy link
Member Author

brownag commented Jan 5, 2021

nasisDBI branch is passing local tests and R CMD check w/ selected set containing diverse pedon/vegplot data. Local NASIS ODBC connection for relevant fetch/get methods via DBI.

misc/validate-NASIS-results.R shows semi-automated checking of exported methods by file name / method name. I used it to fix the above things (1f872e9, e655a7d) and create some new unit tests (c150687)

@dylanbeaudette
Copy link
Member

I plan on old / new testing tomorrow AM with a large pile of components / pedons. The existing tests look great.

@brownag
Copy link
Member Author

brownag commented Jan 5, 2021

Great! I reorganized the checklist in original post a little bit.

I am going to move two items to a new issue, as more generalized workflows for testing and validating NASIS functionality will follow after this PR is merged and a few more puzzle pieces are put into place.

  • .Rda stored as artifact of R CMD check Action?
  • Presence of NASIS-local tag triggers comparison against master

@brownag
Copy link
Member Author

brownag commented Jan 6, 2021

With most recent merged PR, this type of soilDB syntax is now possible.

library(soilDB)

nasislite_path <- "C:/Geodata/soils/NASIS-data.sqlite"

res <- dbQueryNASIS(dbConnectNASIS(nasislite_path), "SELECT geomfname, geomfiid, geomftiidref FROM geomorfeat")

head(res)
#>       geomfname geomfiid geomftiidref
#> 1          alas      122            1
#> 2  alluvial fan        2            1
#> 3 alluvial flat        3            1
#> 4     anticline      123            1
#> 5         arete        4            1
#> 6        arroyo        6            1

Next is to refine the "dump" method that writes the relevant tables from selected set / local NASIS to .sqlite file.

Above works with the pedon snapshot that is floating around from 2018 for one-off queries (does not contain geomorfeat though!)-- which was the impetus for me developing all of this. Note there are several issues with it compared to the official NASIS schema -- truncation of column names to 16 characters, uppercase/lowercase/underscores, no need for metadata uncoding etc.

The nasisLITE branch "fixed" or at least patched those things to make it work, but this branch will ultimately contain a more general solution that works seamlessly with the standard NASIS data model.

@brownag
Copy link
Member Author

brownag commented Jan 6, 2021

Or, perhaps one is into {dplyr}/{dbplyr} style in their scripts...

library(soilDB)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.0.3

nasislite_path <- "C:/Geodata/soils/NASIS-data.sqlite"
con <- dbConnectNASIS(nasislite_path) 

con %>%
  tbl("geomorfeat") %>% 
  select(everything()) %>% # you can do something more complicated here
  collect() # this executes the query
#> # A tibble: 732 x 3
#>    geomfname        geomfiid geomftiidref
#>    <chr>               <int>        <int>
#>  1 alas                  122            1
#>  2 alluvial fan            2            1
#>  3 alluvial flat           3            1
#>  4 anticline             123            1
#>  5 arete                   4            1
#>  6 arroyo                  6            1
#>  7 ash flow              124            1
#>  8 atoll                 125            1
#>  9 avalanche chute       126            1
#> 10 avalanche debris      147            1
#> # ... with 722 more rows

Created on 2021-01-06 by the reprex package (v0.3.0)

@brownag brownag linked an issue Jan 16, 2021 that may be closed by this pull request
brownag added a commit that referenced this pull request Jan 29, 2021
* Method for cloning local NASIS tables into static SQLite file #149

* local_NASIS_defined: add static_path arg and use RSQLite::dbCanConnect

* Add selected set and static path to getHzErrorsNASIS

* get_cosoilmoist / get_vegplot should use selected set argument

* Special handling for local_NASIS_defined (does not take SS arg)

* Add static_path to local_NASIS_defined
@brownag
Copy link
Member Author

brownag commented Mar 18, 2021

See https://github.com/ncss-tech/soilDB/blob/nasisDBI/misc/run-all-NASIS-get-methods.R for latest testing suite for NASIS "get" and "fetch" methods

@dylanbeaudette
Copy link
Member

Testing a SS based on all of the non-additional MU, representative DMU, component pedons + site records. Noticed:

> cm <- get_comonth_from_NASIS_db(fill = TRUE)
Error in new_result(connection@ptr, statement, immediate) : 
  external pointer is not valid

Also noticed that pedon records have some coded (?) values:
image

@brownag
Copy link
Member Author

brownag commented Mar 20, 2021

get_comonth_from_NASIS_db

Thanks! I didn't test with the fill=TRUE argument! The connection needs to be left open to run the second query that the fill arg uses.

As far as the coding/decoding from NASIS domains... I thought that was working. Will have to look closer

@dylanbeaudette
Copy link
Member

Thanks. Will test again tonight and report back.

@brownag
Copy link
Member Author

brownag commented Mar 20, 2021

Great, thanks. When I saw what was going on had to try and fix.

I was wrong-- the necessary stuff to trigger correct "uncoding" wasn't uniformly in place. Further, in the absence of a NASIS database (i.e. on my home machine where I did the above commits) the uncode() routine would fail to get the right data for db="NASIS" domains. Now it should properly pull from the dsn if specified. The horizon data uncoding was just a mistake on my part, using wrong variable name (a copy that was never used).

Copy link
Member

@dylanbeaudette dylanbeaudette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully tested using the CA792 legend:

  • MU/DMU/components/component month
  • Site/Pedon

Tested:

  • before / after with {daff}
  • re-made DMU summary reports and side-by-side comparisons

Thanks!

@brownag
Copy link
Member Author

brownag commented Mar 22, 2021

Note that {DBI} is in Imports and the driver packages {odbc} and {RSQLite} are in suggests.

Updated the following resources with code to get DBI/odbc/RSQLite dependencies.

install.packages(c("DBI","odbc","RSQLite"), dependencies = TRUE)

Note that since these are not in Imports/Suggests for the current version of soilDB on CRAN our default heuristic of having you install the CRAN package first to get dependencies before installing the GH version will not "work."

Users will need to manually install the above 3 packages or use remotes::install_github('ncss-tech/soilDB', dependencies=TRUE) to get dependencies from GitHub DESCRIPTION file. As usual, do the latter in a fresh session, without any packages loaded.

The following resources have been updated to ensure the packages will be available:

@brownag brownag merged commit d9d057d into master Mar 23, 2021
@brownag brownag deleted the nasisDBI branch March 30, 2021 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NASIS-local This tag is used for pull requests, issues, discussions etc. for soilDB local NASIS functions
Projects
None yet
2 participants