-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert NASIS database access from RODBC to DBI/odbc #149
Conversation
Very cool, thanks for doing all of this work. I'm sure you have tried this, but I have to ask: results from old == new? |
This is a draft PR for commits from 2 branches that hit many functions so I have not specifically tested for equivalence e.g. with {daff} of objects produced by the different branches. That certainly could be done before this is marked ready for review, but I was not planning on doing that this week. That idea fits well with the needs we have for improving testing of NASIS functionality. Stephen asked me if/when I was going to merge this so I created the PR to inspect and harmonize the changes that cannot be merged from nasisLITE branch. Please clone and run tests on your local if you are curious. I have actually not done that -- for this PR -- on my government machine. The remote tests pass. Results should be equivalent. I don't think anything was changed that would affect object contents. But as with all NASIS things testing is nuanced and sort of depends on your selected set. |
I can help with testing by creating some before / after datasets. Changes appear superficial, but we are switching to an entirely new DB abstraction layer. I'm particularly interested in how long text fields will be handled--most of the interpretation engine relies on extracting XML from "long" text fields. |
Correct me if I am wrong -- but the only methods from soilDB currently used by the interpretation engine are I certainly will not close or merge this without verifying that all of the things identified previously in #146 are covered. |
The interpretation engine code (https://github.com/ncss-tech/interpretation-engine) is in flux, but the extraction of rules, evaluations, and properties requires interacting with the local NASIS client. That said, I forgot that it uses its own ODBC connection and not the soilDB-specific connection details. So, I'll manually move this over to that code base after this PR is complete. In short, false alarm. |
Thanks @dylanbeaudette for testing and the comments. Another thing I would like you both to consider here is syntax for the new methods These methods are optimized for "one-liners" against the local NASIS db. I considered an even shorter alias for
By default, To keep the connection open, you set |
nasisDBI branch is passing local tests and R CMD check w/ selected set containing diverse pedon/vegplot data. Local NASIS ODBC connection for relevant misc/validate-NASIS-results.R shows semi-automated checking of exported methods by file name / method name. I used it to fix the above things (1f872e9, e655a7d) and create some new unit tests (c150687) |
I plan on old / new testing tomorrow AM with a large pile of components / pedons. The existing tests look great. |
Great! I reorganized the checklist in original post a little bit. I am going to move two items to a new issue, as more generalized workflows for testing and validating NASIS functionality will follow after this PR is merged and a few more puzzle pieces are put into place.
|
With most recent merged PR, this type of soilDB syntax is now possible. library(soilDB)
nasislite_path <- "C:/Geodata/soils/NASIS-data.sqlite"
res <- dbQueryNASIS(dbConnectNASIS(nasislite_path), "SELECT geomfname, geomfiid, geomftiidref FROM geomorfeat")
head(res)
#> geomfname geomfiid geomftiidref
#> 1 alas 122 1
#> 2 alluvial fan 2 1
#> 3 alluvial flat 3 1
#> 4 anticline 123 1
#> 5 arete 4 1
#> 6 arroyo 6 1 Next is to refine the "dump" method that writes the relevant tables from selected set / local NASIS to .sqlite file. Above works with the pedon snapshot that is floating around from 2018 for one-off queries (does not contain The nasisLITE branch "fixed" or at least patched those things to make it work, but this branch will ultimately contain a more general solution that works seamlessly with the standard NASIS data model. |
Or, perhaps one is into {dplyr}/{dbplyr} style in their scripts... library(soilDB)
library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.0.3
nasislite_path <- "C:/Geodata/soils/NASIS-data.sqlite"
con <- dbConnectNASIS(nasislite_path)
con %>%
tbl("geomorfeat") %>%
select(everything()) %>% # you can do something more complicated here
collect() # this executes the query
#> # A tibble: 732 x 3
#> geomfname geomfiid geomftiidref
#> <chr> <int> <int>
#> 1 alas 122 1
#> 2 alluvial fan 2 1
#> 3 alluvial flat 3 1
#> 4 anticline 123 1
#> 5 arete 4 1
#> 6 arroyo 6 1
#> 7 ash flow 124 1
#> 8 atoll 125 1
#> 9 avalanche chute 126 1
#> 10 avalanche debris 147 1
#> # ... with 722 more rows Created on 2021-01-06 by the reprex package (v0.3.0) |
* Method for cloning local NASIS tables into static SQLite file #149 * local_NASIS_defined: add static_path arg and use RSQLite::dbCanConnect * Add selected set and static path to getHzErrorsNASIS * get_cosoilmoist / get_vegplot should use selected set argument * Special handling for local_NASIS_defined (does not take SS arg) * Add static_path to local_NASIS_defined
…frame with NULL data
See https://github.com/ncss-tech/soilDB/blob/nasisDBI/misc/run-all-NASIS-get-methods.R for latest testing suite for NASIS "get" and "fetch" methods |
Testing a SS based on all of the non-additional MU, representative DMU, component pedons + site records. Noticed:
|
Thanks! I didn't test with the As far as the coding/decoding from NASIS domains... I thought that was working. Will have to look closer |
Thanks. Will test again tonight and report back. |
Great, thanks. When I saw what was going on had to try and fix. I was wrong-- the necessary stuff to trigger correct "uncoding" wasn't uniformly in place. Further, in the absence of a NASIS database (i.e. on my home machine where I did the above commits) the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully tested using the CA792 legend:
- MU/DMU/components/component month
- Site/Pedon
Tested:
- before / after with {daff}
- re-made DMU summary reports and side-by-side comparisons
Thanks!
Note that {DBI} is in Imports and the driver packages {odbc} and {RSQLite} are in suggests. Updated the following resources with code to get DBI/odbc/RSQLite dependencies. install.packages(c("DBI","odbc","RSQLite"), dependencies = TRUE) Note that since these are not in Imports/Suggests for the current version of soilDB on CRAN our default heuristic of having you install the CRAN package first to get dependencies before installing the GH version will not "work." Users will need to manually install the above 3 packages or use The following resources have been updated to ensure the packages will be available:
|
This Pull Request converts all usages of {RODBC} for accessing the NASIS local database to use {DBI}/{odbc}.
There are several cases where the ODBC 11 or 13 drivers available to us on CCE machines may not work correctly for generic queries written against the NASIS database, but all functions and queries built into {soilDB} return correct values.
Why {DBI}?
Notably the conversion to {DBI} allows us to drop in other database drivers, with only minor changes to existing queries. This allows us to utilize the NASIS schema in other "containers." The currently available tools are basic e.g. create a database from your selected set via
createStaticNASIS
. This uses theRSQLite
driver to create file-based SQLite databases. All of the standard NASIS access methods work against those databases, as well as generic queries--which are generally better behaved via the RSQLite driver than they are with ODBC 13 driver + {odbc}. Only minor syntax changes (notably removing RIGHT JOIN and T-SQL specific syntax) were required to support SQLite, and it is possible that other drivers can be implemented.A static database may be quite convenient for managing "multiple selected sets" that may be project specific and generally facilitates construction of self-contained databases using the NASIS data model.
RE: #145 #146
Check if any commits can/need to be cherry-picked off nasisLITE branch.
results from old == new?
master
branch results of running [defined] methods (savetest
object frommisc/validate-NASIS-results.Rhttps://github.com/ncss-tech/soilDB/blob/nasisDBI/misc/run-all-NASIS-get-methods.R, then checkout nasisDBI branch and compare)syntax for
dbConnectNASIS
anddbQueryNASIS
NASIS()
alias fordbConnectNASIS
)dbQueryNASIS
now supports named vector of queries forq
, returning list result)investigate stability of open-ended queries e.g.
SELECT * FROM foo
-- are invalid descriptor errors a side effect of our older MSSQL driver?