Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include synonyms in odbcListObjects() output #773

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Conversation

simonpcouch
Copy link
Collaborator

@simonpcouch simonpcouch commented Mar 7, 2024

Closes #221.

This PR modifies/adds SQL Server methods for dbListTables(), dbExistsTable(), and odbcListObjects() that include synonyms in output. This also means that synonyms will show up in the Connections pane.

Note that this PR makes no changes to dbListTables(), which is noted in the original issue.

[EDITs: update PR scope]

INNER JOIN sys.databases AS DB
ON Sch.principal_id = DB.database_id
WHERE DB.name = ? AND Sch.name = ?
AND OBJECTPROPERTY(Object_ID(Syn.base_object_name), 'IsTable') = 1;",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've opted to only include synonyms whose base object is a table, and refer to that synonym as a "table" in the output. This means that synonyms (that are tables) will be previewable through the Connections pane, but also feels a bit hacky. From what I can tell, the drop-down based in the pane based on odbcListFields() also doesn't work (but doesn't cause errors/crashes).

Instead, we could opt not to include this table and return the type as synonym. This means that there would be entries in the Connections pane but no previews.

@simonpcouch simonpcouch requested a review from hadley March 7, 2024 23:04
@ThomasSoeiro
Copy link

@simonpcouch
Hi,
I have the same issue with an Oracle database. Should I open a new issue or can it be taken care of here?
Thanks!

@simonpcouch
Copy link
Collaborator Author

@ThomasSoeiro A separate issue is good! :)

@hadley
Copy link
Member

hadley commented Mar 13, 2024

Hmmm, I think I'd be inclined to include these in dbListTables(), using our principle that if you can do SELECT foo then dbExistsTable(con, "foo") should be true, and "foo" should be included in the output of dbListTables().

@simonpcouch
Copy link
Collaborator Author

simonpcouch commented Mar 26, 2024

Synonyms are now supported with dbListTables() and dbExistsTable().

With `main`
library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
               pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it can't be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"
With this PR

With this PR:

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it CAN be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>    name  type
#> 1  test table
#> 2 test2 table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"  "test2"

My biggest concern at this point is performance, esp. in cases when databases have many synonyms. Should this feature be gated behind an argument?

had issues passing these as parameters using something like `WHERE (? IS NULL OR DB.name = ?) AND (? IS NULL OR Sch.name = ?)`
R/driver-sql-server.R Outdated Show resolved Hide resolved
@@ -73,11 +73,11 @@ setMethod("dbExistsTable", c("Microsoft SQL Server", "character"),
stopifnot(length(name) == 1)
if (isTempTable(conn, name, ...)) {
query <- paste0("SELECT OBJECT_ID('tempdb..", name, "')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth checking to see if this works for synonyms too? (without the tempdb..). If so, that would substantially simplify this function, and maybe make it simpler?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonpcouch Did you see this comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did! Just haven't made a moment to come back to this PR to address this comment and run some quick benchmarks to see how it affects performance. Will re-request review then, likely tomorrow!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I though you had re-requested a review, but I must've been misreading my notifications.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was almost able to make this work with:

    dots <- list(...)
    name_loc <- paste0(
      c(dots[["catalog_name"]], dots[["schema_name"]], name),
      collapse = "."
    )
    return(!is.na(dbGetQuery(conn, paste0("SELECT OBJECT_ID('", name_loc, "')"))[[1]]))

The only issue is that OBJECT_ID() seems to only be able to work with fully qualified names. i.e.

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0

# when fully qualified, works fine:
dbExistsTable(con, SQL("master.odbc.test2"))
#> [1] TRUE

# but not without schema/catalog specified
"test2" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test2")
#> [1] FALSE

Created on 2024-04-10 with reprex v2.1.0

Specifically, this triggered this DBItest test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, is dbListTables() correct here? i.e. SELECT * FROM test2 won't work (IIUC) because it's not in the current schema. So should it actually be listed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:thinkies: The DBI docs for the generic read:

dbListTables() returns a character vector that enumerates all tables and views in the database.

so... I think it's correct that dbListTables() would list test2.


Some statements:

  1. DBI docs: dbListTables() returns a character vector that enumerates all tables and views in the database.
  2. DBItest, linked above: For all tables listed by dbListTables(), dbExistsTable() should return TRUE.
  3. Us, on slack: maybe the principle is that if SELECT * FROM foo works then dbExistsTable(con, "foo") should return TRUE?

So dbListTables() and dbExistsTable() should agree, and dbListTables() seems to be doing the right thing. The connection to SELECT * FROM foo doesn't seem to hold up, at least for SQL Server.

FWIW, with this PR:

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))


"test" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test")
#> [1] TRUE
dbGetQuery(con, "SELECT * FROM test")
#> Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1711: 00000
#> [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'test'.

"test2" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test2")
#> [1] TRUE
dbGetQuery(con, "SELECT * FROM test2")
#> Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1711: 00000
#> [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'test2'.

Created on 2024-04-10 with reprex v2.1.0

With CRAN, both "test2" calls are FALSE.

Copy link
Member

@hadley hadley Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that the DBI docs are underspecified here — I think they really should read a "character vector that enumerates all tables and views in the database for the active schema/catalog.".

So maybe we're blowing out the scope of this PR, but I think we should narrow down this behaviour because I think that we should be returning FALSE for the test calls too.

But might be easiest to discuss this live on a call with some examples?

Comment on lines 258 to 260
if (!has_catalog & !has_schema) {
return(paste0(res, filter_is_table))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!has_catalog & !has_schema) {
return(paste0(res, filter_is_table))
}

I think you can just remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, went ahead and removed this at the time but just ran into why it was there. If !has_catalog & !has_schema then the filter has to start with WHERE rather than AND. I'll go ahead and add a test there so it's more obvious in the future. da86cc4.

R/driver-sql-server.R Outdated Show resolved Hide resolved
edits to `dbExistsTable()` should resolve possible slowdown when the non-synonym table does exist
Comment on lines 251 to 253
catalog = DB.name,
[schema] = Sch.name,
name = Syn.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to simplify this query by using the DB_ID() and SCHEMA_ID() functions to convert schema/database names directly to IDs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was able to rework using SCHEMA_NAME() and respecting synonyms(catalog_name), so no more joins and a good bit simpler now.

@@ -73,11 +73,11 @@ setMethod("dbExistsTable", c("Microsoft SQL Server", "character"),
stopifnot(length(name) == 1)
if (isTempTable(conn, name, ...)) {
query <- paste0("SELECT OBJECT_ID('tempdb..", name, "')")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, is dbListTables() correct here? i.e. SELECT * FROM test2 won't work (IIUC) because it's not in the current schema. So should it actually be listed?

@simonpcouch
Copy link
Collaborator Author

simonpcouch commented Apr 10, 2024

Here are those benchmarks with the PR as-is (EDIT: updated timings below):

Elapsed times to the 3 affected functions with 0 to 10000 synonyms present. One column of plots gives CRAN timings and the other giving dev, showing a drastic increase in timings for the dev package.

..where the values to the left of x = 1 are x = 0 (with a small shift for log(0)). "Default" objects means what's available in a relatively fresh SQL Server Docker instance—length(dbListTables(con)) is 618. Timings for dbExistsTable() are for a table that doesn't exist, so the synonyms_query kicks in. Generally, doesn't look great.🏄

Will look into your most recent comments.

Source
 library(tidyverse)
 
 # run the following, once with cran odbc and once with dev ------------------------------
 library(DBI)
 library(odbc)
 con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                  pwd = Sys.getenv("sqlServerPass"))
 
 dbExecute(con, "create schema odbc")
 dbExecute(con, "create table odbc.test (x int)")
 
 n_synonyms <- c(0, 10^c(0:4))
 timings <- 
   data.frame(
     n = numeric(0), 
     odbcList = numeric(0), 
     dbList = numeric(0), 
     dbExists = numeric(0)
   )
 
 for (n_synonym in n_synonyms) {
   for (n in seq_len(n_synonym)) {
     dbExecute(
       con, 
       SQL(paste0("create synonym odbc.test", n, " for odbc.test"))
     )
   }
   
   for (i in 1:3) {
     timings <- rbind(
       timings,
       data.frame(
         n = n_synonym, 
         odbcList = system.time(odbcListObjects(con))[["elapsed"]], 
         dbList = system.time(dbListTables(con))[["elapsed"]], 
         dbExists = system.time(dbExistsTable(con, "test0"))[["elapsed"]],
         version = ifelse(packageVersion("odbc") == "1.4.2", "cran", "dev")
       )
     )
   }

   for (n in seq_len(n_synonym)) {
     dbExecute(
       con, 
       SQL(paste0("drop synonym odbc.test", n))
     )
   }
 }
 
 # write to the respective .csv
 # write_csv(timings, file = "dev_timings.csv")
 # write_csv(timings, file = "cran_timings.csv")


 # then, reading in and plotting --------------------------------------------------------------
 dev_timings <- read_csv("dev_timings.csv")
 cran_timings <- read_csv("cran_timings.csv")
 
 
 bind_rows(dev_timings, cran_timings) %>%
   pivot_longer(cols = c(odbcList, dbList, dbExists), names_to = "fn") %>%
   mutate(n = n + .1) %>%
   ggplot() +
   geom_point(aes(x = n, y = value)) +
   facet_grid(rows = vars(fn), cols = vars(version)) +
   scale_x_log10() +
   labs(
     x = "Number of Synonyms", 
     y = "Time",
     caption = "Only 'default' objects + 1 0-row table, 3 calls per (n, fn) pair"
   )

since each `sys.synonyms` references only the synonyms that live inside of the database/catalog it's inside of, only query the (fully qualified, if not the one from the active database) `sys.synonyms` in the `catalog_name`. if not `catalog_name` is specified, respect the active database.
@simonpcouch
Copy link
Collaborator Author

simonpcouch commented Apr 11, 2024

Updated benchmarks:

Screenshot 2024-04-11 at 2 00 03 PM

Definitely better than before but not negligible. I haven't seen enough real-world DBs to know how close to "worst-case" this is.

@simonpcouch simonpcouch requested a review from hadley April 11, 2024 19:38
@simonpcouch
Copy link
Collaborator Author

Let's table this PR until at least after the upcoming release. Will mark as draft for now.

@simonpcouch simonpcouch marked this pull request as draft May 10, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Have synonyms show-up for Microsoft SQL Server dbListTables
3 participants