Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically escape table names in DBI interface #618

Merged
merged 10 commits into from
Dec 11, 2023

Conversation

detule
Copy link
Collaborator

@detule detule commented Nov 18, 2023

This is a solution to the the issue reported in #591

In particular, add an exact boolean argument to odbcConnectionTables and odbcConnectionColumns. Users should set this to TRUE to express that any non-null identifier arguments are to be matched exactly. This may lead to performance improvements for some backends.

However, this should be considered to be optionally implemented, and only as a way to reap any performance benefits from matching exactly, rather than allowing pattern value arguments.

@detule detule mentioned this pull request Nov 18, 2023
@detule detule requested a review from hadley November 18, 2023 20:00
R/Connection.R Outdated Show resolved Hide resolved
R/db.R Outdated Show resolved Hide resolved
R/db.R Outdated Show resolved Hide resolved
R/Table.R Outdated Show resolved Hide resolved
R/Connection.R Show resolved Hide resolved
R/utils.R Outdated Show resolved Hide resolved
hadley added a commit that referenced this pull request Nov 22, 2023
Providing `I()` as an escape hatch if you do want to use pattern matching. Fixes #618
@hadley hadley changed the title snowflake: odbcConnectionTables|Columns: Performance improvements Automatically escape table names in DBI interface Nov 27, 2023
R/Connection.R Outdated
#' @param ... additional parameters to methods
#' @param exact Se to TRUE if any non-null identifier arguments are to be interpreted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my only remaining question is if we want to use exact = TRUE (or maybe escape = TRUE or pattern = FALSE) or I(). The advantage of an argument is that it makes it quite clear in the function signature; the advantage of I() is that it keeps the two concerns very closely coupled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question:

Can go either way on this one, but maybe slightly in favor of the argument because the user may be interacting with the API by passing in a DBI::Id or DBI::SQL argument.

Haven't thought too hard about how to pass the I() through those containers / feels like there is a way but it might start to stretch the proposition.

R/utils.R Outdated

# Will iterate over charsToEsc argument and for each:
# will escape any un-escaped occurance in `x`.
escapeChars <- function(x, charsToEsc = c("_"), escChar ="\\\\") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to call this escapePattern() and hard code it to escape % and _.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Hadley:

Renamed it - but I am not sold on escaping "%". It seems to me that it is so infrequently used as an identifier that if someone is passing "%" sign they are trying to use a wild card ( but may have forgotten to wrap the input appropriately ).

I keep thinking, should we escape the percent character in odbcConnectionColumns(conn, catalog_name = "my_cat", schema_name = "my_sch", name = "my_tbl", column_name = "%", exact = TRUE)?

Anyway, I can be moved here just let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feeling is that we want to make the leaky abstraction on top of the ODBC API clear, so there are two choices, either a pattern match (the default) or a literal string.

I think what you're recognising in the API is that it's a bit weird to have one parameter that controls whether or not three other arguments are escaped. That's one advantage of the interface that uses I(), but that would mean making escape = TRUE the default, which I know you're not keen on.

So only escaping _ seems like a pragmatic choice and I'm fine with it if you want to revert back to your original idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - thanks for continuing to think about this - good points about pros/cons. I am still thinking about how the user would pass I() through the DBI::Id container.

  > str(DBI::Id(name=I("abc"))@name)
   Named chr "abc"
   - attr(*, "names")= chr "name"

In the meantime - I think by escaping the identifiers in the DBI API endpoints we've covered most (all?) of the performance gains to be had. Copy your note on let's merge and then iterate as needed.

R/utils.R Outdated Show resolved Hide resolved
hadley added a commit that referenced this pull request Dec 7, 2023
And inherit less from DBI in order to avoid confusing text. Mostly taken from #618. Fixes #645
hadley added a commit that referenced this pull request Dec 8, 2023
And inherit less from DBI in order to avoid confusing text. Mostly taken from #618. Fixes #645

Drop aliases for other DBI functions
@detule
Copy link
Collaborator Author

detule commented Dec 10, 2023

Re-based. Will merge when checks are complete.

@hadley hadley merged commit 56494c0 into r-dbi:main Dec 11, 2023
16 checks passed
@fh-mthomson
Copy link

This is a game changer for being able to actually write to Snowflake! Thank you @detule @fh-afrachioni @hadley!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants