Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Pass file_format values as-is in external table configuration #1183

Conversation

sfc-gh-hachouraria
Copy link
Contributor

Common use of FILE_FORMAT within CREATE EXTERNAL TABLE includes specifying
a previously created format name (FORMAT_NAME = 'NAME') or the various
type options (TYPE=CSV FIELD_DELIMITER='|').

Both of these require defining string literals with the single quote
character (') that should not be escaped (\').

This change removes the escaping of single quotes performed for the
file_format values to allow them to be specified without failing
the query compilation.

Some examples have also been added to the documentation
to make it easier to understand how values for file_format
need to be passed for external tables.

Test Plan

  • acceptance tests (make test-acceptance with env-vars set to a trial account)
  • unit tests (make test)
  • manual test with FORMAT_NAME = 'NAME'
  • manual test with TYPE = CSV FIELD_DELIMITER = '|' NULL_IF = ('0', '')
  • executed modified example

References

@sfc-gh-swinkler
Copy link
Collaborator

will this break existing use cases that utilize the string escape currently? if so, may need to add backward compatability for that.

@sfc-gh-hachouraria
Copy link
Contributor Author

Thank you for taking a look @sfc-gh-swinkler!

I checked all of the format options displayed in the following documents:

For existing file format names that may have included a ' character in them, the escapes would have prevented it from working unless one was oddly relying on the backslash injection:

  file_format  = "FORMAT_NAME=$$FOO'BAR$$"
  # Desired: FOO'BAR
  # Would look instead for FOO\'BAR (different identifier with 8 characters instead of 7)

For file format names that may include a \ character the .tf file will require an escape during definition so the second escape would break the name. There's a chance one could be relying on this doubling effect but then it'd be a user-error because a \\ in the .tf files is an escaped single \.

  file_format = "FORMAT_NAME=$$identifier\\with\\slashes$$"
  # Desired: IDENTIFIER\WITH\SLASHES, with \\ required to write \ in .tf files
  # Would look instead for IDENTIFIER\\WITH\\SLASHES (different identifier with 25 instead of 23 characters)

For other values that accept string/character literals through the use of ', it would simply never have worked due to the unnecessary escaping leading to \' (2 characters instead of 1) appearing in the generated SQL.

Please let me know if I've missed something.

@sfc-gh-swinkler
Copy link
Collaborator

Okay so that makes sense. Two things:

  1. need to run make docs to rebuild docs
  2. I am wondering if we should make this a more complex object like:
    file_format = {
    name =
    type =
    options =
    }

Common use of FILE_FORMAT under CREATE EXTERNAL TABLE includes specifying
a previously created format name (FORMAT_NAME = 'NAME') or the various
type options (TYPE=CSV FIELD_DELIMITER='|').

Both of these require defining string literals with the single quote
character (') that should not be escaped (\\').

This change removes the escaping of single quotes performed for the
file_format values to allow them to be specified without failing
the query compilation.

Some examples have also been added to the documentation
to make it easier to understand how values for file_format
need to be passed for external tables.

Testing:

  - Modified unit tests to exercise passing of typical file format values
  - Ran 'make test' to confirm existing tests continue to pass
  - Setup a trial account, exported env-vars and ran 'make test-acceptance'
    and verified all tests passed
  - Attempted an external table creation on a personal account with the
    changes included through a local buildand observed it to execute a
    successful SQL. Tried with both FORMAT_NAME and FIELD_DELIMITER
    options with string literal values.

Fixes Snowflake-Labs#1046
@sfc-gh-hachouraria sfc-gh-hachouraria force-pushed the hachouraria/SFDC-00394589_single_quote_escapes_in_external_tbl_file_format branch from 29690ed to 4b82042 Compare September 10, 2022 01:44
@sfc-gh-hachouraria
Copy link
Contributor Author

I've added the missing doc changes.

The complex object would certainly be nicer to define, perhaps something equivalent to the file_format resource which has an exhaustive-looking options list: https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest/docs/resources/file_format

@sfc-gh-swinkler
Copy link
Collaborator

sfc-gh-swinkler commented Sep 12, 2022

would it make sense to deprecate file_format for extenral table and then add support for external table to the file format resource? for example we could refactor the file_format resource to accept an external table like so:

resource "snowflake_file_format" "example_file_format" {
  object_name = snowflake_external_table.external_table.name
  object_type = "EXTERNAL_TABLE"
  format_type = "CSV"
}

we do something similiar for the tag_association resource https://registry.terraform.io/providers/Snowflake-Labs/snowflake/latest/docs/resources/tag_association

@sfc-gh-hachouraria
Copy link
Contributor Author

It does make sense to do it that way since external tables do always need a format specified. However, not all of the file format options available to SELECT FROM @STAGE or COPY INTO TABLE are supported in CREATE EXTERNAL TABLE according to https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html (for example, NULL_IF = (…) support appears absent for TYPE=CSV) so it could cause some user confusion?

@sfc-gh-swinkler
Copy link
Collaborator

what i will do is merge this for now and then make a note to revisit this later. I don't know that users would be too confused as long as it is well documented. however i could also see creating a resource such as "snowflake_external_table_file_format" if the file format features supported are sufficiently different than "snowflake_file_format".

@sfc-gh-swinkler
Copy link
Collaborator

/ok-to-test sha=4b82042

@github-actions
Copy link

Integration tests success for 4b82042

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

String escaping on snowflake_external_table.file_format prevents passing field_delimiter as a string literal
2 participants