[ISSUE] Error when creating UC Managed table : Missing cloud file system scheme #1151

Closed
ebarault opened this issue Mar 3, 2022 · 9 comments · Fixed by #1154

Comments

ebarault commented Mar 3, 2022

Configuration

resource "databricks_table" "this" {
  name               = "test"
  
  catalog_name       = local.catalog_name
  schema_name        = local.schema_name
  
  table_type         = "MANAGED"
  storage_location   = ""
  data_source_format = "DELTA"

  column {
    name      = "id"
    position  = 0
    type_name = "INT"
    type_text = "int"
    type_json = "{\"name\":\"id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}"
  }
}

Expected Behavior

Table is created

Actual Behavior

Error:

2022-03-03T13:09:13.489+0100 [ERROR] vertex "databricks_table.this" error: cannot create table: Missing cloud file system scheme with databricks_table.this

Steps to Reproduce

  1. terraform apply

Terraform and provider versions

terraform v1.1.0
databricks provider v0.5.0

Debug Output

databricks_table.this: Creating...
2022-03-03T13:09:13.017+0100 [INFO]  Starting apply for databricks_table.this
2022-03-03T13:09:13.017+0100 [DEBUG] databricks_table.this: applying the planned Create change
2022-03-03T13:09:13.018+0100 [INFO]  provider.terraform-provider-databricks_v0.5.0: Using directly configured basic authentication: timestamp=2022-03-03T13:09:13.018+0100
2022-03-03T13:09:13.018+0100 [INFO]  provider.terraform-provider-databricks_v0.5.0: Configured basic auth: host=https://***REDACTED***.cloud.databricks.com/, username=***REDACTED***, password=***REDACTED***, account_id=***REDACTED***: timestamp=2022-03-03T13:09:13.018+0100
2022-03-03T13:09:13.018+0100 [DEBUG] provider.terraform-provider-databricks_v0.5.0: POST /api/2.0/unity-catalog/tables {
2022-03-03T13:09:13.488+0100 [DEBUG] provider.terraform-provider-databricks_v0.5.0: 400 Bad Request {
2022-03-03T13:09:13.489+0100 [WARN]  provider.terraform-provider-databricks_v0.5.0: /api/2.0/unity-catalog/tables:400 - Missing cloud file system scheme https://docs.databricks.com/dev-tools/api/latest/unity-catalog.html#tables: timestamp=2022-03-03T13:09:13.489+0100
2022-03-03T13:09:13.489+0100 [WARN]  provider.terraform-provider-databricks_v0.5.0: /api/2.0/unity-catalog/tables:400 - Missing cloud file system scheme https://docs.databricks.com/dev-tools/api/latest/unity-catalog.html#tables: timestamp=2022-03-03T13:09:13.489+0100
2022-03-03T13:09:13.489+0100 [ERROR] vertex "databricks_table.this" error: cannot create table: Missing cloud file system scheme
│   with databricks_table.this,
│   on main.tf line 65, in resource "databricks_table" "this":
│   65: resource "databricks_table" "this" {
path=.terraform/providers/registry.terraform.io/databrickslabs/databricks/0.5.0/darwin_arm64/terraform-provider-databricks_v0.5.0 pid=63201
Author

ebarault commented Mar 3, 2022

cc: sridharitv

Author

ebarault commented Mar 3, 2022

I'm fully willing to work on a fix, but I don't have access to the API spec.

In the code, only the "EXTERNAL" table scenario is tested:
https://github.com/databrickslabs/terraform-provider-databricks/blob/master/catalog/resource_table_test.go#L24
I tried creating an EXTERNAL table and it works.

So the problem seems to be a missing parameter for MANAGED tables.

@ebarault ebarault changed the title from "[ISSUE] Error when creating UC table : Missing cloud file system scheme" to "[ISSUE] Error when creating UC Managed table : Missing cloud file system scheme" Mar 3, 2022
Contributor

nfx commented Mar 3, 2022

@ebarault it's the empty storage location that is causing the error. Try filling it in.

can you elaborate your idea on databricks_table resources, by the way?

Author

ebarault commented Mar 3, 2022

Hi @nfx, it is meant to be this way for Managed tables. The doc states:

storage_location - URL of storage location for Table data (required for EXTERNAL Tables. For Managed Tables, if the path is provided it needs to be a Staging Table path that has been generated through the Staging Table API, otherwise should be empty)

And that is exactly how the example is written.
(And yes, I tried filling this property with different values, including a dumb s3:// location, as well as not providing it at all.)

I find it weird that we have to provide an empty string for a Managed table. Ideally it would not be provided at all, but then the provider raises an error saying that this property is required.

What is your point regarding your last comment ?

can you elaborate your idea on databricks_table resources, by the way?

Contributor

nfx commented Mar 3, 2022

@ebarault this resource needs a bit more love 😂

What is your point regarding your last comment ?

can you elaborate your idea on databricks_table resources, by the way?

I'm looking at telemetry and I don't see many customers using databricks_table. So I wonder: in what scenarios do people want to use this resource? E.g., setting up an entire warehouse with Terraform, or just some individual tables?

Author

ebarault commented Mar 3, 2022

Ah, great question @nfx, and we're still working out the answer ourselves.
My answer is linked to the rationale behind this feature proposal:

We don't plan to use this module to manage the table schemas, but to manage the rights on those tables.

Possible workflow:

  • A data team has a new project and needs new tables
  • Data ops create the tables with Terraform with a basic structure (just an id column, for example). We make sure that subsequent terraform apply runs ignore changes made to the table structure from outside Terraform
  • Data ops give the data team, or an external workflow, the rights to modify the table structure
  • Data team develops workflows to feed those tables
  • Data ops manage the access control to those tables with terraform (grants) to give read access throughout the organization

We're also interested in future changes that would let privileges be inherited from the database (schema), but we want to keep the use case where a shared database holds multiple tables with different access control requirements for different teams (no blanket read access).

Generally speaking, we're looking for a way to centralize access control for all databases and tables in one place.
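
The "create once, then ignore structural changes" step in the workflow above could be sketched roughly as follows. This is only an illustration, not a confirmed configuration: the local values are hypothetical, the provider is assumed to be configured elsewhere, and storage_location is omitted on the assumption that a provider version newer than v0.5.0 accepts that for MANAGED tables (the thread does not confirm the exact behavior). The lifecycle block with ignore_changes is standard Terraform.

```hcl
# Hypothetical names, for illustration only.
locals {
  catalog_name = "main"
  schema_name  = "analytics"
}

resource "databricks_table" "this" {
  name         = "test"
  catalog_name = local.catalog_name
  schema_name  = local.schema_name

  table_type         = "MANAGED"
  data_source_format = "DELTA"
  # storage_location is left out here: an assumption about provider
  # versions newer than v0.5.0, not something confirmed in this thread.

  # Minimal starting structure: a single id column, as in the issue.
  column {
    name      = "id"
    position  = 0
    type_name = "INT"
    type_text = "int"
    type_json = "{\"name\":\"id\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}"
  }

  lifecycle {
    # Columns added or altered outside Terraform are not reverted
    # on later applies.
    ignore_changes = [column]
  }
}
```

With ignore_changes in place, Terraform still creates the table but no longer plans changes when the column blocks drift from the configuration.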

Contributor

nfx commented Mar 4, 2022

@ebarault So it looks like you need https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/grants. It's working with tables defined outside of terraform as well. Perhaps I should tune documentation to refer to it better.

And yes, data resources for listing tables are coming
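
For context, granting privileges on a table by its full name with databricks_grants might look like the sketch below. The table name and principals are hypothetical, and the set of privileges available depends on the provider and Unity Catalog version.

```hcl
# Grants on a table referenced by its three-level name. The table itself
# may have been created outside Terraform.
resource "databricks_grants" "test_table" {
  table = "main.analytics.test" # hypothetical catalog.schema.table

  # The owning data team may read and modify the table.
  grant {
    principal  = "data-team" # hypothetical group name
    privileges = ["SELECT", "MODIFY"]
  }

  # Read-only access for another group.
  grant {
    principal  = "analysts" # hypothetical group name
    privileges = ["SELECT"]
  }
}
```

Because the table is identified only by a string, Terraform does not know whether the table exists. If the table is also managed in the same configuration, an explicit depends_on is one way to order the two resources.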

Author

ebarault commented Mar 4, 2022

@nfx Hi, I'm already using the grants module, for sure: for the metastore, for schemas, and I wish to use it for tables too... in combination with Terraform, not out of the blue by referencing the table names.

I'm not looking for workarounds :-) I truly want to rely on Terraform to create UC tables. The module exists, it just has a bug. Is it hard to fix? (I feel it's just a few things.)

Good thing that a data source for listing tables is coming, but it's not my main path; I mostly see this as a workaround.

nfx added a commit that referenced this issue Mar 4, 2022
@nfx nfx closed this as completed in #1154 Mar 4, 2022
nfx added a commit that referenced this issue Mar 4, 2022
Author

ebarault commented Mar 5, 2022

Hi @nfx,
thanks for the fix! The doc should be updated to reflect the change.
As of now it says to pass an empty string.
Could you also tag the repo and publish the release to the Terraform registry?

@nfx nfx mentioned this issue Mar 7, 2022
michael-berk pushed a commit to michael-berk/terraform-provider-databricks that referenced this issue Feb 15, 2023