
Allow creation of UDFs in a dataset #4493


@chetaldrich

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

I was looking for something that would allow me to define a UDF inside a Google Cloud BigQuery dataset, and wasn't able to find anything in the provider at present. It seems the Go client library supports creating these via a Routine, defined here.

It might be worthwhile to consider adding this functionality to the provider, since UDFs can now be defined in a persistent manner (see this).

New or Affected Resource(s)

  • google_bigquery_udf
  • google_bigquery_dataset

Potential Terraform Configuration

resource "google_bigquery_dataset" "fn" {
  dataset_id = "fn"
  location   = "US"
  project    = google_project.project_here.project_id
}

data "udf_parameters" "udf_args" {
  named_parameter {
    param_name = "my_arg1"
    param_type = "FLOAT64"
  }

  named_parameter {
    param_name = "my_arg2"
    param_type = "FLOAT64"
  }
}

resource "google_bigquery_udf" "my_udf" {
  project                = google_project.project_here.project_id
  dataset_id             = google_bigquery_dataset.fn.dataset_id
  persistent_function_id = "MY_UDF"
  language               = "js"
  arguments              = data.udf_parameters.udf_args
  returns                = "FLOAT64"
  body                   = <<EOF
  function body here, or import a file
EOF
}

I'm not confident this covers all the potential knobs that you can turn, but I think it covers the ones mentioned here.


@shollyman

Thanks for filing this.

Essentially Routines (at the API level) are a meta-collection for different kinds of function-like resources. As you've noted, user-defined functions are one aspect, but other kinds of things map into this concept like stored procedures and table-valued functions.

Is the general preference of the terraform community to have different kinds of resources exposed (e.g. google_bigquery_udf, google_bigquery_procedure, etc) or to adhere closer to the API and expose a more generic google_bigquery_routine resource with optional fields that aren't relevant for all kinds of routines? Main precedent among bigquery resources here is that currently google_bigquery_table can be used to create managed tables, logical views, and federated tables that reference external resources.

The other thing that's going to be interesting is expressing the data typing of inputs and outputs. The common case is something like the scalar syntax here, but BQ's ability to use structs and arrays means that typing is more complex, and it may make more sense to do the same "just use a json string" technique as used for expressing schemas in table resources.
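To make the typing problem concrete, here is a hypothetical sketch of what a non-scalar argument type would look like as a StandardSqlDataType JSON string (field names taken from the REST docs linked above; the local name is made up for illustration). Nesting like this is what makes the "just use a JSON string" approach, as with schemas in `google_bigquery_table`, attractive:

```hcl
# Hypothetical sketch: expressing ARRAY<STRUCT<lat FLOAT64, lon FLOAT64>>
# as a StandardSqlDataType JSON string, analogous to the schema-as-JSON
# approach used by google_bigquery_table.
locals {
  points_type = jsonencode({
    typeKind = "ARRAY"
    arrayElementType = {
      typeKind = "STRUCT"
      structType = {
        fields = [
          { name = "lat", type = { typeKind = "FLOAT64" } },
          { name = "lon", type = { typeKind = "FLOAT64" } },
        ]
      }
    }
  })
}
```

Modeling this same shape as nested HCL blocks would require mirroring the whole StandardSqlDataType message in the provider schema, which is considerably more work.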

@chetaldrich (Author)

So, it seems like I'm seeing a few questions here:

  1. Routines represent multiple "things" (stored procedures, table-valued functions, and UDFs). Should we have more than one resource or just a routine resource that conforms to the API?

So I think there's a trade-off here. Specifically, if you have multiple resources for each case, the Terraform API differs more from the corresponding Google API, and will increase the maintenance burden, since there are now more resources to maintain.

The other side of the coin is that a single resource whose parameters are required or optional depending on the routine type makes it less clear to users what is actually needed to define any particular one of the three. It also makes it more likely that misconfigurations won't be caught during the Terraform planning phase and will instead surface as failures at apply time. This is, of course, not a great user experience.

The main thing that would make the decision clearer is how much each of these differs in practice when making requests against the Routine API defined here. If they require substantially different configurations, I'd argue it makes more sense to split them into separate resources; the opposite if they mostly share the same set of parameters. Unfortunately, I wasn't able to find documentation like the following for stored procedures or table-valued functions beyond a very cursory mention of stored procedures here after searching the BigQuery docs. If you have clearer documentation somewhere that I missed on how these differ in practice, it would make this call easier.

  2. BQ's ability to use structs and arrays means that typing is more complex, so it'd probably be better to just use a JSON string, right?

Yeah, I'd agree. I wasn't really thinking about that when creating this issue. We could use JSON that hews closer to the Argument definition instead of the named_parameter objects I constructed in the initial example.
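As a hypothetical reworking of the earlier `named_parameter` example along those lines, each argument could be carried as JSON shaped like the API's Argument objects (field names follow the REST docs; the `google_bigquery_udf` resource itself does not exist yet, so the attribute name is an assumption):

```hcl
# Hypothetical: arguments as a JSON string of Argument-shaped objects,
# each with a name and a StandardSqlDataType dataType, replacing the
# earlier named_parameter blocks.
resource "google_bigquery_udf" "my_udf" {
  # ... other attributes as in the original example ...
  arguments = jsonencode([
    { name = "my_arg1", dataType = { typeKind = "FLOAT64" } },
    { name = "my_arg2", dataType = { typeKind = "FLOAT64" } },
  ])
}
```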

@jamiekt commented Dec 7, 2019

Bringing over the points I made on #4619 which @paddycarver has (rightly) closed as a duplicate. This may add discussion to the points that @chetaldrich raised above.

Description

On 2019-10-03 the BigQuery service gained the ability to use stored procedures and UDFs, see the release notes here: https://cloud.google.com/bigquery/docs/release-notes#October_03_2019

The routines resource on the BigQuery API supports management of stored procedures and UDFs: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines

This issue is to request support in the Terraform GCP Provider for creating, modifying & destroying BigQuery stored procedures and UDFs.

New or Affected Resource(s)

  • google_bigquery_routine

Potential Terraform Configuration

resource "google_bigquery_routine" "sproc" {
  project_id      = "myproject"
  dataset_id      = google_bigquery_dataset.default.dataset_id
  routine_id      = "mysproc"
  routine_type    = "PROCEDURE"
  language        = "SQL"
  definition_body = "CREATE PROCEDURE ${google_bigquery_dataset.default.dataset_id}.mysproc () BEGIN SET x = x + delta; END"
  # see the full list of attributes: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines
}

resource "google_bigquery_routine" "udf" {
  project_id      = "myproject"
  dataset_id      = google_bigquery_dataset.default.dataset_id
  routine_id      = "myudf"
  routine_type    = "SCALAR_FUNCTION"
  language        = "SQL"
  definition_body = data.template.function_body
  # see the full list of attributes: https://cloud.google.com/bigquery/docs/reference/rest/v2/routines
}

@danawillow added this to the Goals milestone on Dec 9, 2019
@griffinator76

Regarding whether to have a single Terraform resource or multiple resources for stored procedures and functions: the existing google_bigquery_table resource can be used to implement both tables and views in a dataset, even though these are conceptually different, because that is how the table API is implemented. For consistency, I would prefer a single "routines" resource that maps to the routines API rather than abstracting it into multiple resources. There's little difference between the API definitions of functions and stored procedures, since most of the logic lives in the arbitrary code of the "definition" property.

@megan07 modified the milestones: Goals → Near-Term Goals on Mar 20, 2020
@jmcarp commented Apr 18, 2020

I was curious about this feature and looked at generating the resource via api.yaml, but the API includes a recursive data type: https://cloud.google.com/bigquery/docs/reference/rest/v2/StandardSqlDataType. I'm not very familiar with the api.yaml format — is it possible to represent recursive data types?
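The recursion shows up because StandardSqlDataType refers to itself through both `arrayElementType` and `structType.fields[].type`, so the nesting depth is unbounded. A sketch of a type two levels deep (field names per the REST docs; the local name is illustrative only):

```hcl
# ARRAY<STRUCT<name STRING, scores ARRAY<INT64>>> -- the type message
# nests inside itself via arrayElementType and structType.fields[].type,
# so no fixed-depth schema can represent every possible value.
locals {
  nested_type = jsonencode({
    typeKind = "ARRAY"
    arrayElementType = {
      typeKind = "STRUCT"
      structType = {
        fields = [
          { name = "name", type = { typeKind = "STRING" } },
          { name = "scores", type = {
              typeKind         = "ARRAY"
              arrayElementType = { typeKind = "INT64" }
            }
          },
        ]
      }
    }
  })
}
```

Passing the type through as an opaque JSON string, as discussed above, sidesteps the recursion problem entirely.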

@nozieren commented Jul 9, 2020

Hi all, is there a release date for this, or do we just need to gather enough 👍 reactions?
Thanks!

@ghost commented Nov 20, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost locked the issue as resolved and limited conversation to collaborators on Nov 20, 2020