diff --git a/docs/guides/aws-e2-firewall-hub-and-spoke.md b/docs/guides/aws-e2-firewall-hub-and-spoke.md index a8fd78f487..cee2247e18 100644 --- a/docs/guides/aws-e2-firewall-hub-and-spoke.md +++ b/docs/guides/aws-e2-firewall-hub-and-spoke.md @@ -4,13 +4,13 @@ page_title: "Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data # Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection -You can provision multiple Databricks workspaces with Terraform and where many Databricks workspaces are deployed, we recommend a hub and spoke topology reference architecture, powered by AWS Transit Gateway. The hub will consist of a central inspection and egress virtual private cloud (VPC), while the Spoke VPC houses federated Databricks workspaces for different business units or segregated teams. In this way, you create your own version of a centralized deployment model for your egress architecture, as is recommended for large enterprises. For more information please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html). +You can provision multiple Databricks workspaces with Terraform, and where many Databricks workspaces are deployed, we recommend a hub and spoke topology reference architecture powered by AWS Transit Gateway. The hub will consist of a central inspection and egress virtual private cloud (VPC), while the Spoke VPC houses federated Databricks workspaces for different business units or segregated teams. In this way, you create your version of a centralized deployment model for your egress architecture, as is recommended for large enterprises. For more information, please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html). ![Data Exfiltration](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-exfiltration-replace-1.png) ## Provider initialization for E2 workspaces -This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret and `databricks_account_id` which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup.. If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document. +This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret, and `databricks_account_id`, which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). 
This guide is provided as is and assumes you will use it as the basis for your setup. If you use AWS Firewall to block most traffic but allow the URLs to which Databricks needs to connect, please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document. ```hcl variable "client_id" {} @@ -85,7 +85,7 @@ Before [managing workspace](workspace-management.md), you have to create: - [Databricks E2 workspace](aws-workspace.md#databricks-e2-workspace) - [Host and Token outputs](aws-workspace.md#provider-configuration) -> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within it's own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well. +> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually, this module creates VPC and IAM roles as well. ```hcl terraform { @@ -119,7 +119,7 @@ The very first step is Hub & Spoke VPC creation. Please consult [main documentat ### Spoke VPC for Databricks Workspace -First step is to create Spoke VPC which houses federated Databricks workspaces for different business units or segregated teams. +The first step is to create Spoke VPC, which houses federated Databricks workspaces for different business units or segregated teams. ![SpokeVPC](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-spoke-vpc.png) @@ -197,8 +197,8 @@ Security groups must have the following rules: ***Ingress (inbound):***: -- Allow TCP on all ports when traffic source uses the same security group -- Allow UDP on all ports when traffic source uses the same security group +- Allow TCP on all ports when the traffic source uses the same security group +- Allow UDP on all ports when the traffic source uses the same security group ```hcl /* VPC's Default Security Group */ @@ -259,7 +259,7 @@ resource "databricks_mws_networks" "this" { ### VPC Endpoint for Spoke VPC -For STS, S3 and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network, for more direct connections and reduced cost compared to AWS global endpoints. +For STS, S3, and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network for more direct connections and reduced cost compared to AWS global endpoints. ```hcl /* Create VPC Endpoint */ @@ -306,7 +306,7 @@ module "vpc_endpoints" { ### Hub VPC -The hub will consist of a central inspection and egress virtual private cloud (VPC). We're going to create a central inspection/egress VPC, which once we’ve finished should look like this: +The hub will consist of a central inspection and egress virtual private cloud (VPC). 
We're going to create a central inspection/egress VPC, which, once we’ve finished, should look like this: ![HubVPC](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-hub-vpc.png) @@ -384,7 +384,7 @@ resource "aws_nat_gateway" "hub_nat" { ### Route Tables for Hub -Next, we're going to create route tables for Hub VPC subnets, NAT gateway, Internet Gateway and add some routes. +Next, we will create route tables for the Hub VPC subnets, NAT gateway, and Internet Gateway, and add some routes. ```hcl /* Routing table for hub private subnet */ @@ -465,9 +465,9 @@ resource "aws_main_route_table_association" "set-worker-default-rt-assoc" { ## AWS Transit Gateway -Now that our spoke and inspection/egress VPCs are ready to go, all you need to do is link them all together, and AWS Transit Gateway is the perfect solution for that. -First, we're going to create a Transit Gateway and link our Databricks data plane via TGW subnets. -All of the logic that determines what routes are going via a Transit Gateway is encapsulated within Transit Gateway Route Tables. We’re going to create some TGW routes tables for our Hub & Spoke networks. +Now that our spoke and inspection/egress VPCs are ready to go, all you need to do is link them all together, and AWS Transit Gateway is the perfect solution for that. +First, we will create a Transit Gateway and link our Databricks data plane via TGW subnets. +All of the logic that determines what routes are going via a Transit Gateway is encapsulated within Transit Gateway Route Tables. We will create some TGW route tables for our Hub & Spoke networks. ![TransitGateway](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-tgw.png) @@ -517,7 +517,7 @@ resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" { ### Route Table Configurations for Transit Gateway -The Transit Gateway should be set up and ready to go, now all that needs to be done is update the route tables in each of the subnets so that traffic flows through it. +The Transit Gateway should be set up and ready to go. Now, all that needs to be done is to update the route tables in each subnet so traffic flows through it. ```hcl # Create Route to Internet @@ -558,7 +558,7 @@ Once [VPC](#vpc) is ready, we're going to create AWS Network Firewall for your V ### AWS Firewall Rule Groups -First we're going to create a Firewall Rule group for accessing hive metastore and public repositories. +First, we will create a Firewall Rule group for accessing the hive metastore and public repositories. ```hcl /*Firewall Rule group for accessing hive metastore and public repositories*/ @@ -588,7 +588,7 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" { ``` -As next step, we're going to create Firewall Rule group that allows control plane traffic from the VPC. +As the next step, we will create a Firewall Rule group that allows control plane traffic from the VPC. ```hcl locals { @@ -635,7 +635,7 @@ resource "aws_networkfirewall_rule_group" "allow_db_cpl_protocols_rg" { ``` -Next, we're going to create basic deny rules to cater for common firewall scenarios such as preventing the use of protocols like SSH/SFTP, FTP and ICMP. +Next, we will create basic deny rules to cater for common firewall scenarios, such as preventing the use of protocols like SSH/SFTP, FTP, and ICMP.
```hcl /* Firewall Rule group for dropping ICMP, FTP, SSH*/ @@ -680,7 +680,7 @@ resource "aws_networkfirewall_rule_group" "deny_protocols_rg" { ### AWS Network Firewall Policy -Now we can create AWS Firewall Policy and include stateful firewall rule groups created in previous steps. +Now, we can create an AWS Firewall Policy and include stateful firewall rule groups created in previous steps. ```hcl resource "aws_networkfirewall_firewall_policy" "egress_policy" { @@ -704,7 +704,7 @@ resource "aws_networkfirewall_firewall_policy" "egress_policy" { ### AWS Firewall -Next step is to create an AWS Network Firewall with the Firewall Policy we defined in the previous step. +The next step is to create an AWS Network Firewall with the Firewall Policy we defined in the previous step. ```hcl /* Create Firewall*/ @@ -735,7 +735,7 @@ data "aws_vpc_endpoint" "firewall" { ``` -Finally, AWS Network Firewall is now deployed and configured, all you need to do now is route traffic to it. +Finally, the AWS Network Firewall is now deployed and configured; all you need to do now is route traffic to it. ```hcl /* Add Route from Nat Gateway to Firewall */ diff --git a/docs/guides/aws-e2-firewall-workspace.md b/docs/guides/aws-e2-firewall-workspace.md index 2286091c17..7060bbd2d9 100644 --- a/docs/guides/aws-e2-firewall-workspace.md +++ b/docs/guides/aws-e2-firewall-workspace.md @@ -4,15 +4,15 @@ page_title: "Provisioning AWS Databricks E2 with a AWS Firewall" # Provisioning AWS Databricks E2 with a AWS Firewall -You can provision multiple Databricks workspaces with Terraform. This example shows how to deploy a Databricks workspace into a VPC which uses AWS Network firewall to manage egress out to the public network. For smaller Databricks deployments this would be our recommended configuration. For larger deployments see [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](aws-e2-firewall-hub-and-spoke.md). +You can provision multiple Databricks workspaces with Terraform. This example shows how to deploy a Databricks workspace into a VPC, which uses an AWS Network firewall to manage egress out to the public network. For smaller Databricks deployments, this is our recommended configuration; for larger deployments, see [Provisioning AWS Databricks E2 with a Hub & Spoke firewall for data exfiltration protection](aws-e2-firewall-hub-and-spoke.md). -For more information please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html). +For more information, please visit [Data Exfiltration Protection With Databricks on AWS](https://databricks.com/blog/2021/02/02/data-exfiltration-protection-with-databricks-on-aws.html). ![Data Exfiltration_Workspace](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-firewall-workspace.png) ## Provider initialization for E2 workspaces -This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret and `databricks_account_id` which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. 
If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document. +This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret, and `databricks_account_id`, which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. If you are using AWS Firewall to block most traffic but allow the URLs that Databricks needs to connect to, please update the configuration based on your region. You can get the configuration details for your region from [Firewall Appliance](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#firewall-appliance-infrastructure) document. ```hcl variable "client_id" {} @@ -83,7 +83,7 @@ Before [managing workspace](workspace-management.md), you have to create: - [Databricks E2 workspace](aws-workspace.md#databricks-e2-workspace) - [Host and Token outputs](aws-workspace.md#provider-configuration) -> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within it's own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well. +> Initializing provider with `alias = "mws"` and using `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually, this module creates VPC and IAM roles as well. ```hcl terraform { @@ -205,8 +205,8 @@ Security groups must have the following rules: ***Ingress (inbound):*** Required for all workspaces (these can be separate rules or combined into one): -- Allow TCP on all ports when traffic source uses the same security group -- Allow UDP on all ports when traffic source uses the same security group +- Allow TCP on all ports when the traffic source uses the same security group +- Allow UDP on all ports when the traffic source uses the same security group ```hcl /* VPC's Default Security Group */ @@ -267,7 +267,7 @@ resource "databricks_mws_networks" "this" { ### Route Tables -Next, we're going to create route tables for VPC subnets, NAT gateway, Internet Gateway and add some routes. +Next, we will create route tables for VPC subnets, NAT gateway, and Internet Gateway and add some routes. ```hcl /* Routing table for private subnet */ @@ -348,7 +348,7 @@ resource "aws_main_route_table_association" "set-worker-default-rt-assoc" { ### VPC Endpoints -For STS, S3 and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network, for more direct connections and reduced cost compared to AWS global endpoints. 
+For STS, S3, and Kinesis, it's important to create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network for more direct connections and reduced cost compared to AWS global endpoints. ```hcl module "vpc_endpoints" { @@ -395,11 +395,11 @@ module "vpc_endpoints" { ## AWS Network Firewall -Once [VPC](#vpc) is ready, create AWS Network Firewall for your VPC that restricts outbound http/s traffic to an approved set of Fully Qualified Domain Names (FQDNs). +Once [VPC](#vpc) is ready, create an AWS Network Firewall for your VPC that restricts outbound http/s traffic to an approved set of Fully Qualified Domain Names (FQDNs). ### AWS Firewall Rule Groups -First we are going to create a Firewall Rule group for accessing hive metastore and public repositories. +First, we will create a Firewall Rule group for accessing hive metastore and public repositories. ```hcl resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" { @@ -427,7 +427,7 @@ resource "aws_networkfirewall_rule_group" "databricks_fqdns_rg" { } ``` -As the next step, we are going to create a Firewall Rule group that allows control plane traffic from the VPC. +As the next step, we will create a Firewall Rule group that allows control plane traffic from the VPC. ```hcl resource "aws_networkfirewall_rule_group" "allow_db_cpl_protocols_rg" { @@ -473,7 +473,7 @@ locals { } ``` -Finally, we are going to add some basic deny rules to cater for common firewall scenarios such as preventing the use of protocols like SSH/SFTP, FTP and ICMP. +Finally, we will add some basic deny rules to cater for common firewall scenarios, such as preventing the use of protocols like SSH/SFTP, FTP, and ICMP. ```hcl /* Firewall Rule group for dropping ICMP, FTP, SSH*/ @@ -518,7 +518,7 @@ resource "aws_networkfirewall_rule_group" "deny_protocols_rg" { ### AWS Network Firewall Policy -First, we are going to create AWS Firewall Policy and include stateful firewall rule groups created in previous steps. +First, we will create an AWS Firewall Policy and include stateful firewall rule groups created in previous steps. ```hcl resource "aws_networkfirewall_firewall_policy" "egress_policy" { @@ -542,7 +542,7 @@ resource "aws_networkfirewall_firewall_policy" "egress_policy" { ### AWS Firewall -As the next step, we are going to create an AWS Network Firewall with the Firewall Policy that we defined in the previous step. +As the next step, we will create an AWS Network Firewall with the Firewall Policy we defined in the previous step. ```hcl resource "aws_networkfirewall_firewall" "exfiltration_firewall" { @@ -572,7 +572,7 @@ data "aws_vpc_endpoint" "firewall" { ``` -Finally, AWS Network Firewall is now deployed and configured, all you need to do now is route traffic to it. +Finally, the AWS Network Firewall is now deployed and configured - all you need to do now is route traffic to it. 
```hcl /* Add Route from Nat Gateway to Firewall */ diff --git a/docs/guides/aws-private-link-workspace.md b/docs/guides/aws-private-link-workspace.md index 16d34f2531..9101834038 100644 --- a/docs/guides/aws-private-link-workspace.md +++ b/docs/guides/aws-private-link-workspace.md @@ -4,7 +4,7 @@ page_title: "Provisioning Databricks on AWS with PrivateLink" # Deploying pre-requisite resources and enabling PrivateLink connections -Databricks PrivateLink support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically, using a programmatic approach. This guide assumes you are deploying into an existing VPC and you have set up credentials and storage configurations as per prior examples, notably here. +Databricks PrivateLink support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically using a programmatic approach. This guide assumes you are deploying into an existing VPC and have set up credentials and storage configurations as per prior examples, notably here. ![Private link backend](https://raw.githubusercontent.com/databricks/terraform-provider-databricks/master/docs/images/aws-e2-private-link-backend.png) @@ -12,11 +12,11 @@ This guide uses the following variables in configurations: - `client_id`: `application_id` of the service principal, see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal) - `client_secret`: the secret of the service principal. -- `databricks_account_id`: The numeric ID for your Databricks account. When you are logged in, it appears in the bottom left corner of the page. +- `databricks_account_id`: The numeric ID for your Databricks account. When logged in, it appears in the bottom left corner of the page. - `vpc_id` - The ID for the AWS VPC. - `region` - AWS region. - `security_group_id` - Security groups set up for the existing VPC. -- `subnet_ids` - Existing subnets being used for the customer managed VPC. +- `subnet_ids` - Existing subnets used for the customer-managed VPC. - `workspace_vpce_service` - Choose the region-specific service endpoint from this table. - `relay_vpce_service` - Choose the region-specific service from this table. - `vpce_subnet_cidr` - CIDR range for the subnet chosen for the VPC endpoint. @@ -24,9 +24,9 @@ This guide uses the following variables in configurations: - `root_bucket_name` - AWS bucket name required for [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations). - `cross_account_arn` - AWS EC2 role ARN required for [databricks_mws_credentials](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials). -This guide is provided as-is and you can use this guide as the basis for your custom Terraform module. +This guide is provided as-is, and you can use this guide as the basis for your custom Terraform module. 
-To get started with AWS PrivateLink integration, this guide takes you throw the following high-level steps: +This guide takes you through the following high-level steps to set up a workspace with AWS PrivateLink: - Initialize the required providers - Configure AWS objects @@ -37,7 +37,7 @@ To get started with AWS PrivateLink integration, this guide takes you throw the ## Provider initialization -Initialize [provider with `mws` alias](https://www.terraform.io/language/providers/configuration#alias-multiple-provider-configurations) to set up account-level resources. See [provider authentication](../index.md#authenticating-with-hostname,-username,-and-password) for more details. +To set up account-level resources, initialize [provider with `mws` alias](https://www.terraform.io/language/providers/configuration#alias-multiple-provider-configurations). See [provider authentication](../index.md#authenticating-with-hostname,-username,-and-password) for more details. ```hcl terraform { @@ -64,7 +64,7 @@ provider "databricks" { } ``` -Define the required variables +Define the required variables: ```hcl variable "databricks_account_id" {} @@ -120,14 +120,14 @@ In this section, the goal is to create the two back-end VPC endpoints: - Back-end VPC endpoint for SSC relay - Back-end VPC endpoint for REST APIs --> **Note** If you want to implement the front-end VPC endpoint as well for the connections from the user to the workspace front-end, use the transit (bastion) VPC that terminates your AWS Direct Connect or VPN gateway connection or one that is routable from such a transit (bastion) VPC. Once the front-end endpoint is created, it can be supplied to [databricks_mws_networks](../resources/mws_networks.md) resource using vpc_endpoints argument. Use the [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object. +-> **Note** If you want to implement the front-end VPC endpoint as well for the connections from the user to the workspace front-end, use the transit (bastion) VPC that terminates your AWS Direct Connect or VPN gateway connection or one that is routable from such a transit (bastion) VPC. Once the front-end endpoint is created, it can be supplied to the [databricks_mws_networks](../resources/mws_networks.md) resource using the `vpc_endpoints` argument. Use the [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object. The first step is to create the required AWS objects: - A subnet dedicated to your VPC endpoints. - A security group dedicated to your VPC endpoints and satisfying required inbound/outbound TCP/HTTPS traffic rules on ports 443 and 6666, respectively. -For workspace with [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html#prepare-a-workspace-for-the-compliance-security-profile), you need *additionally* allow bidirectional access to port 2443 for FIPS connections. The total set of ports to allow bidirectional access are 443, 2443, and 6666. +For workspaces with the [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html#prepare-a-workspace-for-the-compliance-security-profile), you *additionally* need to allow bidirectional access to port 2443 for FIPS connections.
The ports to allow bidirectional access are 443, 2443, and 6666. ```hcl data "aws_vpc" "prod" { @@ -275,7 +275,7 @@ resource "databricks_mws_networks" "this" { For a workspace to support any of the PrivateLink connectivity scenarios, the workspace must be created with an attached [databricks_mws_private_access_settings](../resources/mws_private_access_settings.md) resource. -The credentials ID which is referenced below is one of the attributes which is created as a result of configuring the cross-account IAM role, which Databricks uses to orchestrate EC2 resources. The credentials are created via [databricks_mws_credentials](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials). Similarly, the storage configuration ID is obtained from the [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations) resource. +The credentials ID, referenced below, is one of the attributes created as a result of configuring the cross-account IAM role, which Databricks uses to orchestrate EC2 resources. The credentials are created via [databricks_mws_credentials](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials). Similarly, the storage configuration ID is obtained from the [databricks_mws_storage_configurations](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations) resource. ```hcl resource "databricks_mws_private_access_settings" "pas" { @@ -299,3 +299,5 @@ resource "databricks_mws_workspaces" "this" { depends_on = [databricks_mws_networks.this] } ``` + + diff --git a/docs/guides/aws-workspace.md b/docs/guides/aws-workspace.md index 68e441d17a..7cc2cfcdc0 100644 --- a/docs/guides/aws-workspace.md +++ b/docs/guides/aws-workspace.md @@ -10,7 +10,7 @@ You can provision multiple Databricks workspaces with Terraform. ## Provider initialization for E2 workspaces -This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret and `databricks_account_id` which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. +This guide assumes you have the `client_id`, which is the `application_id` of the [Service Principal](resources/service_principal.md), `client_secret`, which is its secret, and `databricks_account_id`, which can be found in the bottom left corner of the [Account Console](https://accounts.cloud.databricks.com). (see [instruction](https://docs.databricks.com/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal)). This guide is provided as is and assumes you will use it as the basis for your setup. ```hcl variable "client_id" {} @@ -48,7 +48,7 @@ Before [managing workspace](workspace-management.md), you have to create: - [Databricks E2 workspace](#databricks-e2-workspace) - [Host and Token outputs](#provider-configuration) -> Initialize provider with `alias = "mws"` and use `provider = databricks.mws` for all `databricks_mws_*` resources. 
We require all `databricks_mws_*` resources to be created within its own dedicated terraform module of your environment. Usually this module creates VPC and IAM roles as well. +> Initialize provider with `alias = "mws"` and use `provider = databricks.mws` for all `databricks_mws_*` resources. We require all `databricks_mws_*` resources to be created within a dedicated terraform module of your environment. Usually, this module creates VPC and IAM roles as well. ```hcl terraform { @@ -111,7 +111,7 @@ resource "databricks_mws_credentials" "this" { ## VPC -The very first step is VPC creation with necessary firewall rules. Please consult [main documentation page](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. AWS VPS is registered as [databricks_mws_networks](../resources/mws_networks.md) resource. For STS, S3 and Kinesis, you can create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network, for more direct connections and reduced cost compared to AWS global endpoints. For more information, see [Regional endpoints](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#regional-endpoints-1). +The very first step is VPC creation with the necessary firewall rules. Please consult the [main documentation page](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. The AWS VPC is registered as a [databricks_mws_networks](../resources/mws_networks.md) resource. For STS, S3, and Kinesis, you can create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network for more direct connections and reduced cost compared to AWS global endpoints. For more information, see [Regional endpoints](https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#regional-endpoints-1). ```hcl data "aws_availability_zones" "available" {} @@ -198,7 +198,7 @@ resource "databricks_mws_networks" "this" { ## Root bucket -Once [VPC](#vpc) is ready, create AWS S3 bucket for DBFS workspace storage, which is commonly referred to as **root bucket**. This provider has [databricks_aws_bucket_policy](../data-sources/aws_bucket_policy.md) with the necessary IAM policy template. The AWS S3 bucket has to be registered through [databricks_mws_storage_configurations](../resources/mws_storage_configurations.md). +Once [VPC](#vpc) is ready, create an AWS S3 bucket for DBFS workspace storage, commonly called the **root bucket**. This provider has [databricks_aws_bucket_policy](../data-sources/aws_bucket_policy.md) with the necessary IAM policy template. The AWS S3 bucket has to be registered through [databricks_mws_storage_configurations](../resources/mws_storage_configurations.md). ```hcl resource "aws_s3_bucket" "root_storage_bucket" { @@ -258,9 +258,9 @@ resource "databricks_mws_storage_configurations" "this" { Once [VPC](#vpc), [cross-account role](#cross-account-iam-role), and [root bucket](#root-bucket) are set up, you can create Databricks AWS E2 workspace through [databricks_mws_workspaces](../resources/mws_workspaces.md) resource.
-Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate terraform modules to avoid common confusion between `provider = databricks.mws` and `provider = databricks.created_workspace`. This is why we specify `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules. +Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate terraform modules to avoid common confusion between `provider = databricks.mws` and `provider = databricks.created_workspace`. We specify `databricks_host` and `databricks_token` outputs, which must be used in the latter modules. --> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, that should indicate authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. +-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, which should indicate authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. ```hcl resource "databricks_mws_workspaces" "this" { @@ -290,7 +290,7 @@ output "databricks_token" { ### Data resources and Authentication is not configured errors -*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. 
This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier, if your usage involves data resources. +*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if a workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources. ```hcl data "databricks_current_user" "me" { @@ -309,18 +309,18 @@ provider "databricks" { } ``` -We assume that you have a terraform module in your project that creats a workspace (using [Databricks E2 Workspace](#databricks-e2-workspace) section) and you named it as `e2` while calling it in the **main.tf** file of your terraform project. And `workspace_url` and `token_value` are the output attributes of that module. This provider configuration will allow you to use the generated token during workspace creation to authenticate to the created workspace. +We assume that you have a terraform module in your project that creates a workspace (using the [Databricks E2 Workspace](#databricks-e2-workspace) section) and that you named it `e2` when calling it in the **main.tf** file of your terraform project, with `workspace_url` and `token_value` as the output attributes of that module. This provider configuration will allow you to use the token generated during workspace creation to authenticate to the created workspace. ### Credentials validation checks errors -Due to a bug in the Terraform AWS provider (spotted in v3.28) the Databricks AWS cross-account policy creation and attachment to the IAM role takes longer than the AWS request confirmation to Terraform. As Terraform continues creating the Workspace, validation checks for the credentials are failing, as the policy doesn't get applied quick enough. Showing the error: +Due to a bug in the Terraform AWS provider (spotted in v3.28), the Databricks AWS cross-account policy creation and attachment to the IAM role takes longer than the AWS request confirmation to Terraform. As Terraform continues creating the workspace, validation checks for the credentials fail because the policy isn’t applied quickly enough, showing the error: ```sh Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances (400 on /api/2.0/accounts/{UUID}/workspaces) ``` -As a workaround give the `aws_iam_role` more time to be created with a `time_sleep` resource, which you need to add as a dependency to the `databricks_mws_workspaces` resource.
+As a workaround, give the `aws_iam_role` more time to be created with a `time_sleep` resource, which you need to add as a dependency to the `databricks_mws_workspaces` resource. ```hcl resource "time_sleep" "wait" { @@ -332,7 +332,7 @@ resource "time_sleep" "wait" { #### IAM policy error -If you notice below error: +If you notice the error below: ```sh Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellation, Create Placement Group, Delete Tags, Describe Availability Zones, Describe instances, Describe Instance Status, Describe Placement Group, Describe Route Tables, Describe Security Groups, Describe Spot Instances, Describe Spot Price History, Describe Subnets, Describe Volumes, Describe Vpcs, Request Spot Instances @@ -342,6 +342,6 @@ Error: MALFORMED_REQUEST: Failed credentials validation checks: Spot Cancellatio ![create_workspace_error](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/create_workspace_error.png) -- Verify if the role and policy exists (assume role should allow external id) +- Verify if the role and policy exist (assume role should allow external ID) ![iam_role_trust_error](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/iam_role_trust_error.png) diff --git a/docs/guides/azure-private-link-workspace-simplified.md index cc4cf9ec6f..b4e8bf01fe 100644 --- a/docs/guides/azure-private-link-workspace-simplified.md +++ b/docs/guides/azure-private-link-workspace-simplified.md @@ -1,5 +1,5 @@ --- -page_title: "Provisioning Azure Databricks with Private Link - Simple deployment" +page_title: "Provisioning Azure Databricks with Private Link - Simple deployment" --- # Deploying pre-requisite resources and enabling Private Link connections - Simple deployment @@ -8,16 +8,16 @@ page_title: "Provisioning Azure Databricks with Private Link - Simple deployment -> **Note** This guide assumes that connectivity from the on-premises user environment is already configured using ExpressRoute or a VPN gateway connection. -[Azure Private Link](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview) support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. +[Azure Private Link](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview) support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. -You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically, using a programmatic approach. +You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically using a programmatic approach. This guide covers a [simple deployment](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/private-link-simplified) to configure Azure Databricks with Private Link: * No separate VNet separates user access from the VNet that you use for your compute resources in the Classic data plane * A transit subnet in the data plane VNet is used for user access -* Only a single private endpoint is used for both front-end and back-end connectivity.
+* Only a single private endpoint is used for both front-end and back-end connectivity. * A separate private endpoint is used for web authentication -* The same Databricks workspace is used for web authentication traffic but Databricks strongly recommends creating a separate workspace called a private web auth workspace for each region to host the web auth private network settings. +* The same Databricks workspace is used for web authentication traffic. Databricks still strongly recommends creating a separate workspace called a private web auth workspace for each region to host the web auth private network settings. ![Azure Databricks with Private Link - Simple deployment](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/azure-private-link-simplified.png) @@ -27,25 +27,25 @@ This guide uses the following variables: - `rg_name`: The name of the existing resource group - `location`: The location for Azure resources -This guide is provided as-is and you can use it as the basis for your custom Terraform module. +This guide is provided as-is, and you can use it as the basis for your custom Terraform module. -To get started with Azure Private Link integration, this guide takes you through the following high-level steps: +This guide takes you through the following high-level steps to set up a workspace with Azure Private Link: - Initialize the required providers - Configure Azure objects - Deploy an Azure Vnet with the following subnets: - - Public and private subnets for Azure Databricks workspace - - Private Link subnet that will contain the following private endpoints: - - Frontend / Backend private endpoint - - Web_auth private endpoint - - Configure the private DNS zone in order to add: - - DNS A record to map connection for workspace access - - DNS A record(s) for web_auth + - Public and private subnets for Azure Databricks workspace + - Private Link subnet that will contain the following private endpoints: + - Frontend / Backend private endpoint + - Web_auth private endpoint + - Configure the private DNS zone to add: + - DNS A record to map connection for workspace access + - DNS A record(s) for web_auth - Workspace Creation ## Provider initialization -Initialize provider +Initialize provider ```hcl terraform { @@ -96,11 +96,11 @@ locals { } ``` -## Configure network +## Configure network -### Deploy Azure vnet and subnets +### Deploy Azure VNet and subnets -Create a new Azure VNet, the required subnets and associated security groups: +Create a new Azure VNet, the required subnets, and associated security groups: ```hcl resource "azurerm_virtual_network" "this" { @@ -216,7 +216,7 @@ resource "azurerm_subnet" "plsubnet" { #### Frontend / Backend private endpoint -Create a private endpoint with sub resource **databricks_ui_api**: +Create a private endpoint with sub-resource **databricks_ui_api**: ```hcl @@ -254,7 +254,7 @@ resource "azurerm_private_dns_zone_virtual_network_link" "uiapidnszonevnetlink" #### Web auth private endpoint -Create a private endpoint with sub resource **browser_authentication**: +Create a private endpoint with sub-resource **browser_authentication**: ```hcl resource "azurerm_private_endpoint" "auth" { @@ -279,7 +279,7 @@ resource "azurerm_private_endpoint" "auth" { ## Configure workspace -Deploy an Azure Databricks workspace +Deploy an Azure Databricks workspace: ```hcl resource "azurerm_databricks_workspace" "this" { @@ -308,4 +308,4 @@ resource "azurerm_databricks_workspace" "this" { } ``` --> **Note** The public network access to 
the workspace is disabled. You can access the workspace only through the private connectivity to the on-premises user environment. For testing purposes, you can deploy an Azure VM in the same vnet in order to test the frontend connectivity. +-> **Note** The public network access to the workspace is disabled. You can access the workspace only through private connectivity to the on-premises user environment. For testing purposes, you can deploy an Azure VM in the same VNet to test the frontend connectivity. diff --git a/docs/guides/azure-private-link-workspace-standard.md index 7fbd7a4eb9..845f3e0c06 100644 --- a/docs/guides/azure-private-link-workspace-standard.md +++ b/docs/guides/azure-private-link-workspace-standard.md @@ -1,31 +1,31 @@ --- -page_title: "Provisioning Azure Databricks with Private Link - Standard deployment" +page_title: "Provisioning Azure Databricks with Private Link - Standard deployment" --- -# Deploying pre-requisite resources and enabling Private Link connections - Standard deployment +# Deploying pre-requisite resources and enabling Private Link connections - Standard deployment --> **Note** - - Refer to [adb-with-private-link-standard](https://github.com/databricks/terraform-databricks-examples/tree/main/modules/adb-with-private-link-standard), a Terraform module that contains code used to deploy an Azure Databricks workspace with Azure Private Link using the Standard deployment approach. +-> **Note** + - Refer to [adb-with-private-link-standard](https://github.com/databricks/terraform-databricks-examples/tree/main/modules/adb-with-private-link-standard), a Terraform module that contains code used to deploy an Azure Databricks workspace with Azure Private Link using the Standard deployment approach. - Refer to the [Databricks Terraform Registry modules](https://registry.terraform.io/modules/databricks/examples/databricks/latest) for more Terraform modules and examples to deploy Azure Databricks resources. - This guide assumes that connectivity from the on-premises user environment is already configured using ExpressRoute or a VPN gateway connection. -[Azure Private Link](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview) support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. +[Azure Private Link](https://learn.microsoft.com/en-us/azure/private-link/private-link-overview) support enables private connectivity between users and their Databricks workspaces and between clusters on the data plane and core services on the control plane within the Databricks workspace infrastructure. -You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically, using a programmatic approach. +You can use Terraform to deploy the underlying cloud resources and the private access settings resources automatically using a programmatic approach. This guide covers a [standard deployment](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/private-link-standard) to configure Azure Databricks with Private Link: -* Two seperate VNets are used: - * A transit VNet +* Two separate VNets are used: + * A transit VNet * A customer Data Plane VNet * A private endpoint is used for back-end connectivity and deployed in the customer Data Plane VNet.
* A private endpoint is used for front-end connectivity and deployed in the transit VNet. * A private endpoint is used for web authentication and deployed in the transit VNet. -* A dedicated Databricks workspace, called Web Auth workspace, is used for web authentication traffic. This workspace is configured with the sub resource **browser_authentication** and deployed using subnets in the transit VNet. +* A dedicated Databricks workspace, called Web Auth workspace, is used for web authentication traffic. This workspace is configured with the sub-resource **browser_authentication** and deployed using subnets in the transit VNet. -> **Note** -* A seperate Web Auth workspace is not mandatory but recommended. -* DNS mapping for SSO login callbacks to the Azure Databricks web application can either be managed by the Web Auth workspace or another workspace that is associated with the **browser_authentication** private endpoint. +* A separate Web Auth workspace is not mandatory but recommended. +* DNS mapping for SSO login callbacks to the Azure Databricks web application can be managed by the Web Auth workspace or another workspace associated with the **browser_authentication** private endpoint. ![Azure Databricks with Private Link - Standard deployment](https://github.com/databricks/terraform-provider-databricks/raw/master/docs/images/azure-private-link-standard.png) @@ -37,26 +37,26 @@ This guide uses the following variables: - `rg_dp`: The name of the existing resource group that will contain the Azure Data Plane VNet and the private DNS zone for the Backend private endpoint - `location`: The location for Azure resources -This guide is provided as-is and you can use it as the basis for your custom Terraform module. +This guide is provided as-is, and you can use it as the basis for your custom Terraform module. -To get started with Azure Private Link integration, this guide takes you through the following high-level steps: +This guide takes you through the following high-level steps to set up a workspace with Azure Private Link: - Initialize the required providers - Configure Azure objects - Deploy two Azure VNets with the following subnets: - - Public and private subnets for each Azure Databricks workspace in the Data Plane VNet - - Private Link subnet, in the Data Plane VNet, that will contain the Backend private endpoint - - Private Link subnet, in the Transit VNet, that will contain the following private endpoints: - - Frontend private endpoint - - Web auth private endpoint - - Configure the private DNS zone in order to add: - - DNS A record to map connection for workspace access - - DNS A record(s) for web_auth + - Public and private subnets for each Azure Databricks workspace in the Data Plane VNet + - Private Link subnet in the Data Plane VNet that will contain the Backend private endpoint + - Private Link subnet in the Transit VNet that will contain the following private endpoints: + - Frontend private endpoint + - Web auth private endpoint + - Configure the private DNS zone to add: + - DNS A record to map connection for workspace access + - DNS A record(s) for web_auth - Workspace Creation ## Provider initialization -Initialize provider +Initialize provider ```hcl terraform { @@ -127,13 +127,13 @@ locals { * In the Transit resource group: 1. Create a Transit VNet 2. Create a private DNS zone - 3. Create Web Auth Databricks workspace with the sub resource **browser_authentication** - 4. Create a Frontend private endpoint with the sub resource **databricks_ui_api** + 3. 
Create Web Auth Databricks workspace with the sub-resource **browser_authentication** + 4. Create a Frontend private endpoint with the sub-resource **databricks_ui_api** * In the Data Plane resource group: 1. Create a Data Plane VNet 2. Create a private DNS zone 3. Create a new Azure Databricks workspace - 4. Create a Backend private endpoint with the sub resource **databricks_ui_api** + 4. Create a Backend private endpoint with the sub-resource **databricks_ui_api** ## Deploy Transit resources @@ -200,7 +200,7 @@ resource "azurerm_private_dns_zone_virtual_network_link" "transitdnszonevnetlink } ``` -3. Create Web Auth Databricks workspace with the sub resource **browser_authentication**: +3. Create Web Auth Databricks workspace with the sub-resource **browser_authentication**: ```hcl resource "azurerm_subnet" "transit_public" { @@ -311,7 +311,7 @@ resource "azurerm_databricks_workspace" "web_auth_workspace" { } ``` -4. Create a Frontend private endpoint with the sub resource **databricks_ui_api**: +4. Create a Frontend private endpoint with the sub-resource **databricks_ui_api**: ```hcl resource "azurerm_private_endpoint" "front_pe" { @@ -494,7 +494,7 @@ resource "azurerm_databricks_workspace" "app_workspace" { } ``` -4. Create a Backend private endpoint with the sub resource **databricks_ui_api**: +4. Create a Backend private endpoint with the sub-resource **databricks_ui_api**: ```hcl resource "azurerm_private_endpoint" "app_dpcp" { @@ -517,6 +517,6 @@ resource "azurerm_private_endpoint" "app_dpcp" { } ``` --> **Note** -- The public network access to the workspace is disabled. You can access the workspace only through the private connectivity to the on-premises user environment. For testing purposes, you can deploy an Azure VM in the Transit VNet in order to test the frontend connectivity. +-> **Note** +- The public network access to the workspace is disabled. You can access the workspace only through private connectivity to the on-premises user environment. For testing purposes, you can deploy an Azure VM in the Transit VNet to test the frontend connectivity. - If you wish to deploy a test VM in the Data Plane VNet, you should configure a peering connection between the two VNets diff --git a/docs/guides/azure-workspace.md b/docs/guides/azure-workspace.md index 6660675b6a..de80891bf6 100644 --- a/docs/guides/azure-workspace.md +++ b/docs/guides/azure-workspace.md @@ -68,7 +68,7 @@ output "databricks_host" { ### Data resources and Authentication is not configured errors -*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [azurerm_databricks_workspace.this]` to the body. This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier, if your usage involves data resources. +*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). 
Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, add `depends_on = [azurerm_databricks_workspace.this]` to the body. This issue doesn't occur if a workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources. ```hcl data "databricks_current_user" "me" { diff --git a/docs/guides/experimental-exporter.md b/docs/guides/experimental-exporter.md index 4066433bc8..fc613b91e0 100644 --- a/docs/guides/experimental-exporter.md +++ b/docs/guides/experimental-exporter.md @@ -3,15 +3,15 @@ page_title: "Experimental resource exporter" --- # Experimental resource exporter --> **Note** This tooling is experimental and provided as is. It has an evolving interface, which may change or be removed in future versions of the provider. +-> **Note** This tooling is experimental and provided as is. It has an evolving interface, which may change or be removed in future provider versions. -> **Note** Use the same user who did the exporting to import the exported templates. Otherwise, it could cause changes in the ownership of the objects. -Generates `*.tf` files for Databricks resources as well as `import.sh` that is used to import objects into the Terraform state. Available as part of provider binary. The only possible way to authenticate is through [environment variables](../index.md#Environment-variables). It's best used when you need to quickly export Terraform configuration for an existing Databricks workspace. After generating the configuration, we strongly recommend manually reviewing all created files. +Generates `*.tf` files for Databricks resources together with `import.sh` that is used to import objects into the Terraform state. Available as part of provider binary. The only way to authenticate is through [environment variables](../index.md#Environment-variables). It's best used when you need to export Terraform configuration for an existing Databricks workspace quickly. After generating the configuration, we strongly recommend manually reviewing all created files. ## Example Usage -After downloading the [latest released binary](https://github.com/databricks/terraform-provider-databricks/releases), unpack it and place it in the same folder. In fact, you may have already downloaded this binary - check the `.terraform` folder of any state directory, where you've used the `databricks` provider. It could also be in your plugin cache `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`. Here's the tool in action: +After downloading the [latest released binary](https://github.com/databricks/terraform-provider-databricks/releases), unpack it and place it in the same folder. You may have already downloaded this binary - check the `.terraform` folder of any state directory where you've used the `databricks` provider. It could also be in your plugin cache `~/.terraform.d/plugins/registry.terraform.io/databricks/databricks/*/*/terraform-provider-databricks`. Here's the tool in action: [![asciicast](https://asciinema.org/a/Rv8ZFJQpfrfp6ggWddjtyXaOy.svg)](https://asciinema.org/a/Rv8ZFJQpfrfp6ggWddjtyXaOy) @@ -21,77 +21,77 @@ Exporter can also be used in a non-interactive mode: export DATABRICKS_HOST=... 
export DATABRICKS_TOKEN=... ./terraform-provider-databricks exporter -skip-interactive \ - -services=groups,secrets,access,compute,users,jobs,storage \ - -listing=jobs,compute \ - -last-active-days=90 \ - -debug + -services=groups,secrets,access,compute,users,jobs,storage \ + -listing=jobs,compute \ + -last-active-days=90 \ + -debug ``` ## Argument Reference -!> **Warning** This tooling was only extensively tested with administrator privileges. +!> **Warning** This tooling was only extensively tested with administrator privileges. -All arguments are optional and they tune what code is being generated. +All arguments are optional, and they tune what code is being generated. * `-directory` - Path to a directory, where `*.tf` and `import.sh` files would be written. By default, it's set to the current working directory. -* `-module` - Name of module in Terraform state, that would affect reference resolution and prefixes for generated commands in `import.sh`. +* `-module` - Name of module in Terraform state that would affect reference resolution and prefixes for generated commands in `import.sh`. * `-last-active-days` - Items older than `-last-active-days` won't be imported. By default, the value is set to 3650 (10 years). Has an effect on listing [databricks_cluster](../resources/cluster.md) and [databricks_job](../resources/job.md) resources. -* `-services` - Comma-separated list of services to import. By default, all services are imported. -* `-listing` - Comma-separated list of services to be listed and further passed on for importing. `-services` parameter controls which transitive dependencies will be processed. We recommend limiting with `-listing` more often, than with `-services`. +* `-services` - Comma-separated list of services to import. By default, all services are imported. +* `-listing` - Comma-separated list of services to be listed and further passed on for importing. `-services` parameter controls which transitive dependencies will be processed. We recommend limiting with `-listing` more often than with `-services`. * `-match` - Match resource names during listing operation. This filter applies to all resources that are getting listed, so if you want to import all dependencies of just one cluster, specify `-match=autoscaling -listing=compute`. By default, it is empty, which matches everything. -* `-mounts` - List DBFS mount points, which is an extremely slow operation and would not trigger unless explicitly specified. +* `-mounts` - List DBFS mount points, an extremely slow operation that would not trigger unless explicitly specified. * `-generateProviderDeclaration` - the flag that toggles the generation of `databricks.tf` file with the declaration of the Databricks Terraform provider that is necessary for Terraform versions since Terraform 0.13 (disabled by default). * `-prefix` - optional prefix that will be added to the name of all exported resources - that's useful for exporting resources from multiple workspaces for merging into a single one. * `-skip-interactive` - optionally run in a non-interactive mode. * `-includeUserDomains` - optionally include domain name into generated resource name for `databricks_user` resource. * `-importAllUsers` - optionally include all users and service principals even if they are only part of the `users` group. -* `-incremental` - experimental option for incremental export of modified resources and merging with existing resources. *Please note that only a limited set of resources (notebooks, SQL queries/dashboards/alerts, ...) 
provides information about the last modified date - all other resources will be re-exported again! Also, it's not possible to detect the deletion of the resources, so you will need to do periodic full export if resources are deleted!* **Requires** `-updated-since` option if no `exporter-run-stats.json` file exists in the output directory.
-* `-updated-since` - timestamp (in ISO8601 format supported by Go language) for exporting of resources modified since a given timestamp. I.e. `2023-07-24T00:00:00Z`. If not specified, the exporter will try to load the last run timestamp from the `exporter-run-stats.json` file generated during the export, and use it.
-* `-notebooksFormat` - optional format for exporting of notebooks. Supported values are `SOURCE` (default), `DBC`, `JUPYTER`. This could be used to export notebooks with embedded dashboards.
-* `-noformat` - optionally disable the execution of `terraform fmt` on the exported files (enabled by default).
+* `-incremental` - experimental option for incremental export of modified resources and merging with existing resources. *Please note that only a limited set of resources (notebooks, SQL queries/dashboards/alerts, ...) provides information about the last modified date - all other resources will be re-exported again! Also, it's impossible to detect the deletion of the resources, so you must do periodic full export if resources are deleted!* **Requires** `-updated-since` option if no `exporter-run-stats.json` file exists in the output directory.
+* `-updated-since` - timestamp (in ISO8601 format supported by Go language) for exporting of resources modified since a given timestamp. For example, `2023-07-24T00:00:00Z`. If not specified, the exporter will try to load the last run timestamp from the `exporter-run-stats.json` file generated during the export and use it.
+* `-notebooksFormat` - optional format for exporting of notebooks. Supported values are `SOURCE` (default), `DBC`, `JUPYTER`. This option could be used to export notebooks with embedded dashboards.
+* `-noformat` - optionally turn off the execution of `terraform fmt` on the exported files (enabled by default).

## Services

-Services are just logical groups of resources used for filtering and organization in files written in `-directory`. All resources are globally sorted by their resource name, which technically allows you to use generated files for compliance purposes. Nevertheless, managing the entire Databricks workspace with Terraform is the preferred way. With the exception of notebooks and possibly libraries, which may have their own CI/CD processes.
+Services are just logical groups of resources used for filtering and organization in files written in `-directory`. All resources are globally sorted by their resource name, which allows you to use generated files for compliance purposes. Nevertheless, managing the entire Databricks workspace with Terraform is the preferred way, except for notebooks and possibly libraries, which may have their own CI/CD processes.

* `access` - [databricks_permissions](../resources/permissions.md), [databricks_instance_profile](../resources/instance_profile.md) and [databricks_ip_access_list](../resources/ip_access_list.md).
* `compute` - **listing** [databricks_cluster](../resources/cluster.md). Includes [cluster policies](../resources/cluster_policy.md).
* `directories` - **listing** [databricks_directory](../resources/directory.md) * `dlt` - **listing** [databricks_pipeline](../resources/pipeline.md) * `groups` - [databricks_group](../data-sources/group.md) with [membership](../resources/group_member.md) and [data access](../resources/group_instance_profile.md). -* `jobs` - **listing** [databricks_job](../resources/job.md). Usually, there are more automated jobs than interactive clusters, so they get their own file in this tool's output. +* `jobs` - **listing** [databricks_job](../resources/job.md). * `mlflow-webhooks` - **listing** [databricks_mlflow_webhook](../resources/mlflow_webhook.md). * `model-serving` - **listing** [databricks_model_serving](../resources/model_serving.md). * `mounts` - **listing** works only in combination with `-mounts` command-line option. * `notebooks` - **listing** [databricks_notebook](../resources/notebook.md) and [databricks_workspace_file](../resources/workspace_file.md) * `pools` - **listing** [instance pools](../resources/instance_pool.md). * `repos` - **listing** [databricks_repo](../resources/repo.md) -* `secrets` - **listing** [databricks_secret_scope](../resources/secret_scope.md) along with [keys](../resources/secret.md) and [ACLs](../resources/secret_acl.md). +* `secrets` - **listing** [databricks_secret_scope](../resources/secret_scope.md) along with [keys](../resources/secret.md) and [ACLs](../resources/secret_acl.md). * `sql-alerts` - **listing** [databricks_sql_alert](../resources/sql_alert.md). * `sql-dashboards` - **listing** [databricks_sql_dashboard](../resources/sql_dashboard.md) along with associated [databricks_sql_widget](../resources/sql_widget.md) and [databricks_sql_visualization](../resources/sql_visualization.md) * `sql-dashboards` - **listing** [databricks_sql_dashboard](../resources/sql_dashboard.md) along with associated [databricks_sql_widget](../resources/sql_widget.md) and [databricks_sql_visualization](../resources/sql_visualization.md). * `sql-endpoints` - **listing** [databricks_sql_endpoint](../resources/sql_endpoint.md) along with [databricks_sql_global_config](../resources/sql_global_config.md) * `sql-queries` - **listing** [databricks_sql_query](../resources/sql_query.md) * `storage` - any referenced [databricks_dbfs_file](../resources/dbfs_file.md) will be downloaded locally and properly arranged into terraform state. -* `users` - [databricks_user](../resources/user.md) and [databricks_service_principal](../resources/service_principal.md) are written to their own file, simply because of their amount. If you use SCIM provisioning, the only use case for importing `users` service is to migrate workspaces. +* `users` - [databricks_user](../resources/user.md) and [databricks_service_principal](../resources/service_principal.md) are written to their own file, simply because of their amount. If you use SCIM provisioning, migrating workspaces is the only use case for importing `users` service. * `workspace` - [databricks_workspace_conf](../resources/workspace_conf.md) and [databricks_global_init_script](../resources/global_init_script.md) ## Secrets -For security reasons, [databricks_secret](../resources/secret.md) cannot contain actual plaintext secrets. Importer will create a variable in `vars.tf`, that would have the same name as the secret. You are supposed to [fill in the value of the secret](https://blog.gruntwork.io/a-comprehensive-guide-to-managing-secrets-in-your-terraform-code-1d586955ace1#0e7d) after that. 
+For security reasons, [databricks_secret](../resources/secret.md) cannot contain actual plaintext secrets. Importer will create a variable in `vars.tf`, with the same name as the secret. You are supposed to [fill in the value of the secret](https://blog.gruntwork.io/a-comprehensive-guide-to-managing-secrets-in-your-terraform-code-1d586955ace1#0e7d) after that. ## Parallel execution -To speed up export, Terraform Exporter performs many operations, such as listing & actual data exporting, in parallel using Goroutines. There are built-in defaults controlling the parallelism, but it's also possible to tune some parameters using environment variables specific to the exporter: +To speed up export, Terraform Exporter performs many operations, such as listing & actual data exporting, in parallel using Goroutines. Built-in defaults are controlling the parallelism, but it's also possible to tune some parameters using environment variables specific to the exporter: * `EXPORTER_WS_LIST_PARALLELISM` (default: `5`) controls how many Goroutines are used to perform parallel listing of Databricks Workspace objects (notebooks, directories, workspace files, ...). -* `EXPORTER_DIRECTORIES_CHANNEL_SIZE` (default: `100000`) controls the capacity of the channel that is used when listing workspace objects. Please make sure that this value is big enough (bigger than the number of directories in the workspace, default value should be ok for most cases), otherwise, there is a chance of deadlock. -* `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name, for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for `databricks_notebook` resource to `10`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go`, or equal to `2` if there is no value there. *Don't increase default values too much to avoid REST API throttling!* +* `EXPORTER_DIRECTORIES_CHANNEL_SIZE` (default: `100000`) controls the channel's capacity when listing workspace objects. Please ensure that this value is big enough (greater than the number of directories in the workspace; default value should be ok for most cases); otherwise, there is a chance of deadlock. +* `EXPORTER_PARALLELISM_NNN` - number of Goroutines used to process resources of a specific type (replace `NNN` with the exact resource name, for example, `EXPORTER_PARALLELISM_databricks_notebook=10` sets the number of Goroutines for `databricks_notebook` resource to `10`). Defaults for some resources are defined by the `goroutinesNumber` map in `exporter/context.go` or equal to `2` if there is no value. 
*Don't increase default values too much to avoid REST API throttling!* ## Support Matrix -Exporter aims to generate HCL code for the most of resources within the Databricks workspace: +Exporter aims to generate HCL code for most of the resources within the Databricks workspace: | Resource | Generated code | Incremental | | --- | --- | --- | @@ -136,4 +136,3 @@ Exporter aims to generate HCL code for the most of resources within the Databric | [databricks_user_role](../resources/user_role.md) | Yes | No | | [databricks_workspace_conf](../resources/workspace_conf.md) | Yes (partial) | No | | [databricks_workspace_file](../resources/workspace_file.md) | Yes | Yes | - diff --git a/docs/guides/gcp-private-service-connect-workspace.md b/docs/guides/gcp-private-service-connect-workspace.md index 022b70c199..0d838ff9b5 100644 --- a/docs/guides/gcp-private-service-connect-workspace.md +++ b/docs/guides/gcp-private-service-connect-workspace.md @@ -4,21 +4,21 @@ page_title: "Provisioning Databricks on Google Cloud with Private Service Connec # Provisioning Databricks workspaces on GCP with Private Service Connect -Secure a workspace with private connectivity and mitigate data exfiltration risks by [enabling Google Private Service Connect (PSC) on the workspace](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html). This guide assumes that you are already familiar with Hashicorp Terraform and provisioned some of the Google Compute Cloud infrastructure with it. +Secure a workspace with private connectivity and mitigate data exfiltration risks by [enabling Google Private Service Connect (PSC) on the workspace](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html). This guide assumes that you are already familiar with Hashicorp Terraform and provisioned some of the Google Compute Cloud infrastructure with it. ## Creating a GCP service account for Databricks Provisioning and Authenticate with Databricks account API -To work with Databricks in GCP in an automated way, please create a service account and manually add it in the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this would be via [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). Please refer to [Provisioning Databricks workspaces on GCP](gcp_workspace.md) for details. +To work with Databricks in GCP in an automated way, please create a service account and manually add it in the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this would be via [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). For details, please refer to [Provisioning Databricks workspaces on GCP](gcp_workspace.md). ## Creating a VPC network -The very first step is VPC creation with necessary resources. Please consult [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as [databricks_mws_networks](../resources/mws_networks.md) resource. 
+The very first step is VPC creation with the necessary resources. Please consult [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as [databricks_mws_networks](../resources/mws_networks.md) resource.

-To enable [back-end Private Service Connect (data plane to control plane)](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html#two-private-service-connect-options), configure network with the two back-end VPC endpoints:
+To enable [back-end Private Service Connect (data plane to control plane)](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/private-service-connect.html#two-private-service-connect-options), configure the network with the two back-end VPC endpoints:

- Back-end VPC endpoint for [Secure cluster connectivity](https://docs.gcp.databricks.com/security/secure-cluster-connectivity.html) relay
- Back-end VPC endpoint for REST APIs

--> Note If you want to implement the front-end VPC endpoint as well for the connections from users to to the Databricks web application, REST API, and Databricks Connect API over a Virtual Private Cloud (VPC) endpoint, use the transit (bastion) VPC. Once the front-end endpoint is created, use the databricks_mws_private_access_settings resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object.
+-> Note: If you want to implement the front-end VPC endpoint as well for the connections from users to the Databricks web application, REST API, and Databricks Connect API over a Virtual Private Cloud (VPC) endpoint, use the transit (bastion) VPC. Once the front-end endpoint is created, use the databricks_mws_private_access_settings resource to control which VPC endpoints can connect to the UI or API of any workspace that attaches this private access settings object.

```hcl
resource "google_compute_network" "dbx_private_vpc" {
@@ -104,7 +104,7 @@ For a workspace to support any of the Private Service Connect connectivity scena

Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate terraform modules to avoid common confusion between `provider = databricks.accounts` and `provider = databricks.created_workspace`. This is why we specify `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules.

--> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, that should indicate authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks.
Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. +-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, which should indicate authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. ```hcl resource "databricks_mws_private_access_settings" "pas" { @@ -153,7 +153,7 @@ output "databricks_token" { ### Data resources and Authentication is not configured errors -*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier, if your usage involves data resources. +*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources. ```hcl data "databricks_current_user" "me" { diff --git a/docs/guides/gcp-workspace.md b/docs/guides/gcp-workspace.md index 571d40cbed..1c302efd50 100644 --- a/docs/guides/gcp-workspace.md +++ b/docs/guides/gcp-workspace.md @@ -1,5 +1,5 @@ --- -page_title: "Provisioning Databricks workspaces on GCP" +page_title: "Provisioning Databricks workspaces on GCP." --- # Provisioning Databricks workspaces on GCP @@ -8,7 +8,7 @@ You can provision multiple Databricks workspaces with Terraform. 
## Creating a GCP service account for Databricks Provisioning -This guide assumes that you are already familiar with Hashicorp Terraform and provisioned some of the Google Compute Cloud infrastructure with it. To work with Databricks in GCP in an automated way, please create a service account and manually add it in the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. You can use the following Terraform configuration to create a Service Account for Databricks Provisioning, which can be impersonated by a list of principals defined in `delegate_from` variable. Service Account would be automatically assigned to the newly created Databricks Workspace Creator custom role +This guide assumes that you are already familiar with Hashicorp Terraform and have provisioned some of the Google Compute Cloud infrastructure. To work with Databricks in GCP in an automated way, please create a service account and manually add it to the [Accounts Console](https://accounts.gcp.databricks.com/users) as an account admin. You can use the following Terraform configuration to create a Service Account for Databricks Provisioning, which can be impersonated by a list of principals defined in `delegate_from` variable. Service Account would be automatically assigned to the newly created Databricks Workspace Creator custom role: ```hcl variable "prefix" {} @@ -84,13 +84,13 @@ resource "google_project_iam_member" "sa2_can_create_workspaces" { } ``` -After you’ve added Service Account to Databricks Accounts Console, please copy its name into `databricks_google_service_account` variable. If you prefer environment variables - `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` is the one you’ll use instead. Please also copy Account ID into `databricks_account_id` variable. +After you’ve added the Service Account to Databricks Accounts Console, please copy its name into `databricks_google_service_account` variable. If you prefer environment variables - `DATABRICKS_GOOGLE_SERVICE_ACCOUNT` is the one you’ll use instead. Please also copy the Account ID into `databricks_account_id` variable. ## Authenticate with Databricks account API -Databricks account-level APIs can only be called by account owners and account admins, and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this would be via [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). The `gcloud` command is available after installing the SDK. Then run the following commands +Databricks account-level APIs can only be called by account owners and account admins and can only be authenticated using Google-issued OIDC tokens. The simplest way to do this would be via [Google Cloud CLI](https://cloud.google.com/sdk/gcloud). The `gcloud` command is available after installing the SDK. Then run the following commands: -* `gcloud auth application-default login` to authorise your user with Google Cloud Platform. (If you want to use your [service account's credentials instead](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-key), set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the JSON file that contains your service account key) +* `gcloud auth application-default login` to authorize your user with Google Cloud Platform. 
(If you want to use your [service account's credentials instead](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-key), set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the JSON file that contains your service account key) * `terraform init` to load Google and Databricks Terraform providers. * `terraform apply` to apply the configuration changes. Terraform will use your credential to impersonate the service account specified in `databricks_google_service_account` to call the Databricks account-level API. @@ -148,7 +148,7 @@ resource "random_string" "suffix" { ## Creating a VPC -The very first step is VPC creation with necessary resources. Please consult [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as [databricks_mws_networks](../resources/mws_networks.md) resource. +The very first step is VPC creation with the necessary resources. Please consult [main documentation page](https://docs.gcp.databricks.com/administration-guide/cloud-configurations/gcp/customer-managed-vpc.html) for **the most complete and up-to-date details on networking**. A GCP VPC is registered as [databricks_mws_networks](../resources/mws_networks.md) resource. ```hcl resource "google_compute_network" "dbx_private_vpc" { @@ -208,7 +208,7 @@ Once [the VPC](#creating-a-vpc) is set up, you can create Databricks workspace t Code that creates workspaces and code that [manages workspaces](workspace-management.md) must be in separate terraform modules to avoid common confusion between `provider = databricks.accounts` and `provider = databricks.created_workspace`. This is why we specify `databricks_host` and `databricks_token` outputs, which have to be used in the latter modules. --> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. Look specifically for `Explicit and implicit attributes` lines, that should indicate authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. +-> **Note** If you experience technical difficulties with rolling out resources in this example, please make sure that [environment variables](../index.md#environment-variables) don't [conflict with other](../index.md#empty-provider-block) provider block attributes. When in doubt, please run `TF_LOG=DEBUG terraform apply` to enable [debug mode](https://www.terraform.io/docs/internals/debugging.html) through the [`TF_LOG`](https://www.terraform.io/docs/cli/config/environment-variables.html#tf_log) environment variable. 
Look specifically for `Explicit and implicit attributes` lines, indicating authentication attributes used. The other common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article. ```hcl resource "databricks_mws_workspaces" "this" { @@ -248,7 +248,7 @@ output "databricks_token" { ### Data resources and Authentication is not configured errors -*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier, if your usage involves data resources. +*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee proper lazy authentication with data resources, you should add `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if a workspace is created *in one module* and resources [within the workspace](workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources. ```hcl data "databricks_current_user" "me" { @@ -267,4 +267,4 @@ provider "databricks" { } ``` -We assume that you have a terraform module in your project that creats a workspace (using [Databricks Workspace](#creating-a-databricks-workspace) section) and you named it as `dbx_gcp` while calling it in the **main.tf** file of your terraform project. And `workspace_url` and `token_value` are the output attributes of that module. This provider configuration will allow you to use the generated token during workspace creation to authenticate to the created workspace. +We assume that you have a terraform module in your project that creates a workspace (using [Databricks Workspace](#creating-a-databricks-workspace) section), and you named it as `dbx_gcp` while calling it in the **main.tf** file of your terraform project. And `workspace_url` and `token_value` are the output attributes of that module. This provider configuration will allow you to use the generated token to authenticate to the created workspace during workspace creation. 
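
To make that wiring concrete, the sketch below shows how the module call and the workspace-level provider could fit together in **main.tf**. Only the `dbx_gcp` module name, its `workspace_url` and `token_value` outputs, and the `created_workspace` alias come from this guide; the `source` path and the empty module body are illustrative assumptions.

```hcl
# main.tf of the root project (sketch) - call the module that creates the workspace
module "dbx_gcp" {
  # hypothetical path; point this at your own workspace-creation module
  source = "./modules/dbx_gcp"
}

# Workspace-level provider authenticated with the outputs of that module
provider "databricks" {
  alias = "created_workspace"
  host  = module.dbx_gcp.workspace_url
  token = module.dbx_gcp.token_value
}
```

Resources that set `provider = databricks.created_workspace` are then applied against the freshly created workspace, while account-level `databricks_mws_*` resources keep using the account provider.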
diff --git a/docs/guides/passthrough-cluster-per-user.md b/docs/guides/passthrough-cluster-per-user.md
index 08514bbd82..eafa13fdfd 100644
--- a/docs/guides/passthrough-cluster-per-user.md
+++ b/docs/guides/passthrough-cluster-per-user.md
@@ -4,7 +4,7 @@ page_title: "Dynamic Passthrough Clusters for a Group"

# Dynamic Passthrough Clusters

-This example addresses a pretty common use-case: data science team, which is managed as a group through SCIM provisioning, needs a collection of individual passthrough [databricks_cluster](../resources/cluster.md), which they should be able to restart. It could simply be achieved by [databricks_group](../data-sources/group.md) and [databricks_user](../data-sources/user.md) data resources to get the list of user names that belong to a group. Terraform's `for_each` meta-attribute helps to do this easily.
+This example addresses a pretty common use-case: a data science team, which is managed as a group through SCIM provisioning, needs a collection of individual passthrough [databricks_cluster](../resources/cluster.md), which they should be able to restart. It could be achieved by using [databricks_group](../data-sources/group.md) and [databricks_user](../data-sources/user.md) data sources to get the list of user names that belong to a group. Terraform's `for_each` meta-attribute helps to do this easily.

```hcl
data "databricks_group" "dev" {
@@ -17,7 +17,7 @@ data "databricks_user" "dev" {
 }
 ```

-Once we have a specific list of user resources, we could proceed creating clusters and permissions with `for_each = data.databricks_user.dev` to ensure it's done for each user:
+Once we have a specific list of user resources, we could proceed with creating clusters and permissions with `for_each = data.databricks_user.dev` to ensure it's done for each user:

```hcl
data "databricks_spark_version" "latest" {}
diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md
index 59a207ebb2..d6d3a32c39 100644
--- a/docs/guides/troubleshooting.md
+++ b/docs/guides/troubleshooting.md
@@ -7,9 +7,9 @@ page_title: "Troubleshooting Guide"

If you have problems with code that uses Databricks Terraform provider, follow these steps to solve them:

* Check symptoms and solutions in the [Typical problems](#typical-problems) section below.
-* Upgrade provider to the latest version. The bug might have already been fixed.
+* Upgrade the provider to the latest version. The bug might have already been fixed.
* In case of authentication problems, see the [Data resources and Authentication is not configured errors](#data-resources-and-authentication-is-not-configured-errors) below.
-* Collect debug information using following command:
+* Collect debug information using the following command:

```sh
TF_LOG=DEBUG DATABRICKS_DEBUG_TRUNCATE_BYTES=250000 terraform apply -no-color 2>&1 |tee tf-debug.log
@@ -21,11 +21,11 @@ TF_LOG=DEBUG DATABRICKS_DEBUG_TRUNCATE_BYTES=250000 terraform apply -no-color 2>

### Data resources and Authentication is not configured errors

-*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [azurerm_databricks_workspace.this]` or `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if workspace is created *in one module* and resources [within the workspace](guides/workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier, if your usage involves data resources.
+*In Terraform 0.13 and later*, data resources have the same dependency resolution behavior [as defined for managed resources](https://www.terraform.io/docs/language/resources/behavior.html#resource-dependencies). Most data resources make an API call to a workspace. If a workspace doesn't exist yet, `default auth: cannot configure default credentials` error is raised. To work around this issue and guarantee a proper lazy authentication with data resources, you should add `depends_on = [azurerm_databricks_workspace.this]` or `depends_on = [databricks_mws_workspaces.this]` to the body. This issue doesn't occur if a workspace is created *in one module* and resources [within the workspace](guides/workspace-management.md) are created *in another*. We do not recommend using Terraform 0.12 and earlier if your usage involves data resources.

### Multiple Provider Configurations

-The most common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks, when using multiple provider configurations. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article.
+The most common reason for technical difficulties might be related to missing `alias` attribute in `provider "databricks" {}` blocks or `provider` attribute in `resource "databricks_..." {}` blocks when using multiple provider configurations. Please make sure to read [`alias`: Multiple Provider Configurations](https://www.terraform.io/docs/language/providers/configuration.html#alias-multiple-provider-configurations) documentation article.

### Error while installing: registry does not have a provider

```
registry.terraform.io does not have a provider named
registry.terraform.io/hashicorp/databricks
```

-If you notice below error, it might be due to the fact that [required_providers](https://www.terraform.io/docs/language/providers/requirements.html#requiring-providers) block is not defined in *every module*, that uses Databricks Terraform Provider. Create `versions.tf` file with the following contents:
+If you notice the error below, it might be because the [required_providers](https://www.terraform.io/docs/language/providers/requirements.html#requiring-providers) block is not defined in *every module* that uses the Databricks Terraform Provider. Create a `versions.tf` file with the following contents:

```hcl
# versions.tf
@@ -49,30 +49,30 @@ terraform {
 }
 ```

-... and copy the file in every module in your codebase. Our recommendation is to skip the `version` field for `versions.tf` file on module level, and keep it only on the environment level.
+... and copy the file in every module in your codebase. We recommend skipping the `version` field for the `versions.tf` file on the module level and keeping it only on the environment level.
``` ├── environments -│   ├── sandbox -│   │   ├── README.md -│   │   ├── main.tf -│   │   └── versions.tf -│   └── production -│      ├── README.md -│      ├── main.tf -│   └── versions.tf +│ ├── sandbox +│ │ ├── README.md +│ │ ├── main.tf +│ │ └── versions.tf +│ └── production +│ ├── README.md +│ ├── main.tf +│ └── versions.tf └── modules - ├── first-module - │   ├── ... - │   └── versions.tf - └── second-module -    ├── ... -    └── versions.tf + ├── first-module + │ ├── ... + │ └── versions.tf + └── second-module + ├── ... + └── versions.tf ``` ### Error: Failed to install provider -Running the `terraform init` command, you may see `Failed to install provider` error if you didn't check-in [`.terraform.lock.hcl`](https://www.terraform.io/language/files/dependency-lock#lock-file-location) to the source code version control: +Running the `terraform init` command, you may see `Failed to install provider` error if you didn't check in [`.terraform.lock.hcl`](https://www.terraform.io/language/files/dependency-lock#lock-file-location) to the source code version control: ```sh Error: Failed to install provider @@ -84,7 +84,7 @@ You can fix it by following three simple steps: * Replace `databrickslabs/databricks` with `databricks/databricks` in all your `.tf` files with the `python3 -c "$(curl -Ls https://dbricks.co/updtfns)"` command. * Run the `terraform state replace-provider databrickslabs/databricks databricks/databricks` command and approve the changes. See [Terraform CLI](https://www.terraform.io/cli/commands/state/replace-provider) docs for more information. -* Run `terraform init` to verify everything working. +* Run `terraform init` to verify everything is working. The terraform apply command should work as expected now. @@ -93,7 +93,7 @@ Alternatively, you can find the hashes of the last 30 provider versions in [`.te * Copy [`versions-lock.hcl`](https://github.com/databrickslabs/terraform-provider-databricks/blob/v0.6.2/scripts/versions-lock.hcl) to the root folder of your terraform project. * Rename to `terraform.lock.hcl` * Run `terraform init` and verify the provider is installed. -* Be sure to commit the new `.terraform.lock.hcl` file to your source code repository. +* Commit the new `.terraform.lock.hcl` file to your source code repository. ### Error: Failed to query available provider packages @@ -101,37 +101,37 @@ See the same steps as in [Error: Failed to install provider](#error-failed-to-in ### Error: Deployment name cannot be used until a deployment name prefix is defined -You can get this error during provisioning of the Databricks workspace. It arises when you're trying to set `deployment_name` by no deployment prefix was set on the Databricks side (you can't set it yourself). The problem could be solved one of the following methods: +You can get this error during provisioning of the Databricks workspace. It arises when you're trying to set `deployment_name` with no deployment prefix on the Databricks side (you can't set it yourself). The problem could be solved by one of the following methods: -1. Contact your Databricks representative, like Solutions Architect, Customer Success Engineer, Account Executive, or Partner Solutions Architect to set a deployment prefix for your account. +1. Contact your Databricks representatives, like Solutions Architect, Customer Success Engineer, Account Executive, or Partner Solutions Architect, to set a deployment prefix for your account. -1. 
Comment out the `deployment_name` parameter to create workspace with default URL: `dbc-XXXXXX.cloud.databricks.com`. +1. Comment out the `deployment_name` parameter to create a workspace with the default URL: `dbc-XXXXXX.cloud.databricks.com`. ### Error: 'strconv.ParseInt parsing "...." value out of range' or "Attribute must be a whole number, got N.NNNNe+XX" -This kind of errors happens when the 32-bit version of Databricks Terraform provider is used, usually on Microsoft Windows. To fix the issue you need to switch to use of the 64-bit versions of Terraform and Databricks Terraform provider. +This kind of error happens when the 32-bit version of Databricks Terraform provider is used, usually on Microsoft Windows. To fix the issue, you need to switch to use of the 64-bit versions of Terraform and Databricks Terraform provider. ### Error: cannot create xxxx: HTTP method POST is not supported by this URL -This error may appear when creating Databricks users/groups/service principals on Databricks account level when no `account_id` is specified in the provider's configuration. Make sure that `account_id` is specified & has a correct value. +This error may appear when creating Databricks users/groups/service principals on Databricks account level when no `account_id` is specified in the provider's configuration. Make sure that `account_id` is set and has a correct value. ### Error: oauth-m2m: oidc: parse .well-known: invalid character '<' looking for beginning of value -This similar to previous item. Make sure that `account_id` is specified in the provider configuration & it has a correct value. +This problem is similar to the previous item. Ensure that `account_id` is specified in the provider configuration and it has a correct value. ### Error: cannot create ...: invalid character '<' looking for beginning of value -This error may appear when creating workspace-level objects but the provider is configured to account-level. +This error may appear when creating workspace-level objects, but the provider is configured to account-level. ### Error: Provider registry.terraform.io/databricks/databricks v... does not have a package available for your current platform, windows_386 -This kind of errors happens when the 32-bit version of Databricks Terraform provider is used, usually on Microsoft Windows. To fix the issue you need to switch to use of the 64-bit versions of Terraform and Databricks Terraform provider. +This error happens when the 32-bit version of Databricks Terraform provider is used, usually on Microsoft Windows. To fix the issue, you need to switch to the 64-bit versions of Terraform and Databricks Terraform provider. ### Permanent configuration drifts with `databricks_grants` or `databricks_permissions` -For both resources, each single resource instance should manage all the grants/permissions for a given object. If there are multiple instances set up against an object, they will keep overwriting one another and lead to permanent configuration drifts. +For both resources, each single resource instance should manage all the grants/permissions for a given object. If multiple instances are set up against an object, they will keep overwriting one another, leading to permanent configuration drifts. -To prevent that, you need to have only one resource instance per object, and inside that resource instance use [Dynamic Blocks](https://developer.hashicorp.com/terraform/language/expressions/dynamic-blocks) to specify the variable number of nested grant blocks. 
+To prevent that, you need to have only one resource instance per object, and inside that resource instance, use [Dynamic Blocks](https://developer.hashicorp.com/terraform/language/expressions/dynamic-blocks) to specify the variable number of nested grant blocks. For example:
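
As a sketch of that pattern (the catalog name and the `catalog_grants` map are hypothetical placeholders you would replace with your own), a single `databricks_grants` instance can expand one `grant` block per principal:

```hcl
locals {
  # principal => privileges; replace with your own groups and needs
  catalog_grants = {
    "data_engineers" = ["USE_CATALOG", "USE_SCHEMA", "SELECT", "MODIFY"]
    "data_analysts"  = ["USE_CATALOG", "USE_SCHEMA", "SELECT"]
  }
}

# A single resource instance owns every grant on the catalog, so there are no
# competing instances overwriting each other on every apply.
resource "databricks_grants" "sandbox" {
  catalog = "sandbox"

  dynamic "grant" {
    for_each = local.catalog_grants
    content {
      principal  = grant.key
      privileges = grant.value
    }
  }
}
```

Changing who gets access then means editing the map, not adding another `databricks_grants` resource for the same catalog.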