
Initial commit for log delivery resource #343

Merged — 8 commits merged into master from feature/log-delivery on Nov 3, 2020

Conversation

@nfx (Contributor) commented Sep 28, 2020

databricks_mws_log_delivery Resource

This resource configures the delivery of the two supported log types from Databricks workspaces: billable usage logs and audit logs. A log delivery configuration cannot be deleted, but it can be disabled when no longer needed. This matters because there is a limit on the number of enabled log delivery configurations per account: you can create a maximum of two enabled configurations that apply at the account level (no workspace filter) and two that use the workspace filter. There is an additional uniqueness constraint: two enabled configurations cannot share all of their fields (excluding config_name). Re-enabling a configuration may fail if doing so would violate the limit or uniqueness constraints.

Example Usage

End-to-end example of billable usage and audit log delivery:

resource "aws_s3_bucket" "logdelivery" {
  bucket = "${var.prefix}-logdelivery"
  acl    = "private"
  versioning {
    enabled = false
  }
  force_destroy = true
  tags = merge(var.tags, {
    Name = "${var.prefix}-logdelivery"
  })
}

resource "aws_s3_bucket_public_access_block" "logdelivery" {
  bucket             = aws_s3_bucket.logdelivery.id
  ignore_public_acls = true
}

data "databricks_aws_assume_role_policy" "logdelivery" {
  external_id = var.account_id
  for_log_delivery = true
}

resource "aws_iam_role" "logdelivery" {
  name               = "${var.prefix}-logdelivery"
  description        = "(${var.prefix}) UsageDelivery role"
  assume_role_policy = data.databricks_aws_assume_role_policy.logdelivery.json
  tags               = var.tags
}

data "databricks_aws_bucket_policy" "logdelivery" {
  full_access_role = aws_iam_role.logdelivery.arn
  bucket           = aws_s3_bucket.logdelivery.bucket
}

resource "aws_s3_bucket_policy" "logdelivery" {
  bucket = aws_s3_bucket.logdelivery.id
  policy = data.databricks_aws_bucket_policy.logdelivery.json
}

resource "databricks_mws_credentials" "log_writer" {
    account_id       = var.account_id
    credentials_name = "Usage Delivery"
    role_arn         = aws_iam_role.logdelivery.arn
}

resource "databricks_mws_storage_configurations" "log_bucket" {
    account_id                 = var.account_id
    storage_configuration_name = "Usage Logs"
    bucket_name                = aws_s3_bucket.logdelivery.bucket
}

resource "databricks_mws_log_delivery" "usage_logs" {
    account_id = var.account_id
    credentials_id = databricks_mws_credentials.log_writer.credentials_id
    storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
    delivery_path_prefix = "billable-usage"
    config_name = "Usage Logs"
    log_type = "BILLABLE_USAGE"
    output_format = "CSV"
}

resource "databricks_mws_log_delivery" "audit_logs" {
    account_id = var.account_id
    credentials_id = databricks_mws_credentials.log_writer.credentials_id
    storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
    delivery_path_prefix = "audit-logs"
    config_name = "Audit Logs"
    log_type = "AUDIT_LOGS"
    output_format = "JSON"
}

Billable Usage

CSV files with a static schema are delivered to <delivery_path_prefix>/billable-usage/csv/. Files are named workspaceId=<workspace-id>-usageMonth=<month>.csv and are delivered daily, overwriting the month's CSV file for each workspace.
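For example, with a hypothetical workspace ID, the file for the month 2020-11 would be named workspaceId=1234567890123456-usageMonth=2020-11.csv and placed under <delivery_path_prefix>/billable-usage/csv/.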

resource "databricks_mws_log_delivery" "usage_logs" {
    account_id = var.account_id
    credentials_id = databricks_mws_credentials.log_writer.credentials_id
    storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
    delivery_path_prefix = "billable-usage"
    config_name = "Usage Logs"
    log_type = "BILLABLE_USAGE"
    output_format = "CSV"
}

Audit Logs

JSON files with a static schema are delivered to <delivery_path_prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. Audit logs are available within 15 minutes of activation, though data may sometimes arrive later. New JSON files are delivered every few minutes, potentially overwriting existing files for each workspace. If a file is overwritten, the existing content remains, but it may contain additional lines for newer auditable events. Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
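For example, audit events for a hypothetical workspace dated 2020-11-03 would be delivered under <delivery_path_prefix>/workspaceId=1234567890123456/date=2020-11-03/.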

resource "databricks_mws_log_delivery" "audit_logs" {
    account_id = var.account_id
    credentials_id = databricks_mws_credentials.log_writer.credentials_id
    storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
    delivery_path_prefix = "audit-logs"
    config_name = "Audit Logs"
    log_type = "AUDIT_LOGS"
    output_format = "JSON"
}

Argument reference

  • account_id - The Databricks account ID that hosts the log delivery configuration.
  • config_name - The optional human-readable name of the log delivery configuration. Defaults to empty.
  • log_type - The type of log delivery. BILLABLE_USAGE and AUDIT_LOGS are supported.
  • output_format - The file type of log delivery. Currently CSV (for BILLABLE_USAGE) and JSON (for AUDIT_LOGS) are supported.
  • credentials_id - The ID for a Databricks credential configuration that represents the AWS IAM role with policy and trust relationship as described in the main billable usage documentation page.
  • storage_configuration_id - The ID for a Databricks storage configuration that represents the S3 bucket with bucket policy as described in the main billable usage documentation page.
  • workspace_ids_filter - (Optional) By default, this log configuration applies to all workspaces associated with your account ID. If your account is on the E2 version of the platform or on a select custom plan that allows multiple workspaces per account, you may have multiple workspaces associated with your account ID, and you can optionally set this field to an array of workspace IDs. If you plan to use different log delivery configurations for several workspaces, set this field explicitly rather than leaving it blank; if you leave it blank and your account later gains additional workspaces, this configuration will also apply to them (see the sketch after this list).
  • delivery_path_prefix - (Optional) Defaults to empty, which means that logs are delivered to the root of the bucket. The value must be a valid S3 object key and must not start or end with a slash character.
  • delivery_start_time - (Optional) The start month and year for delivery, specified in YYYY-MM format. Defaults to the current year and month. Usage data is not available before 2019-03.
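
For illustration, the two optional arguments might be combined as in the following sketch; the workspace IDs and the start month are hypothetical placeholders:

resource "databricks_mws_log_delivery" "filtered_usage_logs" {
  account_id               = var.account_id
  credentials_id           = databricks_mws_credentials.log_writer.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.log_bucket.storage_configuration_id
  config_name              = "Usage Logs (selected workspaces)"
  log_type                 = "BILLABLE_USAGE"
  output_format            = "CSV"
  delivery_path_prefix     = "billable-usage"

  # Hypothetical workspace IDs: the configuration applies only to these workspaces.
  workspace_ids_filter = [1234567890123456, 2345678901234567]

  # Hypothetical start month (YYYY-MM); usage data is not available before 2019-03.
  delivery_start_time = "2020-01"
}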

Attribute reference

The resource exports the following attributes:

  • config_id - Databricks log delivery configuration ID.
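
For example, the configuration ID can be surfaced as a Terraform output; a minimal sketch referencing the usage_logs resource from the example above:

output "usage_logs_config_id" {
  value = databricks_mws_log_delivery.usage_logs.config_id
}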

@nfx added this to the v0.3.0 milestone on Sep 28, 2020
@TravisBuddy
Hey @nfx, your changes look good to me!
TravisBuddy Request Identifier: 648e4ef0-01ad-11eb-9be6-0d9634a5fbe7

@codecov-commenter commented Sep 28, 2020

Codecov Report

Merging #343 into master will increase coverage by 0.12%.
The diff coverage is 84.12%.


@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
+ Coverage   70.31%   70.43%   +0.12%     
==========================================
  Files          60       61       +1     
  Lines        6292     6353      +61     
==========================================
+ Hits         4424     4475      +51     
- Misses       1450     1455       +5     
- Partials      418      423       +5     
Impacted Files                 Coverage Δ
mws/resource_log_delivery.go   83.33% <83.33%> (ø)
provider/provider.go           91.89% <100.00%> (+0.02%) ⬆️

@svar29 left a comment:

Some comments. Thanks for doing this for our customers!!

(Resolved review threads on docs/resources/mws_log_delivery.md and mws/resource_log_delivery.go)
@nfx modified the milestones: v0.3.0, v0.2.8 on Oct 5, 2020

@nfx self-assigned this on Nov 2, 2020

@codecov-io commented Nov 3, 2020

Codecov Report

Merging #343 into master will increase coverage by 0.05%.
The diff coverage is 76.62%.


@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
+ Coverage   70.57%   70.62%   +0.05%     
==========================================
  Files          65       66       +1     
  Lines        6562     6635      +73     
==========================================
+ Hits         4631     4686      +55     
- Misses       1488     1499      +11     
- Partials      443      450       +7     
Impacted Files                 Coverage Δ
access/data_aws_policies.go    93.63% <33.33%> (-2.45%) ⬇️
mws/resource_log_delivery.go   79.41% <79.41%> (ø)
provider/provider.go           91.62% <100.00%> (+0.02%) ⬆️

@svar29 left a comment:

lgtm with some nits!!

@@ -46,6 +46,7 @@ resource "databricks_mws_credentials" "this" {
## Argument Reference

* `external_id` (Required) (String) External ID that can be found at http://accounts.cloud.databricks.com/#aws
* `for_log_delivery` (Optional) Whether or not this assume role policy should be created for usage log delivery. Defaults to false.
@svar29: nit: if it's a bool, probably name it is_log_delivery?

@nfx (author): I was thinking about it, but does it make sense semantically? It's not common in tf to have is_ prefixes, nor for our APIs... log_delivery_principal = true was the other idea.


resource "databricks_mws_credentials" "ld" {
account_id = "{env.DATABRICKS_ACCOUNT_ID}"
credentials_name = "tf-acceptance-logdelivery-{var.RANDOM}"
@svar29: nit: names in this and other resources can be duplicated across resources, so this should be fine even if we don't introduce randomness here.

@nfx (author): @svar29, the RANDOM identifier here is more to tie together different resources created by the same test, in case of debugging.

@@ -55,6 +55,53 @@ module "aws_common" {
  tags = local.tags
}

resource "aws_s3_bucket" "logdelivery" {

@svar29: nit: can we replace logDelivery with log_delivery?

@nfx (author): It would force deletion and re-creation of all relevant resources, which I'd rather avoid.

@nfx merged commit 6f29524 into master on Nov 3, 2020
@nfx deleted the feature/log-delivery branch on November 3, 2020 at 13:18