
Commit

Merge pull request #21 from Lakhtenkov-iv/feature/add-replay-pipeline-switch

Add replay pipeline switch
rarsan authored Dec 20, 2022
2 parents f774f3c + 5f17c18 commit 5f36714
Showing 4 changed files with 25 additions and 17 deletions.
12 changes: 8 additions & 4 deletions README.md
@@ -34,6 +34,7 @@ These deployment templates are provided as is, without warranty. See [Copyright
| <a name="input_dataflow_job_udf_gcs_path"></a> [dataflow_job_udf_gcs_path](#input_dataflow_job_udf_gcs_path) | (Optional) GCS path for JavaScript file (default No UDF used) | `string` |
| <a name="input_dataflow_template_version"></a> [dataflow_template_version](#input_dataflow_template_version) | (Optional) Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk' | `string` |
| <a name="input_dataflow_worker_service_account"></a> [dataflow_worker_service_account](#input_dataflow_worker_service_account) | (Optional) Name of worker service account to be created and used to execute job operations. Must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If parameter is empty, worker service account defaults to project's Compute Engine default service account. | `string` |
| <a name="input_deploy_replay_job"></a> [deploy_replay_job](#input_deploy_replay_job) | (Optional) Defines if replay pipeline should be deployed or not (default: `false`) | `bool` |
| <a name="input_primary_subnet_cidr"></a> [primary_subnet_cidr](#input_primary_subnet_cidr) | The CIDR Range of the primary subnet | `string` |
| <a name="input_scoping_project"></a> [scoping_project](#input_scoping_project) | Cloud Monitoring scoping project ID to create dashboard under.<br>This assumes a pre-existing scoping project whose metrics scope contains the `project` where dataflow job is to be deployed.<br>See [Cloud Monitoring settings](https://cloud.google.com/monitoring/settings) for more details on scoping project.<br>If parameter is empty, scoping project defaults to value of `project` parameter above. | `string` |
| <a name="input_splunk_hec_token"></a> [splunk_hec_token](#input_splunk_hec_token) | (Optional) Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`. | `string` |
@@ -116,11 +117,14 @@ Take note of dashboard_id value.

2. Visit newly created Monitoring Dashboard in Cloud Console by replacing dashboard_id in the following URL: https://console.cloud.google.com/monitoring/dashboards/builder/{dashboard_id}

#### Deploy replay pipeline
#### Deploy replay pipeline when needed

In the `replay.tf` file, uncomment the code under `splunk_dataflow_replay` and follow the sequence of `terraform plan` and `terraform apply`.
The replay pipeline is not deployed by default; it is only used to move failed messages from the PubSub deadletter subscription back to the input topic so they can be redelivered by the main log export pipeline (as depicted in the [diagram](#architecture-diagram) above). Refer to [Handling delivery failures](https://cloud.google.com/architecture/deploying-production-ready-log-exports-to-splunk-using-dataflow#handling_delivery_failures) for more detail.

Once the replay pipeline is no longer needed (the number of messages in the PubSub deadletter topic are at 0), comment out `splunk_dataflow_replay` and follow the `plan` and `apply` sequence above.
**Caution**: Make sure to deploy the replay pipeline only after the root cause of the delivery failure has been fixed. Otherwise, the replay pipeline creates a loop in which failed messages are sent back for re-delivery only to fail again, wasting resources. For the same reason, make sure to tear down the replay pipeline once all failed messages from the deadletter subscription have been processed or replayed.

1. To deploy the replay pipeline, set the `deploy_replay_job` variable to `true`, then follow the sequence of `terraform plan` and `terraform apply` (see the command sketch below).
2. Once the replay pipeline is no longer needed (i.e. the number of messages in the PubSub deadletter subscription is 0), set the `deploy_replay_job` variable back to `false`, then follow the same `terraform plan` and `terraform apply` sequence.
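
For example, a full replay cycle might look like the following command sketch (assuming the default Terraform CLI workflow; `deploy_replay_job` can equally be set in a tfvars file instead of being passed with `-var` on each run):

```shell
# 1. Deploy the replay pipeline once the root cause of the delivery failures is fixed
terraform plan -var='deploy_replay_job=true'
terraform apply -var='deploy_replay_job=true'

# 2. Tear the replay pipeline down after the deadletter subscription backlog drains to 0
terraform plan -var='deploy_replay_job=false'
terraform apply -var='deploy_replay_job=false'
```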

### Cleanup

@@ -131,8 +135,8 @@ $ terraform destroy

### TODOs

* ~~Support KMS-encrypted HEC token~~
* Expose logging level knob
* ~~Support KMS-encrypted HEC token~~
* ~~Create replay pipeline~~
* ~~Create secure network for self-contained setup if existing network is not provided~~
* ~~Add Cloud Monitoring dashboard~~
1 change: 0 additions & 1 deletion main.tf
@@ -35,7 +35,6 @@ locals {
dataflow_temporary_gcs_bucket_path = "tmp/"

dataflow_splunk_template_gcs_path = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_Splunk"
# tflint-ignore: terraform_unused_declarations
dataflow_pubsub_template_gcs_path = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_Cloud_PubSub"

# If provided, set Dataflow worker to new user-managed service account;
23 changes: 11 additions & 12 deletions replay.tf
@@ -13,16 +13,16 @@
# limitations under the License.

/*
The replay job should stay commented out while the main export pipeline is initially deployed.
When the replay job needs to be run, simply uncomment the module and deploy the replay pipeline.
From the CLI, this may look like `terraform apply -target="google_dataflow_job.splunk_dataflow_replay"`
After the deadletter Pub/Sub topic has no more messages, comment out the module and run a regular terraform deployment (ex. terraform apply). Terraform will automatically destroy the replay job.
`terraform apply -target` usage documentation is here: https://www.terraform.io/docs/cli/commands/apply.html
When the replay job needs to be run, simply run terraform with the variable overridden:
`terraform apply -var deploy_replay_job="true"`, or change the value in the tfvars file.
After the deadletter Pub/Sub topic has no more messages, set `deploy_replay_job` to `false` and
run a regular terraform deployment (e.g. terraform apply). Terraform will automatically destroy the replay job.
*/

/*

resource "google_dataflow_job" "splunk_dataflow_replay" {
count = var.deploy_replay_job == true ? 1 : 0

name = local.dataflow_replay_job_name
template_gcs_path = local.dataflow_pubsub_template_gcs_path
temp_gcs_location = "gs://${local.dataflow_temporary_gcs_bucket_name}/${local.dataflow_temporary_gcs_bucket_path}"
@@ -32,13 +32,12 @@ resource "google_dataflow_job" "splunk_dataflow_replay" {
inputSubscription = google_pubsub_subscription.dataflow_deadletter_pubsub_sub.id
outputTopic = google_pubsub_topic.dataflow_input_pubsub_topic.id
}
region = var.region
network = var.network
subnetwork = "regions/${var.region}/subnetworks/${local.subnet_name}"
ip_configuration = "WORKER_IP_PRIVATE"
region = var.region
network = var.network
subnetwork = "regions/${var.region}/subnetworks/${local.subnet_name}"
ip_configuration = "WORKER_IP_PRIVATE"

depends_on = [
google_compute_subnetwork.splunk_subnet
]
}
*/
6 changes: 6 additions & 0 deletions variables.tf
@@ -182,3 +182,9 @@ variable "dataflow_job_udf_function_name" {
description = "(Optional) Name of JavaScript function to be called (default No UDF used)"
default = ""
}

variable "deploy_replay_job" {
type = bool
description = "(Optional) Defines if replay pipeline should be deployed or not (default: `false`)"
default = false
}
