
Commit

Merge pull request #21 from Lakhtenkov-iv/feature/add-replay-pipeline-switch

Add replay pipeline switch
rarsan authored Dec 20, 2022
2 parents f774f3c + 5f17c18 commit 5f36714
Showing 4 changed files with 25 additions and 17 deletions.
12 changes: 8 additions & 4 deletions README.md
@@ -34,6 +34,7 @@ These deployment templates are provided as is, without warranty. See [Copyright
| <a name="input_dataflow_job_udf_gcs_path"></a> [dataflow_job_udf_gcs_path](#input_dataflow_job_udf_gcs_path) | (Optional) GCS path for JavaScript file (default No UDF used) | `string` |
| <a name="input_dataflow_template_version"></a> [dataflow_template_version](#input_dataflow_template_version) | (Optional) Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk' | `string` |
| <a name="input_dataflow_worker_service_account"></a> [dataflow_worker_service_account](#input_dataflow_worker_service_account) | (Optional) Name of worker service account to be created and used to execute job operations. Must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If parameter is empty, worker service account defaults to project's Compute Engine default service account. | `string` |
| <a name="input_deploy_replay_job"></a> [deploy_replay_job](#input_deploy_replay_job) | (Optional) Defines if replay pipeline should be deployed or not (default: `false`) | `bool` |
| <a name="input_primary_subnet_cidr"></a> [primary_subnet_cidr](#input_primary_subnet_cidr) | The CIDR Range of the primary subnet | `string` |
| <a name="input_scoping_project"></a> [scoping_project](#input_scoping_project) | Cloud Monitoring scoping project ID to create dashboard under.<br>This assumes a pre-existing scoping project whose metrics scope contains the `project` where dataflow job is to be deployed.<br>See [Cloud Monitoring settings](https://cloud.google.com/monitoring/settings) for more details on scoping project.<br>If parameter is empty, scoping project defaults to value of `project` parameter above. | `string` |
| <a name="input_splunk_hec_token"></a> [splunk_hec_token](#input_splunk_hec_token) | (Optional) Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`. | `string` |
@@ -116,11 +117,14 @@ Take note of dashboard_id value.

2. Visit newly created Monitoring Dashboard in Cloud Console by replacing dashboard_id in the following URL: https://console.cloud.google.com/monitoring/dashboards/builder/{dashboard_id}

#### Deploy replay pipeline
#### Deploy replay pipeline when needed

In the `replay.tf` file, uncomment the code under `splunk_dataflow_replay` and follow the sequence of `terraform plan` and `terraform apply`.
The replay pipeline is not deployed by default; it is only used to move failed messages from the PubSub deadletter subscription back to the input topic so they can be redelivered by the main log export pipeline (as depicted in the [diagram](#architecture-diagram) above). Refer to [Handling delivery failures](https://cloud.google.com/architecture/deploying-production-ready-log-exports-to-splunk-using-dataflow#handling_delivery_failures) for more detail.

Once the replay pipeline is no longer needed (the number of messages in the PubSub deadletter topic are at 0), comment out `splunk_dataflow_replay` and follow the `plan` and `apply` sequence above.
**Caution**: Make sure to deploy the replay pipeline only after the root cause of the delivery failure has been fixed. Otherwise, the replay pipeline creates a loop in which failed messages are sent back for re-delivery only to fail again, wasting resources. For the same reason, make sure to tear down the replay pipeline once all failed messages from the deadletter subscription have been processed or replayed.

1. To deploy the replay pipeline, set the `deploy_replay_job` variable to `true`, then follow the sequence of `terraform plan` and `terraform apply` (see the command sketch below).
2. Once the replay pipeline is no longer needed (i.e. the number of messages in the PubSub deadletter subscription is 0), set the `deploy_replay_job` variable back to `false`, then follow the same `terraform plan` and `terraform apply` sequence.
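
For example, a full replay cycle might look like the following command sketch (assuming the default Terraform CLI workflow; `deploy_replay_job` can equally be set in a tfvars file instead of being passed with `-var` on each run):

```shell
# 1. Deploy the replay pipeline once the root cause of the delivery failures is fixed
terraform plan -var='deploy_replay_job=true'
terraform apply -var='deploy_replay_job=true'

# 2. Tear the replay pipeline down after the deadletter subscription backlog drains to 0
terraform plan -var='deploy_replay_job=false'
terraform apply -var='deploy_replay_job=false'
```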

### Cleanup

@@ -131,8 +135,8 @@ $ terraform destroy

### TODOs

* ~~Support KMS-encrypted HEC token~~
* Expose logging level knob
* ~~Support KMS-encrypted HEC token~~
* ~~Create replay pipeline~~
* ~~Create secure network for self-contained setup if existing network is not provided~~
* ~~Add Cloud Monitoring dashboard~~
1 change: 0 additions & 1 deletion main.tf
@@ -35,7 +35,6 @@ locals {
dataflow_temporary_gcs_bucket_path = "tmp/"

dataflow_splunk_template_gcs_path = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_Splunk"
# tflint-ignore: terraform_unused_declarations
dataflow_pubsub_template_gcs_path = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_Cloud_PubSub"

# If provided, set Dataflow worker to new user-managed service account;
23 changes: 11 additions & 12 deletions replay.tf
@@ -13,16 +13,16 @@
# limitations under the License.

/*
The replay job should stay commented out while the main export pipeline is initially deployed.
When the replay job needs to be run, simply uncomment the module and deploy the replay pipeline.
From the CLI, this may look like `terraform apply -target="google_dataflow_job.splunk_dataflow_replay"`
After the deadletter Pub/Sub topic has no more messages, comment out the module and run a regular terraform deployment (ex. terraform apply). Terraform will automatically destroy the replay job.
`terraform apply -target` usage documentation is here: https://www.terraform.io/docs/cli/commands/apply.html
When the replay job needs to be run, simply run terraform with the variable overridden:
`terraform apply -var deploy_replay_job="true"`, or change the value in the tfvars file.
After the deadletter Pub/Sub topic has no more messages, set `deploy_replay_job` to `false` and
run a regular terraform deployment (e.g. terraform apply). Terraform will automatically destroy the replay job.
*/

/*

resource "google_dataflow_job" "splunk_dataflow_replay" {
count = var.deploy_replay_job == true ? 1 : 0

name = local.dataflow_replay_job_name
template_gcs_path = local.dataflow_pubsub_template_gcs_path
temp_gcs_location = "gs://${local.dataflow_temporary_gcs_bucket_name}/${local.dataflow_temporary_gcs_bucket_path}"
@@ -32,13 +32,12 @@ resource "google_dataflow_job" "splunk_dataflow_replay" {
inputSubscription = google_pubsub_subscription.dataflow_deadletter_pubsub_sub.id
outputTopic = google_pubsub_topic.dataflow_input_pubsub_topic.id
}
region = var.region
network = var.network
subnetwork = "regions/${var.region}/subnetworks/${local.subnet_name}"
ip_configuration = "WORKER_IP_PRIVATE"
region = var.region
network = var.network
subnetwork = "regions/${var.region}/subnetworks/${local.subnet_name}"
ip_configuration = "WORKER_IP_PRIVATE"

depends_on = [
google_compute_subnetwork.splunk_subnet
]
}
*/
6 changes: 6 additions & 0 deletions variables.tf
@@ -182,3 +182,9 @@ variable "dataflow_job_udf_function_name" {
description = "(Optional) Name of JavaScript function to be called (default No UDF used)"
default = ""
}

variable "deploy_replay_job" {
type = bool
description = "(Optional) Defines if replay pipeline should be deployed or not (default: `false`)"
default = false
}
