Creation of Replay Pipeline for Splunk export #11
@@ -0,0 +1,39 @@

```hcl
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

/*
The replay job should stay commented out while the main export pipeline is initially deployed.
When the replay job needs to be run, uncomment the module and deploy the replay pipeline.
From the CLI, this may look like `terraform apply -target="google_dataflow_job.splunk_dataflow_replay"`.
After the deadletter Pub/Sub topic has no more messages, comment out the module and run a regular
Terraform deployment (e.g. `terraform apply`). Terraform will automatically destroy the replay job.

`terraform apply -target` usage documentation: https://www.terraform.io/docs/cli/commands/apply.html
*/

resource "google_dataflow_job" "splunk_dataflow_replay" {
  name              = local.dataflow_replay_job_name
  template_gcs_path = local.dataflow_deadletter_template_gcs_path
  temp_gcs_location = "gs://${local.dataflow_temporary_gcs_bucket_name}/${local.dataflow_temporary_gcs_bucket_path}"
  machine_type      = var.dataflow_job_machine_type
  max_workers       = var.dataflow_job_machine_count
  parameters = {
    inputSubscription = google_pubsub_subscription.dataflow_deadletter_pubsub_sub.id
    outputTopic       = google_pubsub_topic.dataflow_input_pubsub_topic.id
  }
  region           = var.region
  network          = var.network
  subnetwork       = "regions/${var.region}/subnetworks/${local.subnet_name}"
  ip_configuration = "WORKER_IP_PRIVATE"
  # service_account_email = ""
}
```
@@ -72,6 +72,12 @@ variable "splunk_hec_token" {

```hcl
# Dataflow job parameters

variable "dataflow_template_version" {
  type        = string
  description = "Dataflow template version for the replay job."
  default     = "latest"
}

variable "dataflow_template_path" {
  description = "Dataflow template path. Defaults to latest version of Google-hosted Pub/Sub to Splunk template"
  default     = "gs://dataflow-templates/latest/Cloud_PubSub_to_Splunk"
```

**Review comment** on `dataflow_template_version`: I think this version variable applies to both templates/jobs. Just a minor update to the description (and the associated README table of parameters).

**Review comment** on `dataflow_template_path`: @npredey Should we remove this input variable, now that you have a similarly-named local variable?
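One way a single `dataflow_template_version` input could drive both templates, as the review comment suggests, is shown below. This is a hypothetical sketch, not code from this PR; the local names and the `Cloud_PubSub_to_PubSub` template name are assumptions.

```hcl
# Hypothetical locals (NOT from this PR): one shared version input
# parameterizes both Google-hosted template paths.
locals {
  dataflow_main_template_gcs_path       = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_Splunk"
  dataflow_deadletter_template_gcs_path = "gs://dataflow-templates/${var.dataflow_template_version}/Cloud_PubSub_to_PubSub"
}
```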
**Review comment**: The machine type and count specified are specific to Pub/Sub to Splunk template sizing. I wonder if we should leave those out for the Pub/Sub to Pub/Sub template and rely on the defaults, since it's an ephemeral pipeline? I'm fine either way.

**Reply**: I remember when we did this with Tempus: if the machine type for the replay pipeline was too small, it took quite some time (depending on the number of logs) to burn down the backlog with the Pub/Sub to Pub/Sub template. Perhaps we can rely on the default but give the opportunity for customization?

**Reply**: SGTM
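The "rely on the default but allow customization" option discussed above could look like the following sketch. The variable names and sentinel defaults are assumptions, not part of this PR.

```hcl
# Hypothetical optional sizing inputs (NOT from this PR): empty/zero values
# fall back to the Dataflow service defaults for the replay job.
variable "replay_job_machine_type" {
  type        = string
  description = "Machine type for the replay job. Leave empty to use the Dataflow default."
  default     = ""
}

variable "replay_job_max_workers" {
  type        = number
  description = "Max workers for the replay job. Set to 0 to use the Dataflow default."
  default     = 0
}

# In the job resource, pass the values only when they are set; null lets the
# provider fall back to the service default:
#   machine_type = var.replay_job_machine_type != "" ? var.replay_job_machine_type : null
#   max_workers  = var.replay_job_max_workers != 0 ? var.replay_job_max_workers : null
```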