
Creation of Replay Pipeline for Splunk export #11

Merged

Conversation

@npredey (Collaborator) commented Jul 27, 2021

No description provided.

@npredey npredey requested a review from rarsan July 27, 2021 22:11
@rarsan (Member) left a comment:

Thanks for the PR. Comments below.
Could you add a sentence to the README with steps on how to deploy just the replay pipeline, as needed?

variables.tf Outdated
@@ -124,3 +124,9 @@ variable "dataflow_job_udf_function_name" {
description = "[Optional Dataflow UDF] Name of JavaScript function to be called (default: '')"
default = ""
}

variable "dataflow_replay_template_version" {
rarsan (Member):

I like the idea of specifying a version number vs the GCS path. Should we make the other parameter dataflow_template_path consistent though?

npredey (Collaborator, author):

I think the version is all that's required, because the template path itself won't really change (as far as I know). Should we address that in this PR?

npredey (Collaborator, author):

Looks like we should make the path static, but then consolidate the path + version variables together. How does that sound to you?

rarsan (Member):

Yes, that sounds good. Up to you whether you prefer to address it in this PR or in a follow-up one.

npredey (Collaborator, author):

Addressed it in this PR, let me know what you think!
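A minimal sketch of the consolidation discussed above: the template base path becomes a static local and only the version remains an input. All names and the exact template identifiers here are illustrative assumptions, not necessarily what the PR implements.

```hcl
# Sketch only: a single version input, with the GCS path derived from it.
variable "dataflow_template_version" {
  type        = string
  description = "Dataflow template release version (default: 'latest')"
  default     = "latest"
}

locals {
  # Google-hosted template bucket; the path itself no longer needs to be an input.
  dataflow_template_base = "gs://dataflow-templates/${var.dataflow_template_version}"

  dataflow_main_template_gcs_path       = "${local.dataflow_template_base}/Cloud_PubSub_to_Splunk"
  dataflow_deadletter_template_gcs_path = "${local.dataflow_template_base}/Cloud_PubSub_to_Cloud_PubSub"
}
```

With this shape, both the main job and the replay job reference the locals, so a single version variable controls both templates.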

replay.tf Outdated
name = local.dataflow_replay_job_name
template_gcs_path = local.dataflow_deadletter_template_gcs_path
temp_gcs_location = "gs://${local.dataflow_temporary_gcs_bucket_name}/${local.dataflow_temporary_gcs_bucket_path}"
machine_type = var.dataflow_job_machine_type
rarsan (Member):

The machine type and count specified are specific to sizing for the Pub/Sub to Splunk template. I wonder if we should leave them out for the Pub/Sub to Pub/Sub template and rely on the defaults, since it's an ephemeral pipeline? I'm fine either way.

npredey (Collaborator, author):

I remember when we did this with Tempus: if the machine type for the replay pipeline was too small, it took quite some time (depending on the number of logs) to burn down the backlog with the Pub/Sub to Pub/Sub template. Perhaps we can rely on the default, but give the opportunity for customization?
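One way to sketch the "default with opt-in customization" idea (the variable name is hypothetical): in Terraform 0.12+, a null default is treated as unset, so the Dataflow service picks its own machine type unless the user overrides it.

```hcl
# Sketch: null falls through to the Dataflow service default.
variable "dataflow_replay_job_machine_type" {
  type        = string
  description = "(Optional) Machine type for the ephemeral replay job; leave null to use the Dataflow default"
  default     = null
}
```

The replay job resource would then set machine_type = var.dataflow_replay_job_machine_type, which is omitted from the launch request whenever the variable is left at null.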

rarsan (Member):

SGTM

@npredey npredey requested a review from rarsan August 16, 2021 14:57
variables.tf Outdated
description = "Dataflow template version for the replay job."
default = "latest"
}

variable "dataflow_template_path" {
rarsan (Member):

@npredey Should we remove this input variable, now that you have a similarly-named local variable?

@@ -72,6 +72,12 @@ variable "splunk_hec_token" {

# Dataflow job parameters

variable "dataflow_template_version" {
type = string
description = "Dataflow template version for the replay job."
rarsan (Member):

I think this version variable applies to both templates/jobs. Just a minor update to the description (and the associated README table of parameters).
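A possible wording for the updated description, reflecting that the one version variable now drives both jobs (the exact phrasing is an assumption; the PR's final text may differ):

```hcl
# Sketch of the suggested description update: one version input shared by
# the main (Pub/Sub to Splunk) and replay (Pub/Sub to Pub/Sub) jobs.
variable "dataflow_template_version" {
  type        = string
  description = "Dataflow template release version, used by both the main and replay jobs (default: 'latest')"
  default     = "latest"
}
```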

@rarsan (Member) left a comment:

Sorry for the delay in this iteration, @npredey. Looks good to me; one last question about dataflow_template_path: how about deleting that input variable in favor of standardizing on dataflow_template_version? If so, let's update the table in the README. We will probably also need to update one line in pipeline.tf to use the local path variable, similar to replay.tf.

…le for dataflow template path in pipeline.tf
@npredey npredey requested a review from rarsan August 23, 2021 14:58
@npredey (Collaborator, author) commented Aug 23, 2021

@rarsan I standardized the input variable for template path, and switched to use the local template path variable like in replay.tf. Let me know if that's what you were thinking!

@rarsan (Member) commented Aug 25, 2021

Looks good @npredey !
Can you give me push access to your branch feature/npredey-replay-pipeline so I can push a minor commit into your PR?

I pulled and tested this PR, but it failed because a random_id resource had been removed. I have since made changes in your branch to fix that, along with README updates. Once pushed and you review, I can merge and close.

@npredey (Collaborator, author) commented Aug 25, 2021

@rarsan I sent an invite out to you for my repo, which should grant you the proper access. Let me know if that works!

@rarsan rarsan merged commit 130bca7 into GoogleCloudPlatform:main Aug 25, 2021