[WX-1792] Helper to Actor #7544

Merged
merged 19 commits into from
Sep 18, 2024

THWiseman (Contributor) commented Sep 16, 2024

Description

This PR refactors the CostCatalogHelper into an actor and renames it to PollResultMonitorActor. As an actor, it can communicate asynchronously with the CostCatalogService, which it needs to do in order to calculate a VM's cost per hour.
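A minimal sketch of why an actor helps here, assuming a simplified protocol: instead of computing a price inline, the monitor sends a request to a cost catalog and handles the reply later as just another message. It deliberately avoids Akka so that it runs on its own. MiniActor, FakeCostCatalog, MonitorSketch, the message names, and the placeholder price are all hypothetical, and the mailboxes are drained synchronously rather than on a dispatcher.

```scala
import scala.collection.mutable

// Hypothetical message protocol (not Cromwell's real messages).
sealed trait Msg
final case class VmStarted(machineType: String) extends Msg
final case class CostRequest(machineType: String, replyTo: MiniActor) extends Msg
final case class CostResponse(costPerHour: BigDecimal) extends Msg

// Tiny actor stand-in: tell() only enqueues; messages are handled later, one at a time.
trait MiniActor {
  private val mailbox = mutable.Queue.empty[Msg]
  def tell(msg: Msg): Unit = mailbox.enqueue(msg)
  protected def receive(msg: Msg): Unit
  // Handles every queued message; returns true if there was any work.
  def drain(): Boolean = {
    val hadWork = mailbox.nonEmpty
    while (mailbox.nonEmpty) receive(mailbox.dequeue())
    hadWork
  }
}

// Stand-in for the cost catalog service; the price table is a placeholder.
final class FakeCostCatalog(prices: Map[String, BigDecimal]) extends MiniActor {
  protected def receive(msg: Msg): Unit = msg match {
    case CostRequest(machineType, replyTo) =>
      prices.get(machineType).foreach(price => replyTo.tell(CostResponse(price)))
    case _ => ()
  }
}

// Stand-in for the poll result monitor: it asks for a price rather than computing it inline.
final class MonitorSketch(catalog: MiniActor) extends MiniActor {
  var vmCostPerHour: Option[BigDecimal] = None
  protected def receive(msg: Msg): Unit = msg match {
    case VmStarted(machineType) => catalog.tell(CostRequest(machineType, this))
    case CostResponse(cost)     => vmCostPerHour = Some(cost)
    case _                      => ()
  }
}

val catalog = new FakeCostCatalog(Map("example-machine" -> BigDecimal("1.25")))
val monitor = new MonitorSketch(catalog)
monitor.tell(VmStarted("example-machine"))
// Deliver messages until both mailboxes are empty (the reply arrives on a later pass).
while (monitor.drain() | catalog.drain()) {}
```

The monitor never blocks waiting for the price. It simply reacts when the CostResponse message shows up, which is the property a real actor gets from the message-passing model.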

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

THWiseman requested a review from a team as a code owner September 16, 2024 14:02
)
def onTaskComplete(runStatus: StandardAsyncRunState, handle: StandardAsyncPendingExecutionHandle): Unit =
  pollingResultMonitorActor.foreach(helper =>
    helper.tell(AsyncJobHasFinished(runStatus.getClass.getSimpleName), self)
  )
Collaborator

Can we pass runStatus here rather than stringifying it?
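A sketch of the reviewer's suggestion using a hypothetical status hierarchy. When only getClass.getSimpleName is sent, the receiver gets a bare name and loses any data the status carries. Sending the status value instead lets the monitor pattern match on it. RunStatus here and both message names are illustrative stand-ins, not Cromwell's types.

```scala
// Hypothetical stand-in for a backend's run-status type.
sealed trait RunStatus
object RunStatus {
  case object Running extends RunStatus
  final case class Success(exitCode: Int) extends RunStatus
  final case class Failed(errorMessage: String) extends RunStatus
}

// Stringly typed: only a class name such as "Failed" reaches the monitor.
final case class AsyncJobHasFinishedByName(statusName: String)

// Typed: the status value itself, with any data it carries, reaches the monitor.
final case class AsyncJobHasFinishedTyped(status: RunStatus)

// With the typed message the compiler can check that every status is handled.
def describe(msg: AsyncJobHasFinishedTyped): String = msg.status match {
  case RunStatus.Running       => "still running"
  case RunStatus.Success(code) => s"succeeded with exit code $code"
  case RunStatus.Failed(error) => s"failed: $error"
}

val status: RunStatus = RunStatus.Failed("VM was preempted")
val byName = AsyncJobHasFinishedByName("Failed") // the error message is gone
val typed = AsyncJobHasFinishedTyped(status)
```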

}

class PapiPollResultMonitorActor(tellMetadataFn: Map[String, Any] => Unit,
tellBardFn: (String, OffsetDateTime, Option[OffsetDateTime], OffsetDateTime) => Unit
Collaborator

Might be worth bringing back the StartAndEndTimes class; a series of OffsetDateTime parameters is easy to mess up.
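One way to read the StartAndEndTimes suggestion, assuming the three timestamps are a job start, an optional VM start, and a job end (the field names, the summarize helper, and the ordering check are hypothetical): bundling them in a case class with named fields makes it much harder to pass two OffsetDateTime values in the wrong order.

```scala
import java.time.{Duration, OffsetDateTime}
import scala.util.Try

// Hypothetical field names; the real StartAndEndTimes class may differ.
final case class StartAndEndTimes(jobStart: OffsetDateTime,
                                  vmStart: Option[OffsetDateTime],
                                  jobEnd: OffsetDateTime) {
  // Assumed invariant, added for illustration: a job cannot end before it starts.
  require(!jobEnd.isBefore(jobStart), "jobEnd must not precede jobStart")
}

// Hypothetical reporting helper: one named value instead of three positional timestamps.
def summarize(terminalStateName: String, times: StartAndEndTimes): String =
  s"$terminalStateName after ${Duration.between(times.jobStart, times.jobEnd).toMinutes} min"

val t0 = OffsetDateTime.parse("2024-09-16T14:00:00Z")
val times = StartAndEndTimes(jobStart = t0, vmStart = Some(t0.plusMinutes(2)), jobEnd = t0.plusMinutes(30))
// Swapping start and end is now caught at construction time.
val swapped = Try(StartAndEndTimes(jobStart = t0.plusMinutes(30), vmStart = None, jobEnd = t0))
```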

case event if event.name == CallMetadataKeys.VmEndTime => event.offsetDateTime
}
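The fragment above is the body of a collectFirst over execution events. A self-contained version follows. It assumes a hypothetical ExecutionEvent type and assumes the event names match the vmStartTime/vmEndTime metadata keys mentioned later in this thread.

```scala
import java.time.OffsetDateTime

// Hypothetical stand-in for a backend execution event.
final case class ExecutionEvent(name: String, offsetDateTime: OffsetDateTime)

// Assumed event names, matching the metadata keys mentioned in this PR.
val VmStartTime = "vmStartTime"
val VmEndTime = "vmEndTime"

// collectFirst returns the time of the first matching event, or None if there is none.
def vmEndTime(events: Seq[ExecutionEvent]): Option[OffsetDateTime] =
  events.collectFirst { case event if event.name == VmEndTime => event.offsetDateTime }

val t0 = OffsetDateTime.parse("2024-09-16T14:00:00Z")
val events = Seq(
  ExecutionEvent(VmStartTime, t0.plusMinutes(1)),
  ExecutionEvent(VmEndTime, t0.plusMinutes(20))
)
```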

override def tellMetadata(metadata: Map[String, Any]): Unit = tellMetadataFn(metadata)
Collaborator

Can we give this actor a handle to the metadata service registry rather than having all the backends pass in (I assume identical?) implementations?

Similar question for Bard: can we put the actual message passing to the Bard service in the parent PollResultMonitorActor and figure out the minimum logic that needs to be per-backend?

THWiseman (Contributor Author), Sep 16, 2024

That is certainly possible, but I'm not sure it's a big improvement: we would also need to pass in a lot of task-specific information (runtime attributes, job descriptor, etc.) to build each message correctly. That information isn't relevant to tracking start and end times, yet the messages we send still need that context about their task. Passing in the function object is more about capturing the necessary context from the parent class than about sharing an implementation.

Collaborator

Yeah, I see what you mean for Bard. Maybe we could have a method that takes the current inputs and creates and returns a TaskSummaryEvent, which the PollMonitorActor knows what to do with?

Contributor Author

Updated the branch to do something along these lines. Individual backends no longer pass in callbacks; instead, they pass in some extra data so that the poll monitors can call their own implementations of tellMetadata and tellBard. This let us remove tellBard from elsewhere in the codebase.
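The following sketch shows the shape discussed here, roughly the reviewer's TaskSummaryEvent idea. The base monitor owns the shared reporting path, and each backend overrides only a hook that builds the event from its own context. PollMonitorSketch, ExampleBackendMonitor, buildSummary, and the sendToBard function (a stand-in for messaging the Bard service) are hypothetical. The TaskSummaryEvent fields are illustrative, not the real event's.

```scala
import java.time.OffsetDateTime
import scala.collection.mutable.ListBuffer

// Hypothetical event; the real Bard event carries more fields.
final case class TaskSummaryEvent(terminalState: String,
                                  backend: String,
                                  vmStart: Option[OffsetDateTime],
                                  vmEnd: Option[OffsetDateTime])

// Shared logic lives in the base class; sendToBard stands in for messaging the Bard service.
abstract class PollMonitorSketch(sendToBard: TaskSummaryEvent => Unit) {
  protected var vmStart: Option[OffsetDateTime] = None
  protected var vmEnd: Option[OffsetDateTime] = None

  def recordVmTimes(start: OffsetDateTime, end: OffsetDateTime): Unit = {
    vmStart = Some(start)
    vmEnd = Some(end)
  }

  // Per-backend hook: build the event from whatever context this backend has.
  protected def buildSummary(terminalState: String): TaskSummaryEvent

  // Shared path: every backend reports the same way.
  final def onJobFinished(terminalState: String): Unit = sendToBard(buildSummary(terminalState))
}

final class ExampleBackendMonitor(backendName: String, report: TaskSummaryEvent => Unit)
    extends PollMonitorSketch(report) {
  protected def buildSummary(terminalState: String): TaskSummaryEvent =
    TaskSummaryEvent(terminalState, backendName, vmStart, vmEnd)
}

val sent = ListBuffer.empty[TaskSummaryEvent]
val monitor = new ExampleBackendMonitor("example-backend", event => { sent += event; () })
val t0 = OffsetDateTime.parse("2024-09-16T14:00:00Z")
monitor.recordVmTimes(t0, t0.plusMinutes(10))
monitor.onJobFinished("Succeeded")
```

Marking onJobFinished as final keeps the sending logic in one place. A new backend can only change what goes into the event, not how it is delivered.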

}

// Function that reports metrics to bard, called when a specific call attempt terminates.
def tellBard(terminalStateName: String,
Contributor Author

Moved (almost) verbatim from StandardAsyncExecutionActor

@@ -322,7 +320,6 @@ object TesAsyncBackendJobExecutionActor {
handle: StandardAsyncPendingExecutionHandle,
getTaskLogsFn: StandardAsyncPendingExecutionHandle => Future[Option[TaskLog]],
tellMetadataFn: Map[String, Any] => Unit,
tellBardFn: StandardAsyncRunState => Unit,
Collaborator

I'm going to create a ticket in this epic to track the fact that we've disabled Bard reporting here.

jgainerdewar (Collaborator) left a comment

Nice refactor!

salonishah11 (Contributor) left a comment

One question; otherwise LGTM!

@@ -134,9 +132,8 @@ class PipelinesApiAsyncBackendJobExecutionActor(override val standardParams: Sta

override type StandardAsyncRunState = RunStatus

override val costHelper: Option[CostPollingHelper[RunStatus]] = Option(new PapiCostPollingHelper(tellMetadata))
def statusEquivalentTo(thiz: StandardAsyncRunState)(that: StandardAsyncRunState): Boolean =
thiz.toString == that.toString
Contributor

Isn't .toString needed so that Cromwell doesn't log no-op status changes? I thought this was added recently 🤔

Contributor Author

YES, thank you!
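Here is the point about no-op status changes, illustrated with a hypothetical status type. Suppose a status carries data that changes between polls, but its toString renders only the state. Then == reports a change on every poll, while comparing toString does not. Whether Cromwell's RunStatus behaves exactly like this is an assumption.

```scala
// Hypothetical status: equality includes the event count, but toString shows only the state.
final case class PollStatus(state: String, eventCount: Int) {
  override def toString: String = state
}

// Same shape as statusEquivalentTo in the diff: compare rendered statuses, not values.
def statusEquivalentTo(thiz: PollStatus)(that: PollStatus): Boolean =
  thiz.toString == that.toString

val previous = PollStatus("Running", eventCount = 3)
val current = PollStatus("Running", eventCount = 4) // a new event arrived, state unchanged
```

With ==, this poll would count as a status change and be logged. With the toString comparison, it is treated as a no-op.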

jgainerdewar (Collaborator)

I tested an image built off this branch in my BEE and confirmed that the vmStartTime and vmEndTime keys were written to metadata, and that the Bard event shows up in the BigQuery table as expected.

THWiseman merged commit 80fbf59 into develop Sep 18, 2024
38 checks passed
THWiseman deleted the WX-1792-helper-to-actor branch September 18, 2024 12:55
Labels
None yet
Projects
None yet
3 participants