fix: Custom metrics are not recorded for DAG tasks. Fixes #3872 (#3886)

Merged
sarabala1979 merged 23 commits into argoproj:master on Sep 2, 2020

Conversation

sarabala1979
Member

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234". Fixes #3872 (Custom metrics are not recorded for DAG tasks).
  • I've signed the CLA.
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

simster7 self-assigned this on Aug 31, 2020
sarabala1979 marked this pull request as ready for review on August 31, 2020 18:49
Comment on lines 317 to 324
    // Collect the completed task metrics
    tmpl := woc.wf.GetTemplateByName(task.Template)
    if tmpl != nil && tmpl.Metrics != nil {
        if prevNodeStatus, ok := woc.preExecutionNodePhases[node.ID]; ok && !prevNodeStatus.Fulfilled() {
            localScope, realTimeScope := woc.prepareMetricScope(node)
            woc.computeMetrics(tmpl.Metrics.Prometheus, localScope, realTimeScope, false)
        }
    }
Member

I'm not sure this is the correct approach. All metrics should be emitted within executeTemplate; note that for DAG and Steps, metrics are never emitted outside of it. Can you investigate why the existing metric emission code in executeTemplate is not being called in this case?

Member Author

@sarabala1979 sarabala1979 Sep 1, 2020

In the DAG flow, the code checks node.Fulfilled() at an early stage (in executeDAGTask), which prevents executeTemplate from being executed for completed tasks.
https://github.com/argoproj/argo/blob/5b5d2359ef9f573121fe6429e386f03dd8652ece/workflow/controller/dag.go#L316-L326

Options:

  1. Remove the return statement in the code above, so that completed nodes reach executeTemplate. However, I am not sure whether this would break any DAG flow.
  2. Keep the fix above: add the metric emission code inside the node.Fulfilled() check.
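
The difference between the early return and the second option can be sketched in a small, self-contained Go program. This is only an illustration under stated assumptions: the `node` and `controller` types, the `fulfilledBefore` field, and the `executeTaskEarlyReturn` / `executeTaskWithFix` functions are hypothetical stand-ins, not the Argo controller's actual types or signatures, and the idea that the task completes between two controller passes is an assumption of the sketch.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a DAG task's node status.
type node struct {
	name            string
	fulfilled       bool // the node has completed
	fulfilledBefore bool // the node had already completed when this pass began
}

// controller records which nodes had metrics emitted (illustrative only).
type controller struct {
	emitted []string
}

func (c *controller) emitMetrics(n *node) {
	c.emitted = append(c.emitted, n.name)
}

// executeTemplate stands in for the place where metrics are normally
// emitted: it emits once it observes a completed node.
func (c *controller) executeTemplate(n *node) {
	if n.fulfilled {
		c.emitMetrics(n)
	}
}

// executeTaskEarlyReturn models the behaviour described in the thread:
// completed nodes return before executeTemplate is reached, so nothing
// is emitted for them.
func (c *controller) executeTaskEarlyReturn(n *node) {
	if n.fulfilled {
		return
	}
	c.executeTemplate(n)
}

// executeTaskWithFix models option 2: emit inside the fulfilled check, but
// only if the node was not already complete when the pass began.
func (c *controller) executeTaskWithFix(n *node) {
	if n.fulfilled {
		if !n.fulfilledBefore {
			c.emitMetrics(n)
		}
		return
	}
	c.executeTemplate(n)
}

func main() {
	// Assumption: the task finished between two passes, so the controller
	// first sees it when it is already fulfilled.
	task := &node{name: "A", fulfilled: true}

	before, after := &controller{}, &controller{}
	before.executeTaskEarlyReturn(task)
	after.executeTaskWithFix(task)

	fmt.Println("early return emitted:", len(before.emitted))
	fmt.Println("with fix emitted:", len(after.emitted))
}
```

Option 1 would correspond to dropping the `return` in `executeTaskEarlyReturn`; the sketch does not model it, since the thread leaves open whether that would affect other parts of the DAG flow.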

Member

On second thought, I think we can keep this current fix for this issue.
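
For readers following the thread, the guard in the reviewed snippet (`preExecutionNodePhases[node.ID]` combined with `!prevNodeStatus.Fulfilled()`) reads as a "transition" check: metrics are computed only when the node was recorded before the current pass and was not yet fulfilled at that point. A minimal sketch of that idea follows. The `Phase` type, its simplified `Fulfilled` method, and the `shouldEmit` and `countEmissions` helpers are hypothetical and exist only for illustration; in the real snippet the guard is reached only for nodes that are currently fulfilled.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for a workflow node phase.
type Phase string

const (
	Running   Phase = "Running"
	Succeeded Phase = "Succeeded"
	Failed    Phase = "Failed"
)

// Fulfilled reports whether the phase is terminal (simplified for this sketch).
func (p Phase) Fulfilled() bool { return p == Succeeded || p == Failed }

// shouldEmit mirrors the shape of the guard in the reviewed snippet: emit
// only if the node was recorded before this pass and was not yet fulfilled.
func shouldEmit(preExecution map[string]Phase, nodeID string) bool {
	prev, ok := preExecution[nodeID]
	return ok && !prev.Fulfilled()
}

// countEmissions applies the guard to a sequence of per-pass snapshots,
// assuming the node is observed as fulfilled during every pass.
func countEmissions(snapshots []map[string]Phase, nodeID string) int {
	count := 0
	for _, snap := range snapshots {
		if shouldEmit(snap, nodeID) {
			count++
		}
	}
	return count
}

func main() {
	// Pass 1 starts with the node still running; later passes start with
	// the node already succeeded.
	passes := []map[string]Phase{
		{"A": Running},
		{"A": Succeeded},
		{"A": Succeeded},
	}
	fmt.Println("emissions for A:", countEmissions(passes, "A"))
}
```

Under these assumptions the guard fires only on the pass in which the node moves from an unfulfilled to a fulfilled phase, which is what keeps repeated passes over a completed task from recording the same metric again.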

sarabala1979 merged commit 24c7783 into argoproj:master on Sep 2, 2020
@YourPsychiatrist
Contributor

Thank you for taking care of the issue so quickly! Highly appreciated.

@YourPsychiatrist
Contributor

@simster7 or @alexec or someone else: Is this fix already scheduled for a specific release?

@simster7
Member

Backported to 2.10 and 2.11
