
add workflow telemetry to collect action metrics #17046

Merged: 1 commit merged on Dec 4, 2023

@moficodes (Member)

part of #17045

Signed-off-by: Mofi Rahman <mofi@google.com>
@k8s-ci-robot

Hi @moficodes. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jmhbnz (Member) commented Nov 30, 2023

/ok-to-test

@moficodes (Member Author)

@jmhbnz do you know how the metrics become available to view?

@moficodes (Member Author)

Hmm, the action's post step failed:

Error: [Workflow Telemetry] Unable to report process tracer result
Error: [Workflow Telemetry] Error
Error: Error: ENOENT: no such file or directory, open '/home/runner/actions-runner/_work/_actions/catchpoint/workflow-telemetry-action/v1/dist/proc-tracer/proc-trace.out'
[Workflow Telemetry] Reporting all content ...
Error: [Workflow Telemetry] Resource not accessible by integration

@jmhbnz (Member) commented Nov 30, 2023

> @jmhbnz do you know how the metrics become available to view?

Hey @moficodes - I can see some results here for example: https://github.com/etcd-io/etcd/actions/runs/7051472834/attempts/1#summary-19194441512

Can you check across all three (e2e, tests and robustness) and confirm we have results available for the arm64 runs? If so, I think we are OK to merge and collect some data for a few days before deciding whether to reduce the requested RAM.

@moficodes (Member Author)

[Telemetry screenshot: E2E arm64 memory usage]

[Telemetry screenshot: Robustness arm64 memory usage]

Memory usage looks to be under 3 GB for most runs, except for robustness, where the peak is just under 5 GB.

I'd recommend setting the limit to 8 GB of RAM to leave some headroom, or 6 GB if we really want to save resources.
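The sizing argument above comes down to simple arithmetic. A minimal sketch, assuming the approximate peaks quoted in this comment (about 3 GB for most runs, just under 5 GB for robustness, rounded up to 5 GB here) are the worst cases; the figures are read off the charts rather than measured exactly, and `headroom` and `OBSERVED_PEAK_GB` are hypothetical names.

```python
# Approximate peaks quoted in the comment, read off the telemetry charts (assumption:
# these are representative worst cases; robustness "just under 5 GB" is rounded up).
OBSERVED_PEAK_GB = {
    "e2e (most runs)": 3.0,
    "robustness": 5.0,
}


def headroom(limit_gb: float, peak_gb: float) -> float:
    """Hypothetical helper: spare memory above the peak, as a fraction of the peak."""
    return (limit_gb - peak_gb) / peak_gb


worst = max(OBSERVED_PEAK_GB.values())
for limit in (6, 8):
    print(f"{limit} GB limit: {headroom(limit, worst):.0%} headroom over a {worst:g} GB peak")
```

With a 5 GB worst case, an 8 GB limit leaves about 60% headroom and a 6 GB limit about 20%, which is the trade-off between the two options.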

@moficodes (Member Author)

@jmhbnz is there a list of all the possible actuated machine types?

Or are runners provisioned dynamically from the label (e.g. actuated-arm64-8cpu-32gb), so that we can instead specify something like actuated-arm64-4cpu-8gb and it will just work?

@jmhbnz (Member) left a comment

Thanks @moficodes - I would like to collect some samples for a few days to make sure there are no outliers.

This PR LGTM, so let's merge, collect some data for a few days, and then @fykaa will raise the follow-up PR to reduce the RAM requests and disable the telemetry.

@jmhbnz (Member) commented Nov 30, 2023

> Or is it dynamically provisioned based on actuated-arm64-8cpu-32gb and we can instead do something like actuated-arm64-4cpu-8gb and it will just work.

Based on #16801 (comment), I believe there is dynamic sizing/provisioning, so we can just set the value and it will just work.
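If provisioning really is driven by the label, the runner size is fully encoded in the `runs-on` string. A minimal sketch, assuming the `actuated-<arch>-<N>cpu-<M>gb` pattern inferred from the two labels quoted in this thread (not a documented actuated naming spec); `runner_label` and `parse_label` are hypothetical helpers, not part of any actuated tooling.

```python
import re

# Assumed label pattern, inferred from actuated-arm64-8cpu-32gb and
# actuated-arm64-4cpu-8gb in this thread; not taken from actuated documentation.
LABEL_RE = re.compile(r"^actuated-(?P<arch>[a-z0-9]+)-(?P<cpu>\d+)cpu-(?P<ram>\d+)gb$")


def runner_label(arch: str, cpu: int, ram_gb: int) -> str:
    """Hypothetical helper: build a runs-on label from arch, vCPU count and RAM in GB."""
    return f"actuated-{arch}-{cpu}cpu-{ram_gb}gb"


def parse_label(label: str) -> tuple[str, int, int]:
    """Hypothetical helper: split a label into (arch, cpu, ram_gb) or raise ValueError."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognised runner label: {label!r}")
    return m["arch"], int(m["cpu"]), int(m["ram"])


# Keep the current CPU count but cut RAM to the 8 GB recommended earlier in the thread.
arch, cpu, _ = parse_label("actuated-arm64-8cpu-32gb")
proposed = runner_label(arch, cpu, 8)
print(proposed)  # actuated-arm64-8cpu-8gb
```

Whether actuated accepts any CPU/RAM combination written this way is exactly what the thread is assuming from #16801, so it is worth confirming on a single job before changing every workflow.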

@jmhbnz jmhbnz requested a review from serathius November 30, 2023 21:57
@jmhbnz jmhbnz requested a review from ahrtr December 3, 2023 20:31
@ahrtr (Member) commented Dec 4, 2023

I saw the following errors in "Post Collect Workflow Telemetry", but it seems the reports were still uploaded successfully:

Error: [Workflow Telemetry] Unable to report process tracer result
Error: [Workflow Telemetry] Error
Error: Error: ENOENT: no such file or directory, open '/home/runner/actions-runner/_work/_actions/catchpoint/workflow-telemetry-action/v1/dist/proc-tracer/proc-trace.out'
[Workflow Telemetry] Reporting all content ...
Error: [Workflow Telemetry] Resource not accessible by integration

@ahrtr (Member) commented Dec 4, 2023

Just noticed that you already raised an issue catchpoint/workflow-telemetry-action#53.

Thanks.

@ahrtr merged commit b89a451 into etcd-io:main on Dec 4, 2023 (36 checks passed)