Stop persisting completed/failed workflow #1706

Closed
IronPan opened this issue Jul 31, 2019 · 1 comment · Fixed by #1802

Comments

IronPan commented Jul 31, 2019

The Persistence Agent should stop persisting a workflow once that workflow has completed or failed.
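
For illustration, here is a minimal sketch of the behavior requested above. It is not the Persistence Agent's actual code: the `Workflow` class, the `sync` function, the dict-backed store, and the `example/persisted-final-state` label are all hypothetical names chosen for this example. The only detail taken from Argo is the set of terminal phase names (`Succeeded`, `Failed`, `Error`). The idea is to write the final state once, mark the workflow, and skip it on every later resync.

```python
from dataclasses import dataclass, field

# Argo Workflow phases after which the workflow no longer changes.
TERMINAL_PHASES = {"Succeeded", "Failed", "Error"}

# Hypothetical label marking a workflow whose final state is already stored.
PERSISTED_LABEL = "example/persisted-final-state"


@dataclass
class Workflow:
    """Stand-in for an Argo Workflow object (hypothetical, for illustration)."""
    name: str
    phase: str
    labels: dict = field(default_factory=dict)


def should_persist(wf: Workflow) -> bool:
    """Return False once the workflow's terminal state has been recorded."""
    return wf.labels.get(PERSISTED_LABEL) != "true"


def sync(wf: Workflow, store: dict) -> bool:
    """Write wf's phase to store unless its final state was already saved.

    Returns True if a write happened, False if the workflow was skipped.
    """
    if not should_persist(wf):
        return False
    store[wf.name] = wf.phase
    if wf.phase in TERMINAL_PHASES:
        # Mark the workflow so that later resyncs skip it.
        wf.labels[PERSISTED_LABEL] = "true"
    return True
```

With this shape, a running workflow is written on every sync, a finished workflow is written exactly once more, and every later sync of that workflow is a no-op, so the number of finished workflows left in the cluster no longer adds database writes.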

IronPan commented Aug 7, 2019

Some data points:

"Today we experienced high latency with KFP in our prod environment. The call to fetch experiments took almost 30 seconds.
However, only around 7k Argo workflows were stored in the cluster.
The problem was most likely caused by the Persistence Agent issues we discussed last time.
I saw many error logs in the Persistence Agent's log about not finding the row or entry.

I turned off the Persistence Agent by scaling its replicas down to 0 and cleared the old workflows down to around 3k.
The latency dropped right after I turned off the Persistence Agent, as expected.
I then turned the Persistence Agent back on and let it run for about 20 minutes. It was not able to catch up with newly created workflows,
and the latency started to increase again. Therefore, I repeated the process, but this time cleared the workflows down to around 200 first.
This resolved the issue. Workflow data was synced to the DB without problems. The latency for fetching experiments now hovers around 6 seconds."
