"Today we experienced high latency with KFP in our prod environment. The call to fetch experiments took almost 30 seconds.
However, only around 7k Argo workflows were stored in the cluster.
The problem was most likely caused by the Persistence Agent issues we discussed last time.
I saw many errors in the Persistence Agent's log about rows or entries not being found.
I turned off the Persistence Agent by scaling its replicas down to 0 and cleared out old workflows until around 3k remained.
As expected, the latency dropped right after I turned off the Persistence Agent.
I turned the Persistence Agent back on afterwards and let it run for about 20 minutes. It was not able to catch up with newly created workflows.
And the latency started to increase again. Therefore, I repeated the process, but this time cleared the workflows down to around 200 first.
This resolved the issue. Workflow data was synced to the DB without problems. The latency now hovers around 6 seconds for fetching experiments.
"
The Persistence Agent should stop persisting a workflow once that workflow has completed or failed.
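
To make the requested behavior concrete, here is a minimal sketch of the filtering logic, not the Persistence Agent's actual code. It assumes a workflow counts as finished when its Argo phase is `Succeeded`, `Failed`, or `Error`. The `WorkflowSnapshot` type, the `final_state_persisted` marker, and the `should_persist` and `sync_once` helpers are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Phases that Argo Workflows reports for workflows that will not change anymore.
TERMINAL_PHASES = frozenset({"Succeeded", "Failed", "Error"})


@dataclass
class WorkflowSnapshot:
    """Hypothetical, simplified view of a workflow as seen by the agent."""
    name: str
    phase: str
    # Hypothetical marker: set once the terminal state has been written to the DB.
    final_state_persisted: bool = False


def should_persist(wf: WorkflowSnapshot) -> bool:
    """Return True if the workflow still needs to be reported to the DB."""
    if wf.phase not in TERMINAL_PHASES:
        # Still pending or running: its status can change, so keep syncing it.
        return True
    # Finished: persist the final state exactly once, then skip it.
    return not wf.final_state_persisted


def sync_once(
    workflows: List[WorkflowSnapshot],
    persist: Callable[[WorkflowSnapshot], None],
) -> List[str]:
    """Run one sync pass and return the names of the workflows persisted."""
    persisted = []
    for wf in workflows:
        if not should_persist(wf):
            continue
        persist(wf)
        if wf.phase in TERMINAL_PHASES:
            wf.final_state_persisted = True
        persisted.append(wf.name)
    return persisted
```

With this kind of check, a sync pass only does work proportional to the number of active workflows plus newly finished ones, so old completed workflows left in the cluster would no longer add load on each pass.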