Surface K8s Events about Nodes to UI #3673
We watch pods of a workflow to compute state. If this information is available on the pod's status, we could surface it. Are you able to attach the YAML of a pod that was involved? Or is this only available on events? (Please attach the event YAML.)
Indeed, the only thing I see in the pod YAML is what Argo reports in the UI:

```yaml
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-08-05T16:53:27Z"
    message: '0/21 nodes are available: 21 Insufficient cpu.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: Burstable
```

and the source is:

```console
$ kubectl get event -n argo --field-selector involvedObject.name=wonderful-tiger
LAST SEEN   TYPE      REASON              OBJECT                     MESSAGE
25s         Normal    WorkflowRunning     workflow/wonderful-tiger   Workflow Running
25s         Warning   FailedScheduling    pod/wonderful-tiger        0/21 nodes are available: 21 Insufficient cpu.
22s         Normal    NotTriggerScaleUp   pod/wonderful-tiger        pod didn't trigger scale-up (it wouldn't fit if a new node is added): 17 Insufficient cpu
```

and `kubectl get event -n argo --field-selector involvedObject.name=wonderful-tiger -o yaml`:

```yaml
apiVersion: v1
items:
- apiVersion: v1
  count: 1
  eventTime: null
  firstTimestamp: "2020-08-05T17:01:09Z"
  involvedObject:
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    name: wonderful-tiger
    namespace: argo
    resourceVersion: "14645697"
    uid: f4bb68ae-84a2-4e40-8d5b-71f0a9ab91f3
  kind: Event
  lastTimestamp: "2020-08-05T17:01:09Z"
  message: Workflow Running
  metadata:
    creationTimestamp: "2020-08-05T17:01:09Z"
    name: wonderful-tiger.16286dde6befd70c
    namespace: argo
    resourceVersion: "135777"
    selfLink: /api/v1/namespaces/argo/events/wonderful-tiger.16286dde6befd70c
    uid: 57c2bd09-11c6-4dc8-8ada-b33346764566
  reason: WorkflowRunning
  reportingComponent: ""
  reportingInstance: ""
  source:
    component: workflow-controller
  type: Normal
- apiVersion: v1
  count: 3
  eventTime: null
  firstTimestamp: "2020-08-05T17:01:09Z"
  involvedObject:
    apiVersion: v1
    kind: Pod
    name: wonderful-tiger
    namespace: argo
    resourceVersion: "14645699"
    uid: a6c471b1-a31e-46fa-b53e-27554a81d328
  kind: Event
  lastTimestamp: "2020-08-05T17:02:33Z"
  message: '0/21 nodes are available: 21 Insufficient cpu.'
  metadata:
    creationTimestamp: "2020-08-05T17:01:09Z"
    name: wonderful-tiger.16286dde6d37eca9
    namespace: argo
    resourceVersion: "135781"
    selfLink: /api/v1/namespaces/argo/events/wonderful-tiger.16286dde6d37eca9
    uid: d256c9fe-40fd-4b97-a7e6-62ec0470eb4f
  reason: FailedScheduling
  reportingComponent: ""
  reportingInstance: ""
  source:
    component: default-scheduler
  type: Warning
- apiVersion: v1
  count: 1
  eventTime: null
  firstTimestamp: "2020-08-05T17:01:12Z"
  involvedObject:
    apiVersion: v1
    kind: Pod
    name: wonderful-tiger
    namespace: argo
    resourceVersion: "14645700"
    uid: a6c471b1-a31e-46fa-b53e-27554a81d328
  kind: Event
  lastTimestamp: "2020-08-05T17:01:12Z"
  message: 'pod didn''t trigger scale-up (it wouldn''t fit if a new node is added):
    17 Insufficient cpu'
  metadata:
    creationTimestamp: "2020-08-05T17:01:12Z"
    name: wonderful-tiger.16286ddf00db261b
    namespace: argo
    resourceVersion: "135780"
    selfLink: /api/v1/namespaces/argo/events/wonderful-tiger.16286ddf00db261b
    uid: ac79183e-aaff-4f8d-a604-10a8c4b254f6
  reason: NotTriggerScaleUp
  reportingComponent: ""
  reportingInstance: ""
  source:
    component: cluster-autoscaler
  type: Normal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

So maybe it's possible that the …
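Since the scheduling message does surface on the pod's `PodScheduled` condition above, here is a minimal client-go sketch of reading it straight off the pod. This is only an illustration: it assumes a standard kubeconfig, and the pod and namespace names are just the ones from this thread.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Pod and namespace from the example above.
	pod, err := client.CoreV1().Pods("argo").Get(context.TODO(), "wonderful-tiger", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// The scheduler records why the pod is unschedulable on the PodScheduled
	// condition, e.g. "0/21 nodes are available: 21 Insufficient cpu."
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodScheduled && c.Status == corev1.ConditionFalse {
			fmt.Printf("%s: %s\n", c.Reason, c.Message)
		}
	}
}
```

Note that per the dump above, only the scheduler's message reaches pod status; the autoscaler's `NotTriggerScaleUp` message exists only as an event, so a controller that watches pods alone would never see it.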
Would it be enough to list events related to the pod in the UI?
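For reference, a hedged sketch of the programmatic equivalent of the `kubectl get event --field-selector` query shown above, using client-go (the package and function name are illustrative, not Argo's actual code):

```go
package events

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
)

// listPodEvents is the programmatic equivalent of
// `kubectl get event -n <namespace> --field-selector involvedObject.name=<name>`.
func listPodEvents(ctx context.Context, client kubernetes.Interface, namespace, name string) ([]corev1.Event, error) {
	list, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("involvedObject.name", name).String(),
	})
	if err != nil {
		return nil, err
	}
	return list.Items, nil
}
```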
I think it would do the job, yeah. And actually, being able to see the history of events will probably be useful for other things beyond this specific issue. Similar to what GKE is doing. I guess if this is available in the UI it will also be available in the YAML?
As an MVP, I don't think we would make it available in the YAML. Instead, we would just make it available in the UI only. That would exclude the YAML and the CLI. Do you think this should be MVP?
For us, the most important thing is to have this information available in the UI. Having it in the YAML is just a bonus and a cool enhancement IMO. I like the idea of having all the information related to a workflow in a single YAML object: its state, spec, and history of status. So my answer is yes, it should go in the MVP, but if this requires too much work then having the information in the UI only is fine.
… `cluster-autoscaler` event messages to workflow status message.

Available for testing in v2.11.0-rc1.
We run argo on GKE with `cluster-autoscaler`. Most of the time all our workflows trigger a `scaleUp` event, since powerful machines are usually down when not used.

Usually, the scale-up takes a few minutes. During this time the workflow is in a pending state with a message similar to this:

Once the node has been created, the workflow starts and all is good!

But sometimes the cluster is not able to scale up (often due to an error in the `resources` and `nodeSelector` configuration). Admins have access to cluster event logs and we can spot the issue quickly:

But most of our staff only use the Argo dashboard, and so can only rely on the workflow status message to understand what is going on. And the scale-up error reported by the `cluster-autoscaler` is not propagated to it:

Would it be possible to report cluster events related to a specific workflow, or to pods managed by a workflow, in the `status` field?

Not sure it's technically possible (because of ServiceAccount and permissions) but I am asking just in case.
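To make the request concrete, here is a hedged client-go sketch (function name and filtering are assumptions, not Argo's actual implementation) of watching a pod's events so their messages could be copied into the workflow `status`. On the permissions question: the controller's ServiceAccount would need RBAC `get`/`list`/`watch` on `events` for this to work.

```go
package events

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
)

// watchPodEvents streams the events for one pod and surfaces the messages a
// user would otherwise only see with `kubectl get event`, e.g.
// "0/21 nodes are available: 21 Insufficient cpu."
func watchPodEvents(ctx context.Context, client kubernetes.Interface, namespace, podName string) error {
	w, err := client.CoreV1().Events(namespace).Watch(ctx, metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("involvedObject.name", podName).String(),
	})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		event, ok := ev.Object.(*corev1.Event)
		if !ok {
			continue
		}
		// FailedScheduling is a Warning, but note that in the dump earlier in
		// this thread the autoscaler's NotTriggerScaleUp event is type Normal,
		// so a real implementation would likely match on reason as well.
		if event.Type == corev1.EventTypeWarning || event.Reason == "NotTriggerScaleUp" {
			fmt.Printf("%s: %s\n", event.Reason, event.Message)
		}
	}
	return nil
}
```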