[Design Proposal] Preventing kaniko logs leakage #2083
Conversation
Name: "side-car", | ||
Image: constants.DefaultBusyboxImage, | ||
ImagePullPolicy: v1.PullIfNotPresent, | ||
Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep " + cfg.PodGracePeriodSeconds}, |
I see you specify this container as a side-car container which adds a sleep of cfg.PodGracePeriodSeconds, which is 2 seconds.
Should we instead use the preStop lifecycle hook to block container deletion?
Whether we block it for a number of seconds, or something else, is a question to discuss here.
lifecycle:
  preStop:
    exec:
      # SIGTERM triggers a quick exit; gracefully terminate instead
      command: ["some blocking command"]
hi @tejal29,
IMHO preStop hooks are applicable only when the API server sends a kill command to a running pod, not when the pod exits on its own. Below is the link I previously commented with; correct me if I am missing something. We can have other solutions as well; I am more for the option where kaniko itself is the one that stores and streams the logs.
yes, I think we discussed kubectl describe when we internally triaged #1978. I just forgot about it. We can definitely rely on kubectl describe. Should we do that when logs are not present?
Correct me if I am wrong: you are saying that when kaniko is not producing any logs, we should rely on kubectl describe. Is that what you mean?
Yes. If we see an error or no lines when retrieving logs, we should run kubectl describe.
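A rough sketch of that fallback, with a hypothetical streamLogs helper standing in for the existing log-streaming code (all names here are illustrative, not Skaffold's actual API; imports assumed: os/exec and github.com/sirupsen/logrus):

// n is the number of bytes the (hypothetical) log streamer wrote to out.
n, err := streamLogs(ctx, out, podName)
if err != nil || n == 0 {
    // no usable logs: fall back to kubectl describe for diagnostics
    describe := exec.CommandContext(ctx, "kubectl", "describe", "pod", podName, "--namespace", namespace)
    describe.Stdout = out
    describe.Stderr = out
    if dErr := describe.Run(); dErr != nil {
        logrus.Warnf("kubectl describe failed: %v", dErr)
    }
}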
btw, thanks @prary for pointing out that preStop hooks are only applicable when the API server kills the pod.
I think there is no need for kubectl describe; we are already monitoring the status of the pod in the WaitForPodComplete function in https://github.com/GoogleContainerTools/skaffold/blob/master/pkg/skaffold/kubernetes/wait.go. Or we may just add a log in the PodRunning switch case, i.e.
case v1.PodRunning:
    logrus.Infof("Pod is running")
    return false, nil
case v1.PodFailed:
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)
Personally I like providing the output of kubectl describe upon failure because it could provide valuable information, and because the user won't be able to access it themselves since the pod will be cleaned up by skaffold.
You mean
case v1.PodFailed:
    **kubectl describe over here**
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)
Is that where you think kubectl describe should be?
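If so, a rough sketch of that placement (shelling out to kubectl here is purely illustrative; the same information could also be pulled via the client-go API):

case v1.PodFailed:
    // sketch: surface diagnostics before returning the terminal-phase error
    if out, dErr := exec.Command("kubectl", "describe", "pod", pod.Name, "--namespace", pod.Namespace).CombinedOutput(); dErr == nil {
        logrus.Errorf("pod %s failed:\n%s", pod.Name, out)
    }
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)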
@tejal29 @priyawadhwa @dgageot I think fetching events from kubernetes would be a better approach than kubectl describe, i.e.
case v1.PodFailed:
kubectl get events --namespace namespace-name --field-selector involvedObject.name=kaniko-pod-name
WDYT? This would fetch all event logs even when the pod failed due to some kubernetes constraint, like the image-builder service account (which we use for the kaniko pod) not being present, or attaching a secret volume timing out for some random reason.
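For comparison, a rough sketch of fetching those events through client-go instead of shelling out (clientset, namespace, and podName are assumed to be in scope; newer client-go versions also take a context as the first argument to List):

// import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
events, err := clientset.CoreV1().Events(namespace).List(metav1.ListOptions{
    FieldSelector: "involvedObject.name=" + podName,
})
if err != nil {
    return false, fmt.Errorf("listing events for pod %s: %v", podName, err)
}
for _, e := range events.Items {
    logrus.Infof("[%s] %s: %s", e.LastTimestamp, e.Reason, e.Message)
}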
# Skaffold logs improvements

* Author(s): Prashant Arya (@prary)
* Design Shepherd:
Tejal Desai
Feel free to add me here! I am interested in this discussion!
sure
Kubernetes is pretty good at handling big log streams. The problem lies in the current log stream implementation, as it waits until the pod is ready (with a 1 sec interval). If the pod exits during that period, no logs are streamed. Maybe we could reduce the interval time here, or just look at the lines streamed: if no lines have been streamed before exiting, one more log call should be done there. E.g.

var read int64
...
n, _ = io.Copy(out, r)
read += n

and here we can check whether anything was read. WDYT?
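A slightly fuller sketch of that idea, assuming req is the client-go rest.Request for the pod's logs (newer client-go versions take a context in Stream) and out is the destination writer:

var read int64
r, err := req.Stream() // open the log stream
if err != nil {
    return err
}
defer r.Close()
n, _ := io.Copy(out, r)
read += n
if read == 0 {
    // nothing was streamed before the pod exited: make one more log call
    if r2, err2 := req.Stream(); err2 == nil {
        defer r2.Close()
        io.Copy(out, r2)
    }
}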
Will this work even when the logger is not attached?

I don't know why the logger shouldn't attach to the process. Even if the executable only runs for a ms, the logger will still catch its output. When I implemented the fix, the logs were pulled properly from the pod for me. @prary maybe I'm getting this question wrong, but are you missing logs when Kaniko exits immediately?

is blocked by #2311

running kokoro build here: https://sponge.corp.google.com/invocation?id=46060480-67a6-43f5-848f-237e248c6243&searchFor=

Discarding this since #2352 is merged
Kaniko logs are not being fetched when either too many logs are produced or kaniko exits before the logger is attached to it.