[Design Proposal] Preventing kaniko logs leakage #2083
Conversation
Name: "side-car", | ||
Image: constants.DefaultBusyboxImage, | ||
ImagePullPolicy: v1.PullIfNotPresent, | ||
Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep " + cfg.PodGracePeriodSeconds}, |
I see you specify this container as a side-car container which adds a sleep of cfg.PodGracePeriodSeconds, which is 2 seconds.
Should we instead use the preStop lifecycle hook to block container deletion?
Whether we block it for a number of seconds, or something else, is a question to discuss here.
lifecycle:
  preStop:
    exec:
      # SIGTERM triggers a quick exit; gracefully terminate instead
      command: ["some blocking command"]
hi @tejal29,
IMHO preStop hooks are applicable only when the API server sends a kill command to a running pod, not when the pod exits on its own. Below is the link I previously commented with; correct me if I am missing something. We can have other solutions as well; I am more for the option where kaniko itself is the one that stores and streams the logs.
yes, I think we discussed kubectl describe when we internally triaged #1978. I just forgot about it. We can definitely rely on kubectl describe. Should we do that when logs are not present?
Correct me if I am wrong: you are saying that when kaniko is not producing any logs, we should rely on kubectl describe. Is that what you mean?
Yes. If we see an error or no lines when retrieving logs, we should run kubectl describe.
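A rough sketch of that fallback, with a hypothetical streamLogs helper standing in for the existing log-streaming code (all names here are illustrative, not Skaffold's actual API; imports assumed: os/exec and github.com/sirupsen/logrus):

// n is the number of bytes the (hypothetical) log streamer wrote to out.
n, err := streamLogs(ctx, out, podName)
if err != nil || n == 0 {
    // no usable logs: fall back to kubectl describe for diagnostics
    describe := exec.CommandContext(ctx, "kubectl", "describe", "pod", podName, "--namespace", namespace)
    describe.Stdout = out
    describe.Stderr = out
    if dErr := describe.Run(); dErr != nil {
        logrus.Warnf("kubectl describe failed: %v", dErr)
    }
}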
btw, thanks @prary for pointing out that preStop hooks are only applicable when the API server kills the pod.
I think there is no need for kubectl describe; we are already monitoring the status of the pod in the WaitForPodComplete function in https://github.com/GoogleContainerTools/skaffold/blob/master/pkg/skaffold/kubernetes/wait.go. Or we may just add a log in the PodRunning switch case, i.e.
case v1.PodRunning:
    logrus.Infof("Pod is running")
    return false, nil
case v1.PodFailed:
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)
Personally I like providing the output of kubectl describe upon failure because it could provide valuable information, and because the user won't be able to access it themselves since the pod will be cleaned up by skaffold.
You mean
case v1.PodFailed:
    **kubectl describe over here**
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)
Is that where you think kubectl describe should be?
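If so, a rough sketch of that placement (shelling out to kubectl here is purely illustrative; the same information could also be pulled via the client-go API):

case v1.PodFailed:
    // sketch: surface diagnostics before returning the terminal-phase error
    if out, dErr := exec.Command("kubectl", "describe", "pod", pod.Name, "--namespace", pod.Namespace).CombinedOutput(); dErr == nil {
        logrus.Errorf("pod %s failed:\n%s", pod.Name, out)
    }
    return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)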
@tejal29 @priyawadhwa @dgageot I think fetching events from kubernetes would be a better approach than kubectl describe, i.e.
case v1.PodFailed:
kubectl get events --namespace namespace-name --field-selector involvedObject.name=kaniko-pod-name
WDYT? This would fetch all event logs even when the pod failed due to some kubernetes constraint, like the image-builder service account (which we use for the kaniko pod) not being present, or attaching a secret volume timing out for some random reason.
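For comparison, a rough sketch of fetching those events through client-go instead of shelling out (clientset, namespace, and podName are assumed to be in scope; newer client-go versions also take a context as the first argument to List):

// import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
events, err := clientset.CoreV1().Events(namespace).List(metav1.ListOptions{
    FieldSelector: "involvedObject.name=" + podName,
})
if err != nil {
    return false, fmt.Errorf("listing events for pod %s: %v", podName, err)
}
for _, e := range events.Items {
    logrus.Infof("[%s] %s: %s", e.LastTimestamp, e.Reason, e.Message)
}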
# Skaffold logs improvements

* Author(s): Prashant Arya (@prary)
* Design Shepherd:
Tejal Desai
Feel free to add me here! I am interested in this discussion!
sure
Kubernetes is pretty good at handling big log streams. The problem lies in the current log stream implementation, as it waits until the pod is ready (with a 1 sec interval). If the pod exits during that period, no logs are streamed. Maybe we could reduce the interval time here, or just look at the lines streamed: if no lines have been streamed before exiting, one more log call should be done there. E.g.

var read int64
...
n, _ = io.Copy(out, r)
read += n

and here we can check whether anything was read. WDYT?
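A slightly fuller sketch of that idea, assuming req is the client-go rest.Request for the pod's logs (newer client-go versions take a context in Stream) and out is the destination writer:

var read int64
r, err := req.Stream() // open the log stream
if err != nil {
    return err
}
defer r.Close()
n, _ := io.Copy(out, r)
read += n
if read == 0 {
    // nothing was streamed before the pod exited: make one more log call
    if r2, err2 := req.Stream(); err2 == nil {
        defer r2.Close()
        io.Copy(out, r2)
    }
}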
Will this work even when the logger is not attached?

I don't know why the logger shouldn't attach to the process. Even if the executable only runs for a ms, the logger will still catch its output. When I implemented the fix, the logs were pulled properly from the pod for me. @prary maybe I'm getting this question wrong, but are you missing logs when Kaniko exits immediately?

is blocked by #2311

running kokoro build here: https://sponge.corp.google.com/invocation?id=46060480-67a6-43f5-848f-237e248c6243&searchFor=

Discarding this since #2352 is merged
Kaniko logs are not being fetched when either too many logs are produced or kaniko exits before the logger is attached to it.