[Design Proposal] Preventing kaniko logs leakage #2083

Closed
prary wants to merge 4 commits into master from kaniko_logs

Conversation

Contributor

@prary prary commented May 7, 2019

Kaniko logs are not being fetched when either too many logs are produced or kaniko exits before the logger is attached to it.

@prary prary changed the title Preventing kaniko logs leakage [Design Proposal] Preventing kaniko logs leakage May 7, 2019
@codecov-io

codecov-io commented May 7, 2019

Codecov Report

Merging #2083 into master will not change coverage.
The diff coverage is n/a.

Name: "side-car",
Image: constants.DefaultBusyboxImage,
ImagePullPolicy: v1.PullIfNotPresent,
Command: []string{"sh", "-c", "while [[ $(ps -ef | grep kaniko | wc -l) -gt 1 ]] ; do sleep 1; done; sleep " + cfg.PodGracePeriodSeconds},
Contributor

@tejal29 tejal29 May 20, 2019

I see you specify this container as a side-car container, which adds a sleep of cfg.PodGracePeriodSeconds (2 seconds) after kaniko exits.
Should we instead use a preStop hook to block container deletion?
Whether we should block for a number of seconds or on something else is a question to discuss here.

lifecycle:
  preStop:
    exec:
      # SIGTERM triggers a quick exit; gracefully terminate instead
      command: ["some blocking command"]

Contributor Author

Hi @tejal29,

IMHO preStop hooks are applicable only when the API server sends a kill-pod command to a running pod, not when the pod terminates on its own. Below is the link I commented on previously; correct me if I am missing something. We could consider other solutions as well; I lean more toward the option where kaniko itself stores and streams the logs.

#1978 (comment)

Contributor

Yes, I think we discussed kubectl describe when we internally triaged #1978; I just forgot about it. We can definitely rely on kubectl describe. Should we do that when logs are not present?

Contributor Author

Correct me if I am wrong: you are saying that when kaniko is not producing any logs, we should rely on kubectl describe. Is that what you mean?

Contributor

Yes. If we see an error or no lines when retrieving logs, we should run kubectl describe.

Contributor

By the way, thanks @prary for pointing out that preStop hooks are only applicable when the API server kills the pod.

Contributor Author

I think there is no need for kubectl describe; we are already monitoring the status of the pod in the WaitForPodComplete function in https://github.com/GoogleContainerTools/skaffold/blob/master/pkg/skaffold/kubernetes/wait.go. Or we may just add a log in the PodRunning switch case, i.e.

case v1.PodRunning:
	logrus.Infof("Pod is running")
	return false, nil
case v1.PodFailed:
	return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)

Contributor

Personally I like providing the output of kubectl describe upon failure because it could provide valuable information, and because the user won't be able to access it themselves since the pod will be cleaned up by skaffold.
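
As a rough illustration of that idea (describeFailedPod is a made-up name, and shelling out to kubectl is only one possible way to do it), skaffold could capture the describe output before it cleans the pod up:

// Hypothetical sketch: dump `kubectl describe pod` output for the failed kaniko
// pod, since skaffold deletes the pod afterwards and the user cannot inspect it.
// Assumes "io" and "os/exec" are imported.
func describeFailedPod(out io.Writer, namespace, podName string) error {
	cmd := exec.Command("kubectl", "describe", "pod", podName, "--namespace", namespace)
	cmd.Stdout = out
	cmd.Stderr = out
	return cmd.Run()
}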

Contributor Author

You mean

case v1.PodFailed:
	// kubectl describe over here
	return false, fmt.Errorf("pod already in terminal phase: %s", pod.Status.Phase)

Is that where you think kubectl describe should be?

Contributor Author

@prary prary Jun 14, 2019

@tejal29 @priyawadhwa @dgageot I think fetching events from Kubernetes would be a better approach than kubectl describe, i.e.

case v1.PodFailed:
	// kubectl get events --namespace namespace-name --field-selector involvedObject.name=kaniko-pod-name

WDYT? This would fetch all event logs even when the pod fails due to some Kubernetes constraint, such as the image-builder service account (which we use for the kaniko pod) not being present, or attaching the secret volume timing out for some random reason.
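
A sketch of the same idea through client-go rather than shelling out (the function name is made up, the List signature varies with the client-go version, and the clientset, namespace and pod name are assumed to be whatever the kaniko builder already holds):

// Hypothetical sketch: list the events attached to the kaniko pod so they can be
// surfaced when the pod fails. Assumes k8s.io/client-go/kubernetes and
// k8s.io/apimachinery/pkg/apis/meta/v1 (as metav1) are imported.
func kanikoPodEvents(ctx context.Context, client kubernetes.Interface, namespace, podName string) ([]string, error) {
	events, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{
		FieldSelector: "involvedObject.name=" + podName,
	})
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, e := range events.Items {
		lines = append(lines, fmt.Sprintf("%s\t%s\t%s", e.Type, e.Reason, e.Message))
	}
	return lines, nil
}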

# Skaffold logs improvements

* Author(s): Prashant Arya (@prary)
* Design Shepherd:
Contributor

Tejal Desai

Contributor

Feel free to add me here! I am interested in this discussion!

Contributor Author

sure

@cedrickring
Contributor

cedrickring commented Jun 3, 2019

too many logs are produced or kaniko exits before the logger is attached to it

Kubernetes is pretty good at handling big log streams. The problem lies in the current log stream implementation, as it waits until the pod is ready (polling at a 1 second interval). If the pod exits during that period, no logs are streamed.

Maybe we could reduce the interval time here:

time.Sleep(1 * time.Second)

or just track how many lines were streamed. If no lines have been streamed before exiting, one more log call should be made there.

e.g.

var read int64
...
n, _ := io.Copy(out, r)
read += n

and here

we can check if read == 0 and request the logs.
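
Roughly, a minimal sketch of that fallback (streamWithRetry and the openLogs callback are made-up names; openLogs stands in for however we currently open the pod log stream):

// Hypothetical sketch: count the bytes copied from the followed log stream and,
// if nothing arrived (e.g. the pod exited before we attached), request the
// complete logs once more without following. Assumes "io" is imported.
func streamWithRetry(out io.Writer, openLogs func(follow bool) (io.ReadCloser, error)) error {
	r, err := openLogs(true) // follow the logs while the pod is running
	if err != nil {
		return err
	}
	n, _ := io.Copy(out, r)
	r.Close()

	if n == 0 {
		// read == 0: nothing was streamed, so fetch the complete logs once more.
		retry, err := openLogs(false)
		if err != nil {
			return err
		}
		defer retry.Close()
		_, err = io.Copy(out, retry)
		return err
	}
	return nil
}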

WDYT

@prary
Contributor Author

prary commented Jun 10, 2019

Maybe we could reduce the interval time here:
I personally think it won't be a good idea.

var read int64
...
n, _ := io.Copy(out, r)
read += n

Will this work even when the logger is not attached?

@cedrickring
Contributor

I don't see why the logger shouldn't attach to the process. Even if the executable only runs for a millisecond, the logger will still catch its output.

When I implemented this fix locally, the logs were pulled properly from the pod.

@prary maybe I'm getting this question wrong, but are you missing logs when Kaniko exits immediately?

@tejal29 tejal29 added the kokoro:run runs the kokoro jobs on a PR label Jun 21, 2019
@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label Jun 21, 2019
@tejal29
Contributor

tejal29 commented Jun 21, 2019

This is blocked by #2311.

@dgageot dgageot added the kokoro:run runs the kokoro jobs on a PR label Jul 2, 2019
@priyawadhwa priyawadhwa added kokoro:run runs the kokoro jobs on a PR and removed kokoro:run runs the kokoro jobs on a PR labels Jul 9, 2019
@tejal29
Contributor

tejal29 commented Jul 9, 2019

@kokoro-team kokoro-team removed the kokoro:run runs the kokoro jobs on a PR label Jul 9, 2019
@tejal29
Contributor

tejal29 commented Jul 10, 2019

Discarding this since #2352 is merged

@tejal29 tejal29 closed this Jul 10, 2019
@prary prary deleted the kaniko_logs branch July 26, 2019 14:29