Long builds fail with "UNAUTHORIZED: \"authentication required\"" #245
Hey @Delphinator, I tried to repro this issue but was able to build the image as expected:
$ cat Dockerfile
FROM debian:stable-slim
RUN sleep 360
$ ./run_in_docker.sh Dockerfile $(pwd) gcr.io/priya-wadhwa/test:test
time="2018-07-25T21:50:49Z" level=info msg="Unpacking filesystem of debian:stable-slim..."
2018/07/25 21:50:49 No matching credentials found for index.docker.io, falling back on anonymous
time="2018-07-25T21:50:49Z" level=info msg="Mounted directories: [/kaniko /var/run /proc /dev /dev/pts /sys /sys/fs/cgroup /sys/fs/cgroup/systemd /sys/fs/cgroup/net_cls,net_prio /sys/fs/cgroup/cpu,cpuacct /sys/fs/cgroup/memory /sys/fs/cgroup/perf_event /sys/fs/cgroup/cpuset /sys/fs/cgroup/blkio /sys/fs/cgroup/freezer /sys/fs/cgroup/pids /sys/fs/cgroup/devices /dev/mqueue /workspace /etc/resolv.conf /etc/hostname /etc/hosts /dev/shm /root/.config/gcloud /proc/asound /proc/bus /proc/fs /proc/irq /proc/sys /proc/sysrq-trigger /proc/kcore /proc/keys /proc/timer_list /proc/sched_debug /sys/firmware]"
time="2018-07-25T21:50:50Z" level=info msg="Unpacking layer: 0"
time="2018-07-25T21:50:50Z" level=info msg="Not adding /dev because it is whitelisted"
time="2018-07-25T21:50:50Z" level=info msg="Not adding /etc/hostname because it is whitelisted"
time="2018-07-25T21:50:50Z" level=info msg="Not adding /etc/resolv.conf because it is whitelisted"
time="2018-07-25T21:50:50Z" level=info msg="Not adding /proc because it is whitelisted"
time="2018-07-25T21:50:50Z" level=info msg="Not adding /sys because it is whitelisted"
time="2018-07-25T21:50:51Z" level=info msg="Not adding /var/run because it is whitelisted"
time="2018-07-25T21:50:51Z" level=info msg="Taking snapshot of full filesystem..."
time="2018-07-25T21:50:53Z" level=info msg="cmd: /bin/sh"
time="2018-07-25T21:50:53Z" level=info msg="args: [-c sleep 360]"
time="2018-07-25T21:56:53Z" level=info msg="Taking snapshot of full filesystem..."
time="2018-07-25T21:56:54Z" level=info msg="No files were changed, appending empty layer to config."
2018/07/25 21:56:55 mounted blob: sha256:7d1463d31d7e5ad679ea175cd72afede858628ca49598925a17410e452d5ccec
2018/07/25 21:56:55 mounted blob: sha256:b5186294b6f665381d964ff1e51910d9c03599009ca8a8a54a66607f44daf490
gcr.io/priya-wadhwa/test:test: digest: sha256:d1244e42b3de8475275900102535ca00efdc05c9bbb4882c78ff816215028ea6 size: 429
We fetch the image config of the base image before starting to run commands here, and it's downloaded along with the base image, so an expired token shouldn't be the issue. When pushing the image after running commands, we also get a new auth token. I'm not sure why you're hitting this error since I couldn't repro the issue, but the only thing I can think of right now is that it's an auth issue with the registry you're trying to push to? |
That's odd. I just tried it again and I'm consistently getting errors when sleeping 6 minutes. I do remember 6 minutes being somewhat close to the threshold though. Maybe 10 minutes will do the job in your environment? I'm not convinced it's an auth issue with the private registry.
Just to be sure, this is the exact image I used and the docker version:
And logs of my most recent attempts: trying to push to a registry with valid credentials in
|
I tried again with 10 minutes, and it still worked fine:
If you could submit a PR with a Dockerfile that breaks our CI we might be able to explore this issue a bit more. You'd just have to add another Dockerfile in the dockerfiles directory |
Ping @Delphinator any luck getting a repro? |
I'm experiencing the same error message after 6 minutes and 20 seconds of an NPM container build.
NPM Dockerfile:
I can however build and push the following sleepy container without issue.
Destination registry is Docker Trusted Registry 2.5.3 |
in GoogleContainerTools#245 (comment) @MnrGreg describes a multi-stage build failing in the same way. Maybe this can be reproduced in CI?
I just pushed another attempt to reproduce in #267 with a two stage build. Hopefully it breaks.
I'm not 100% sure if 6 minutes are enough on a fast system. I did my experiments on a somewhat busy system with spinning disks. That does slow container startup and unpacking / packing down significantly. Maybe try 10 minutes just to be extra sure? |
Another thought: I first encountered the issue while building images using the CI component of my private gitlab instance. I just created a new user there, added the SSH keys of @priyawadhwa and @dlorenc (github makes SSH keys public for some reason) to that user and pushed a repo, which reproduces the issue in gitlab CI. The repo and build logs are publicly accessible at https://gitlab.jensgutermuth.de/kaniko-issue-245/reproduce. Feel free to experiment (probably best on a branch, so you don't step on each other's work). The CI jobs in that repo first build kaniko using itself to facilitate adding debugging code or testing changes. Just change the |
Well, I'm completely confused now. Two stage builds seem to work for me (ignore the |
I did a bit more troubleshooting and noticed that when adding the 'COPY --from' the build succeeds 1/5 times with a 185 sleep, while without the 'COPY --from' it succeeds every time 5/5. Decreasing the sleep time to below 60 seconds with the 'COPY --from' succeeds 5/5 times.
@Delphinator could we include the 'COPY --from prior build' in your tests? It seems it may play a part. |
Just got an UNAUTHORIZED with a 60 sleep and at a different unpacking stage. It seems to be quite inconsistent. It also seems that this happens during a pull from the registry and not during the final image push.
second run with timer at 10 seconds:
|
Sure! Let's see what happens: https://gitlab.jensgutermuth.de/kaniko-issue-245/reproduce/pipelines/2056 |
Jop. A build w/o |
Lucky us! You hit the error on the first go. It seems to be somewhat sporadic for me. |
Hi all, I got the same issues after setting up some python software module (compiled with internet sources) on a base image. Looking forward to a fix 👍 If I can do anything to help, tell me and I will do my very best to help you! |
Could the duration of the push itself be relevant? The difference in upload speed could explain the lack of reproducibility. |
I have just run into this problem when running as part of a Jenkins build on Kubernetes. When I run locally (on my MacBook Pro) I'm able to build the Docker image just fine, but when I run as part of a Jenkins build on Kubernetes, the Kaniko container consumes a huge amount of memory (50+ GB) and after 14 minutes, it fails with:
At first, I thought it was a problem with trying to push the image to our local repository, so I added the
and here is my Dockerfile:
|
We are also experiencing this issue and it's a major roadblock. We do not know how to reproduce it consistently but it happens often. We are using kaniko in a gitlab ci pipeline.
We are using the debug image: |
@brandon-bethke-neudesic I worked around the problem by switching to https://github.com/containers/buildah |
Update: #388 was just merged, but I'll keep this issue open for a few days in case anyone continues to see this error. Thanks @ianberinger for the fix!! |
Is the fix included in the latest |
yup, it should be in |
#388 fixed my long builds. Edit: It looks like I spoke too soon...
Update: Extending the authorization token duration in GitLab's container registry settings, per @pieterlange's suggestion, fixed it. Thanks, everyone! |
I can confirm this fixed it for me! @yurrriq make sure the token doesn't expire serverside either - gitlab-ci's expiry also happens to be 5 minutes, but you can easily up that in the admin console. |
Works for me, thanks everyone! |
I can confirm the same. Good job guys! |
Hi, Unfortunately we still have this problem. We use Artifactory as a Docker Registry. This problem occurs only on long running builds.
|
@AndreasBieber, perhaps you need to replace |
@yurrriq |
Forgot to mention: This problem only occurs with multistage builds. |
Sorry to hear this is still happening @AndreasBieber :( Is there any chance you could provide us with instructions we could use to reproduce the issue you are still seeing, e.g. maybe similar to the example in this issue's description? |
@bobcatfish: Sorry, it was my bad. We are building our own image of kaniko for the GitLab CI Runner with custom scripts.
Since we also build this custom image with kaniko, the docker auth config was accidentally overwritten during the build process. Now it works like a charm. |
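A quick check that the auth config still holds credentials for the target registry can help rule out this kind of overwritten-config failure. The following is only a minimal sketch, assuming the standard Docker config.json layout (an "auths" map keyed by registry host, each entry carrying a base64-encoded "user:password" string). The function name `registry_credentials`, the host names, and the credentials are hypothetical, and the sketch ignores credential helpers ("credsStore" / "credHelpers").

```python
import base64
import json


def registry_credentials(config_text, registry):
    """Return (username, password) for `registry`, or None if absent.

    Assumes the standard Docker config.json layout:
    {"auths": {"<registry host>": {"auth": base64("user:password")}}}
    """
    config = json.loads(config_text)
    entry = config.get("auths", {}).get(registry)
    if not entry or "auth" not in entry:
        return None
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    # Split on the first colon only: passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


# Hypothetical example values, not taken from the thread.
sample = json.dumps({
    "auths": {
        "registry.example.com": {
            "auth": base64.b64encode(b"ci-user:s3cret").decode("ascii")
        }
    }
})

print(registry_credentials(sample, "registry.example.com"))
print(registry_credentials(sample, "other.example.com"))
```

In a CI job, the same function could be pointed at the contents of the config file read at the end of the image build, so that a missing entry fails early rather than surfacing as an authentication error at push time.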
Awesome, I'm going to go ahead and close this issue since it seems like #388 fixed it. If anyone experiences this again please comment on this thread or open another issue! |
I seem to observe a regression between |
@akhmerov Replace |
I hate to re-open old bugs, but I recently encountered this with a private registry. Quick builds (< 5 minutes) upload just fine, but longer builds (> 5 minutes) hit a 401 on upload. Unfortunately I don't run the registry, so I cannot confirm the bearer token lifetime. Reading the comments above, I'm confused whether this bug was actually fixed or people just extended the server-side token lifetimes. |
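The token lifetime can sometimes be estimated from the client side, because registries that follow the Docker token authentication scheme return optional "expires_in" and "issued_at" fields alongside the token. The sketch below assumes such a response has already been captured as JSON text; the function name `token_expiry` and all sample values (including the 300-second lifetime) are hypothetical and only mirror the 5-minute figure discussed in this thread.

```python
import json
from datetime import datetime, timedelta, timezone


def token_expiry(response_text, received_at):
    """Return the datetime at which a registry bearer token stops being valid."""
    body = json.loads(response_text)
    # The token spec documents a 60-second default when expires_in is omitted.
    lifetime = timedelta(seconds=body.get("expires_in", 60))
    issued_raw = body.get("issued_at")
    if issued_raw:
        # Older Python versions reject a trailing "Z" in fromisoformat(), and
        # timestamps finer than microseconds may need truncating first.
        issued = datetime.fromisoformat(issued_raw.replace("Z", "+00:00"))
    else:
        # Without issued_at, the lifetime counts from when the response arrived.
        issued = received_at
    return issued + lifetime


# Hypothetical token response; values are illustrative only.
sample = json.dumps({
    "token": "opaque-example-token",
    "expires_in": 300,
    "issued_at": "2018-07-25T21:50:49Z",
})
received = datetime(2018, 7, 25, 21, 50, 50, tzinfo=timezone.utc)
expiry = token_expiry(sample, received)
build_end = received + timedelta(minutes=6)

print(expiry.isoformat())
print("token expired before push:", build_end > expiry)
```

If the computed expiry falls before the point where the build starts pushing, a client that reuses the original token would be expected to see a 401, whatever the cause on the registry side.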
I'm fairly certain this was actually fixed. The original issue was reproducible with the central registry at registry.docker.io and I don't think they changed their token lifetimes. #245 (comment) explains the fix and links to the relevant PR. |
@Caligatio I kept getting this error in multi stage docker builds when running in GitLab CI.
|
I'm currently hitting this with all my builds that take longer than 5-ish minutes. I am confused by yurrriq's response, as it seems like the fix didn't work for them so they just extended the token lifetime. @culdev: I can confirm I have the /kaniko/.docker/config.json file in place and it works for quick builds. EDIT: If it somehow matters, I'm also sitting behind a proxy. EDIT2: Turns out my issue was caused by a misbehaving registry auth service. The symptoms looked similar, but it was a completely different problem. Sorry all! |
my yaml:
Yes, it is /kaniko/.docker, not /root/.docker, but I still get the error!
|
steps to reproduce
additional observations
I also ran tcpdump on the network interface of the container. I saw quite a bit of traffic at the start (I assume pulling the image) and a single, short TLS connection to index.docker.io after sleep was done.
The issue seems to be gone (or at least takes substantially longer to arise) if I substitute debian:stable-slim with any image from my harbor (private docker registry) instance.
working theory
My working theory of the underlying cause, based on those two observations, is that kaniko tries to fetch the image config of the base image using an expired bearer token. This config would normally be extended and included in the tarball or pushed to the registry.
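The theory can be illustrated with a toy simulation. This is not kaniko's code and not the fix from #388; it is a sketch under stated assumptions: a fake registry whose tokens live for a fixed number of seconds (300 here, an assumed value echoing the 5-minute expiry mentioned in the thread), a simulated clock, and a hypothetical `build` function that either reuses the token obtained at the start or requests a fresh one after the long-running step.

```python
class FakeRegistry:
    """Toy registry: tokens are valid for `lifetime` seconds after issue."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.now = 0  # simulated clock, in seconds

    def issue_token(self):
        return {"expires_at": self.now + self.lifetime}

    def fetch_config(self, token):
        if self.now >= token["expires_at"]:
            return 401  # corresponds to UNAUTHORIZED: "authentication required"
        return 200


def build(registry, sleep_seconds, refresh_before_fetch):
    token = registry.issue_token()      # obtained while pulling the base image
    registry.now += sleep_seconds       # stands in for `RUN sleep ...`
    if refresh_before_fetch:
        token = registry.issue_token()  # request a fresh token after the long step
    return registry.fetch_config(token)


print(build(FakeRegistry(300), 360, refresh_before_fetch=False))
print(build(FakeRegistry(300), 360, refresh_before_fetch=True))
print(build(FakeRegistry(300), 60, refresh_before_fetch=False))
```

Under these assumptions, only the first call (a 360-second step with a reused token) ends in a 401, which matches the pattern of short builds succeeding and long builds failing.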