This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Fail to pull runtime image due to exceeding docker pull rate limit #5202

Closed
JosephKang opened this issue Dec 24, 2020 · 6 comments

@JosephKang

Organization Name:
Advantech

Short summary about the issue/question:
Fail to pull runtime image due to exceeding docker pull rate limit

Brief what process you are following:

  1. Submit a job.
  2. The job stays in the waiting state for more than 10 minutes.
  3. Check the k8s log:
root@devbox-845320-iaas:/pai/contrib/kubespray# kubectl logs a22aadb25a9613c3a32539e1329331ae-main-0
Error from server (BadRequest): container "app" in pod "a22aadb25a9613c3a32539e1329331ae-main-0" is waiting to start: PodInitializing

root@devbox-845320-iaas:/pai/contrib/kubespray# kubectl describe pod a22aadb25a9613c3a32539e1329331ae-main-0
...
Events:
  Type     Reason     Age                 From                               Message
  ----     ------     ----                ----                               -------
  Normal   Scheduled  20m                 hivedscheduler-ds-prodcpu          Successfully assigned default/a22aadb25a9613c3a32539e1329331ae-main-0 to workercpu03-1106620-iaas
  Normal   Pulling    19m (x4 over 20m)   kubelet, workercpu03-1106620-iaas  Pulling image "openpai/openpai-runtime:v1.0.1"
  Warning  Failed     19m (x4 over 20m)   kubelet, workercpu03-1106620-iaas  Failed to pull image "openpai/openpai-runtime:v1.0.1": rpc error: code = Unknown desc = Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
  Warning  Failed     19m (x4 over 20m)   kubelet, workercpu03-1106620-iaas  Error: ErrImagePull
  Warning  Failed     18m (x7 over 20m)   kubelet, workercpu03-1106620-iaas  Error: ImagePullBackOff
  Normal   BackOff    27s (x86 over 20m)  kubelet, workercpu03-1106620-iaas  Back-off pulling image "openpai/openpai-runtime:v1.0.1"

How to reproduce it:
Repeat steps 1 to 3; the failure occurs intermittently.

OpenPAI Environment:

  • OpenPAI version:
    v1.0.1
  • Cloud provider or hardware configuration:
    OpenStack VM
  • OS (e.g. from /etc/os-release):
    16.04.6 LTS

Anything else we need to know:
https://www.docker.com/increase-rate-limit

It might be related to Docker's new image pull rate-limit policy.
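
The remaining pull quota for a node can be checked against Docker Hub directly. The sketch below follows the header-based check that Docker documents for its rate limit: request an anonymous token, send a HEAD request for the `ratelimitpreview/test` manifest, and read the `ratelimit-limit` and `ratelimit-remaining` headers. The function names are hypothetical and not part of OpenPAI, and the result reflects the public IP of whichever machine runs it.

```python
import json
import urllib.request

# Endpoints from Docker's rate-limit documentation (anonymous check).
TOKEN_URL = (
    "https://auth.docker.io/token"
    "?service=registry.docker.io"
    "&scope=repository:ratelimitpreview/test:pull"
)
MANIFEST_URL = (
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"
)


def parse_rate_limit(value):
    """Parse a header value such as '100;w=21600' into (count, window_seconds).

    window_seconds is None when the header carries no 'w=' part.
    """
    count, _, rest = value.partition(";")
    window = None
    for part in rest.split(";"):
        key, _, val = part.strip().partition("=")
        if key == "w" and val:
            window = int(val)
    return int(count), window


def fetch_rate_limit():
    """Return the parsed ratelimit-* headers for an anonymous pull (needs network)."""
    with urllib.request.urlopen(TOKEN_URL) as resp:
        token = json.load(resp)["token"]
    request = urllib.request.Request(
        MANIFEST_URL,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as resp:
        headers = resp.headers
        # Headers may be absent, so only parse the ones that are present.
        return {
            name: parse_rate_limit(headers[name])
            for name in ("ratelimit-limit", "ratelimit-remaining")
            if headers.get(name)
        }
```

Calling `fetch_rate_limit()` on a worker node returns a dictionary keyed by header name; a `ratelimit-remaining` count of 0 would be consistent with the `toomanyrequests` error in the events above.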

@JosephKang
Author

JosephKang commented Dec 24, 2020

  1. How can I skip contacting Docker Hub if I have already loaded the openpai/openpai-runtime image on every worker node?
  2. What is the function of openpai/openpai-runtime?

@JosephKang
Author

I found a place where the Docker registry URL for openpai-runtime can be redirected: https://github.com/microsoft/pai/blob/master/src/rest-server/deploy/rest-server.yaml.template#L62

However, it requires passwordless docker login permission. How do I set up docker login permission for this rest-server variable?
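
This thread does not say how OpenPAI wires registry credentials into the rest-server, so the following is only a sketch of the generic Kubernetes mechanism: a Secret of type `kubernetes.io/dockerconfigjson` that a pod can reference through `imagePullSecrets`. The helper name, secret name, and registry address are hypothetical.

```python
import base64
import json


def docker_config_secret(name, registry, username, password, namespace="default"):
    """Build a Kubernetes Secret manifest (as a dict) holding registry credentials.

    The payload mirrors the layout of ~/.docker/config.json after `docker login`.
    """
    # Docker stores "user:password" base64-encoded under the "auth" key.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    # Values under Secret.data must themselves be base64-encoded.
    encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }
```

Dumping the returned dict as JSON yields a manifest that `kubectl apply -f` accepts; a pod then lists the secret name under `imagePullSecrets`.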

@JosephKang
Author

Related to #5219

@JosephKang
Author

Hello all,

I might need to deploy OpenPAI in an environment without internet access during operation. Even though I have already loaded the openpai-runtime image on the worker nodes, https://github.com/microsoft/pai/blob/master/src/rest-server/deploy/rest-server.yaml.template has no imagePullPolicy setting (IfNotPresent or Never) for openpai-runtime.

May I know how to work around this in ~/pai/src/rest-server/ ?

Should I open another issue for operation in an offline environment?
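
For reference, in plain Kubernetes the behaviour asked about here is controlled per container by the `imagePullPolicy` field, whose accepted values are `Always`, `IfNotPresent`, and `Never`. The sketch below patches a pod spec held as a Python dict; the helper name is hypothetical, and where OpenPAI generates the pod spec for openpai-runtime is not established in this thread.

```python
import copy

VALID_POLICIES = {"Always", "IfNotPresent", "Never"}


def set_image_pull_policy(pod_spec, policy="IfNotPresent"):
    """Return a copy of pod_spec with imagePullPolicy set on every container.

    Both init containers and regular containers are patched; the input
    dict is left unmodified.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown imagePullPolicy: {policy!r}")
    patched = copy.deepcopy(pod_spec)
    for key in ("initContainers", "containers"):
        for container in patched.get(key, []):
            container["imagePullPolicy"] = policy
    return patched
```

With `IfNotPresent`, the kubelet pulls only when the image is missing from the node; with `Never`, it does not pull at all and the pod fails to start if the image is absent.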

@SwordFaith
Contributor

Hello @JosephKang ,

I'm working on running a docker-registry in pull-through cache mode to solve the docker_rate_limit problem, which will add an integrated registry to the PAI cluster. An offline setup may also be possible with a private Docker registry that includes all of your images. You can make a proposal in advance. Some other users have also told us they run without internet access, so we will address it in a future version when it becomes the next step. A basic workaround for now is to build a registry filled with the images you use, and contact us after that.
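
On the node side, Docker is pointed at a pull-through cache through the `registry-mirrors` list in `/etc/docker/daemon.json`; on the registry side, the cache is enabled with the `proxy.remoteurl` setting in the registry configuration. The sketch below covers only the node side and is not the integration described above: it merges a mirror URL into existing `daemon.json` text. The helper name and the mirror address are hypothetical.

```python
import json


def add_registry_mirror(daemon_json_text, mirror_url):
    """Return daemon.json text with mirror_url present in "registry-mirrors".

    Existing settings are preserved, and adding the same mirror twice
    is a no-op. An empty input is treated as an empty configuration.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    return json.dumps(config, indent=2, sort_keys=True)
```

The Docker daemon has to be restarted after the file changes, and the mirror setting applies only to images hosted on Docker Hub.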

@JosephKang
Author

@SwordFaith
Thanks. I set up a local Docker registry containing the images we use, and it seems to be the only workaround for the Docker policy change.
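
Populating such a registry usually comes down to a pull, tag, and push per image. The exact steps used here are not given in the thread; the sketch below only shows how the image names map onto a local registry and which `docker` commands that implies. The helper names and the registry address `registry.local:5000` are hypothetical.

```python
def retarget_image(image, registry):
    """Rewrite an image reference so it points at a local registry.

    A leading registry host (first path component containing '.' or ':',
    or equal to 'localhost') is dropped; the repository path and tag are kept.
    """
    first, sep, rest = image.partition("/")
    has_host = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_host else image
    return f"{registry.rstrip('/')}/{path}"


def mirror_commands(image, registry):
    """Return the docker CLI commands that copy one image into the registry."""
    target = retarget_image(image, registry)
    return [
        f"docker pull {image}",
        f"docker tag {image} {target}",
        f"docker push {target}",
    ]
```

For example, `mirror_commands("openpai/openpai-runtime:v1.0.1", "registry.local:5000")` yields the three commands to run once on a machine that can still reach Docker Hub.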
