Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A question: Why not directly return when AddFd fail in RunPodSandbox? #720

Closed
yanxuean opened this issue Jul 16, 2018 · 7 comments
Closed

Comments

@yanxuean
Copy link
Contributor

yanxuean commented Jul 16, 2018

Now, we continue to do CreateContainer when AddFd is failed in RunPodSandbox. After all kubelet will trigger delete all.
I think it is meaningless, and it introduce some new problem.
we should directly return when AddFd is failed in RunPodSandbox.

[ubuntu@master:/paasdata/test/virtlet]$ kubectl get pod
NAME      READY     STATUS                 RESTARTS   AGE
yy-1      1/1       Running                0          15m
yy-10     1/1       Running                0          14m
yy-2      1/1       Running                0          14m
yy-3      1/1       Running                0          14m
yy-4      1/1       Running                0          14m
yy-5      1/1       Running                0          14m
yy-6      1/1       Running                0          14m
yy-7      0/1       CreateContainerError   0          14m
yy-8      0/1       CreateContainerError   0          14m
yy-9      0/1       CreateContainerError   0          14m

===========kubectl describe pod yy-7===============
Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Normal   SuccessfulMountVolume  4m               kubelet, node1     MountVolume.SetUp succeeded for volume "default-token-pwwkg"
  Warning  Failed                 3m               kubelet, node1     Error: "/run/virtlet.sock": rpc error: code = 2 desc = failed to refresh the storage pool: virError(Code=1, Domain=18, Message='internal error: pool 'volumes' has asynchronous jobs running.')
  Normal   Scheduled              3m               default-scheduler  Successfully assigned yy-7 to node1
  Normal   Pulled                 2m (x6 over 3m)  kubelet, node1     Container image "virtlet.cloud/download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img" already present on machine
  Warning  FailedSync             2m (x6 over 3m)  kubelet, node1     Error syncing pod
  Warning  Failed                 2m (x5 over 3m)  kubelet, node1     Error: "/run/virtlet.sock": rpc error: code = 2 desc = virError(Code=90, Domain=18, Message='storage volume 'virtlet_root_a8941e21-b35f-54d0-7525-ae4d54f5b9e2' exists already')
  Normal   SandboxChanged         2m (x6 over 3m)  kubelet, node1     Pod sandbox changed, it will be killed and re-created.

refer to #718 #719

@jellonek
Copy link
Contributor

Hmmm... If AddFd failed during RunPodSandbox CreateContainer should fail on the beginning (around there: https://github.com/Mirantis/virtlet/blob/master/pkg/manager/runtime.go#L314), because sandbox metadata should not land in metadata store at the end of RunPodSandbox (https://github.com/Mirantis/virtlet/blob/master/pkg/manager/runtime.go#L171).
Can you attach also the virtlet log?
In above logs there is no info about AddFd failure, while there is other interesting issue - internal error: pool 'volumes' has asynchronous jobs running which also should not happened after 0633d86#diff-82807a8b25a38ed05200e0c7805cb79d
Can you confirm on which version of sources are you working, because in #719 you were pointing at "virtletv1.0", what probably means our v1.0.0 release from april, while above commit was added in may.
I'm guessing that it could be v1.0.0 also because what i pointed above with not proceeding to do CreateContainer after failure during AddFd was added in d14d131#diff-c602029982c7f4627cf1825318674126, which was included in v1.0.2

@yanxuean
Copy link
Contributor Author

The version is release v1.0.

log, There are some my analysis. and is chinese, you can ignore it.

[ubuntu@master:/paasdata/test/virtlet]$ kubectl get pod
NAME      READY     STATUS                 RESTARTS   AGE
yy-1      1/1       Running                0          15m
yy-10     1/1       Running                0          14m
yy-2      1/1       Running                0          14m
yy-3      1/1       Running                0          14m
yy-4      1/1       Running                0          14m
yy-5      1/1       Running                0          14m
yy-6      1/1       Running                0          14m
yy-7      0/1       CreateContainerError   0          14m
yy-8      0/1       CreateContainerError   0          14m
yy-9      0/1       CreateContainerError   0          14m

===========kubectl describe pod yy-7===============
Events:
  Type     Reason                 Age              From               Message
  ----     ------                 ----             ----               -------
  Normal   SuccessfulMountVolume  4m               kubelet, node1     MountVolume.SetUp succeeded for volume "default-token-pwwkg"
  Warning  Failed                 3m               kubelet, node1     Error: "/run/virtlet.sock": rpc error: code = 2 desc = failed to refresh the storage pool: virError(Code=1, Domain=18, Message='internal error: pool 'volumes' has asynchronous jobs running.')
  Normal   Scheduled              3m               default-scheduler  Successfully assigned yy-7 to node1
  Normal   Pulled                 2m (x6 over 3m)  kubelet, node1     Container image "virtlet.cloud/download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img" already present on machine
  Warning  FailedSync             2m (x6 over 3m)  kubelet, node1     Error syncing pod
  Warning  Failed                 2m (x5 over 3m)  kubelet, node1     Error: "/run/virtlet.sock": rpc error: code = 2 desc = virError(Code=90, Domain=18, Message='storage volume 'virtlet_root_a8941e21-b35f-54d0-7525-ae4d54f5b9e2' exists already')
  Normal   SandboxChanged         2m (x6 over 3m)  kubelet, node1     Pod sandbox changed, it will be killed and re-created.

===================virtlet  log======================================================================
{"log":"I0508 07:23:56.132679   10024 manager.go:178] RunPodSandbox called for pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:23:56.132997241Z"}
{"log":"I0508 07:23:56.133146   10024 manager.go:192] RunPodSandbox: (*runtime.RunPodSandboxRequest)(0xc4204d8058)({\n","stream":"stderr","time":"2018-05-08T07:23:56.133364352Z"}

{"log":"I0508 07:23:56.433099   10024 manager.go:178] RunPodSandbox called for pod yy-9 (d7e60d8e-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:23:56.441318907Z"}

{"log":"I0508 07:23:56.437965   10024 manager.go:178] RunPodSandbox called for pod yy-8 (d6f96ee4-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:23:56.44206469Z"}

======================yy-6的结果应该没被截获,该pod建成功了。当前 addFd成功时,runPodSandbox不打印东西===================
{"log":"I0508 15:23:56.532832   28272 client.go:162] AddSandboxToNetwork: PodID \"d5000820-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-6\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:23:56.533012432Z"}
{"log":"(*libcni.RuntimeConf)(0xc42019e6e0)({\n","stream":"stderr","time":"2018-05-08T07:23:56.533042261Z"}
{"log":" ContainerID: (string) (len=36) \"d5000820-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:23:56.533054463Z"}

{"log":"I0508 07:23:56.718620   10024 manager.go:178] RunPodSandbox called for pod yy-10 (d92df9a3-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:23:56.719654282Z"}

{"log":"I0508 15:23:58.442156   28272 client.go:166] AddSandboxToNetwork: PodID \"d5000820-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-6\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:23:58.442670546Z"}

==============查询yy-6的状态,为ready,so yy-6是成功的。========
{"log":"I0508 07:23:59.271503   10024 manager.go:317] PodSandboxStatusStatus: (*runtime.PodSandboxStatusRequest)(0xc420637960)({\n","stream":"stderr","time":"2018-05-08T07:23:59.272377189Z"}
{"log":"I0508 07:23:59.272050   10024 manager.go:343] PodSandboxStatus response: (*runtime.PodSandboxStatusResponse)(0xc4203f70b0)({\n","stream":"stderr","time":"2018-05-08T07:23:59.272489985Z"}
{"log":" Status: (*runtime.PodSandboxStatus)(0xc4207761e0)({\n","stream":"stderr","time":"2018-05-08T07:23:59.272503294Z"}
{"log":"  Id: (string) (len=36) \"d5000820-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:23:59.272515736Z"}
{"log":"  Metadata: (*runtime.PodSandboxMetadata)(0xc42056a740)({\n","stream":"stderr","time":"2018-05-08T07:23:59.272529032Z"}
{"log":"   Name: (string) (len=4) \"yy-6\",\n","stream":"stderr","time":"2018-05-08T07:23:59.272541445Z"}
{"log":"   Uid: (string) (len=36) \"d5000820-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:23:59.272553594Z"}
{"log":"   Namespace: (string) (len=7) \"default\",\n","stream":"stderr","time":"2018-05-08T07:23:59.272565296Z"}
{"log":"   Attempt: (uint32) 0\n","stream":"stderr","time":"2018-05-08T07:23:59.272593272Z"}
{"log":"  }),\n","stream":"stderr","time":"2018-05-08T07:23:59.272609574Z"}
{"log":"  State: (runtime.PodSandboxState) 0,\n","stream":"stderr","time":"2018-05-08T07:23:59.272621983Z"}

{"log":"I0508 07:23:59.301269   10024 manager.go:381] CreateContainer called for name: cc-yy-6\n","stream":"stderr","time":"2018-05-08T07:23:59.306745555Z"}

{"log":"I0508 15:23:59.699607   28352 client.go:162] AddSandboxToNetwork: PodID \"d41a3c78-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-5\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:23:59.699838664Z"}

{"log":"I0508 15:24:01.646011   28352 client.go:166] AddSandboxToNetwork: PodID \"d41a3c78-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-5\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:24:01.646303711Z"}

{"log":"I0508 07:24:02.348225   10024 manager.go:343] PodSandboxStatus response: (*runtime.PodSandboxStatusResponse)(0xc4201d68e0)({\n","stream":"stderr","time":"2018-05-08T07:24:02.348448879Z"}
{"log":" Status: (*runtime.PodSandboxStatus)(0xc4202c7680)({\n","stream":"stderr","time":"2018-05-08T07:24:02.348476217Z"}
{"log":"  Id: (string) (len=36) \"d41a3c78-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:02.348491095Z"}
{"log":"  Metadata: (*runtime.PodSandboxMetadata)(0xc4202e3500)({\n","stream":"stderr","time":"2018-05-08T07:24:02.348504439Z"}
{"log":"   Name: (string) (len=4) \"yy-5\",\n","stream":"stderr","time":"2018-05-08T07:24:02.348517762Z"}
{"log":"   Uid: (string) (len=36) \"d41a3c78-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:02.348530644Z"}
{"log":"   Namespace: (string) (len=7) \"default\",\n","stream":"stderr","time":"2018-05-08T07:24:02.348543686Z"}
{"log":"   Attempt: (uint32) 0\n","stream":"stderr","time":"2018-05-08T07:24:02.348557708Z"}
{"log":"  }),\n","stream":"stderr","time":"2018-05-08T07:24:02.348570197Z"}
{"log":"  State: (runtime.PodSandboxState) 0,\n","stream":"stderr","time":"2018-05-08T07:24:02.34858285Z"}

{"log":"I0508 07:24:02.357534   10024 manager.go:381] CreateContainer called for name: cc-yy-5\n","stream":"stderr","time":"2018-05-08T07:24:02.361738351Z"}


=============第一次为yy-7建ns,   tapmananger侧调用cni成功了====================
{"log":"I0508 15:24:02.585525   28392 client.go:124] CNI config: name: \"containerd-bridge\" type: \"bridge\"\n","stream":"stderr","time":"2018-05-08T07:24:02.585817702Z"}
{"log":"I0508 15:24:02.586228   28392 client.go:162] AddSandboxToNetwork: PodID \"d601d1c1-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-7\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:24:02.586412563Z"}
{"log":"(*libcni.RuntimeConf)(0xc420189e00)({\n","stream":"stderr","time":"2018-05-08T07:24:02.586438418Z"}
{"log":" ContainerID: (string) (len=36) \"d601d1c1-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:02.586451774Z"}

{"log":"I0508 15:24:04.613730   28392 client.go:166] AddSandboxToNetwork: PodID \"d601d1c1-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-7\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:24:04.614093962Z"}

{"log":"I0508 07:24:04.619166   10033 tapfdsource.go:138] CNI configuration for pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857): (*current.Result)(0xc420136000)({\n","stream":"stderr","time":"2018-05-08T07:24:04.619372693Z"}

=====================应该是这里把yy-7的结果让yy-8给截获了。这之前tapManager只有对yy-7的操作=======================
{"log":"E0508 07:24:05.205665   10024 manager.go:220] Error when adding pod yy-8 (d6f96ee4-5290-11e8-8d7b-fa163e15c857) to CNI network: fd key mismatch in the server response\n","stream":"stderr","time":"2018-05-08T07:24:05.205934204Z"}
{"log":"nsfix reexec: pid 28438: entering the namespaces of target pid 1\n","stream":"stderr","time":"2018-05-08T07:24:05.244085618Z"}

接下来一坨yy-8的createContainer操作

===================yy-9 在tapManager侧也成功了=====================
{"log":"I0508 15:24:05.493332   28438 client.go:162] AddSandboxToNetwork: PodID \"d7e60d8e-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-9\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:24:05.527559495Z"}

{"log":"I0508 15:24:05.775555   28438 client.go:166] AddSandboxToNetwork: PodID \"d7e60d8e-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-9\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:24:05.776299698Z"}

{"log":"I0508 07:24:05.780602   10033 tapfdsource.go:138] CNI configuration for pod yy-9 (d7e60d8e-5290-11e8-8d7b-fa163e15c857): (*current.Result)(0xc420294460)({\n","stream":"stderr","time":"2018-05-08T07:24:05.781945121Z"}

=====================yy-7又把yy-9的给截获了===============
{"log":"I0508 07:24:05.869500   10033 nettools.go:716] Opening tap interface \"tap0\" for link \"virtlet-eth0\"\n","stream":"stderr","time":"2018-05-08T07:24:05.87005565Z"}
{"log":"I0508 07:24:05.869571   10033 nettools.go:721] Adding interface \"virtlet-eth0\" as \"tap0\"\n","stream":"stderr","time":"2018-05-08T07:24:05.870131321Z"}
{"log":"2018/05/08 07:24:06 transport: http2Server.HandleStreams failed to receive the preface from client: EOF\n","stream":"stderr","time":"2018-05-08T07:24:06.092375098Z"}
{"log":"E0508 07:24:06.371938   10024 manager.go:220] Error when adding pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857) to CNI network: fd key mismatch in the server response\n","stream":"stderr","time":"2018-05-08T07:24:06.372188913Z"}
{"log":"nsfix reexec: pid 28549: entering the namespaces of target pid 1\n","stream":"stderr","time":"2018-05-08T07:24:06.415327404Z"}

====================建sandbox失败当前不会返回失败,只会设置状态为not ready,让kubelet自己发起删建,so会继续创建container================
{"log":"I0508 07:24:06.549259   10024 manager.go:381] CreateContainer called for name: cc-yy-7\n","stream":"stderr","time":"2018-05-08T07:24:06.549479509Z"}
{"log":"I0508 07:24:06.549943   10024 manager.go:382] CreateContainer: (*runtime.CreateContainerRequest)(0xc42015e1e0)({\n","stream":"stderr","time":"2018-05-08T07:24:06.550168281Z"}
{"log":" PodSandboxId: (string) (len=36) \"d601d1c1-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:06.550196521Z"}


=====================yy-9在tapmanager侧也成功了==========================================
{"log":"I0508 15:24:06.724307   28549 client.go:162] AddSandboxToNetwork: PodID \"d6f96ee4-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-8\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:24:06.724516942Z"}

{"log":"I0508 15:24:08.711109   28549 client.go:166] AddSandboxToNetwork: PodID \"d6f96ee4-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-8\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:24:08.711799506Z"}

=========================yy-9也乱了,9把8的给 截了。就7,8,9三个乱序了。10成功了==============================
{"log":"E0508 07:24:09.304910   10024 manager.go:220] Error when adding pod yy-9 (d7e60d8e-5290-11e8-8d7b-fa163e15c857) to CNI network: fd key mismatch in the server response\n","stream":"stderr","time":"2018-05-08T07:24:09.305170582Z"}
{"log":"nsfix reexec: pid 28591: entering the namespaces of target pid 1\n","stream":"stderr","time":"2018-05-08T07:24:09.343093109Z"}
{"log":"I0508 15:24:09.553139   28591 client.go:124] CNI config: name: \"containerd-bridge\" type: \"bridge\"\n","stream":"stderr","time":"2018-05-08T07:24:09.553472444Z"}
{"log":"I0508 15:24:09.553591   28591 client.go:162] AddSandboxToNetwork: PodID \"d92df9a3-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-10\", PodNs \"default\", runtime config:\n","stream":"stderr","time":"2018-05-08T07:24:09.553825938Z"}
{"log":"(*libcni.RuntimeConf)(0xc420478690)({\n","stream":"stderr","time":"2018-05-08T07:24:09.553858153Z"}

{"log":"I0508 15:24:11.245197   28591 client.go:166] AddSandboxToNetwork: PodID \"d92df9a3-5290-11e8-8d7b-fa163e15c857\", PodName \"yy-10\", PodNs \"default\": result:\n","stream":"stderr","time":"2018-05-08T07:24:11.245479153Z"}


{"log":"I0508 07:24:12.037181   10024 manager.go:317] PodSandboxStatusStatus: (*runtime.PodSandboxStatusRequest)(0xc420711920)({\n","stream":"stderr","time":"2018-05-08T07:24:12.037444738Z"}
{"log":" PodSandboxId: (string) (len=36) \"d92df9a3-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:12.037487558Z"}
{"log":" Verbose: (bool) false\n","stream":"stderr","time":"2018-05-08T07:24:12.037519051Z"}
{"log":"})\n","stream":"stderr","time":"2018-05-08T07:24:12.037531815Z"}
{"log":"I0508 07:24:12.038023   10024 manager.go:343] PodSandboxStatus response: (*runtime.PodSandboxStatusResponse)(0xc420226e30)({\n","stream":"stderr","time":"2018-05-08T07:24:12.038218089Z"}
{"log":" Status: (*runtime.PodSandboxStatus)(0xc420809400)({\n","stream":"stderr","time":"2018-05-08T07:24:12.038242407Z"}
{"log":"  Id: (string) (len=36) \"d92df9a3-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:12.038254016Z"}
{"log":"  Metadata: (*runtime.PodSandboxMetadata)(0xc42020d780)({\n","stream":"stderr","time":"2018-05-08T07:24:12.038266565Z"}
{"log":"   Name: (string) (len=5) \"yy-10\",\n","stream":"stderr","time":"2018-05-08T07:24:12.038295725Z"}
{"log":"   Uid: (string) (len=36) \"d92df9a3-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:12.038307807Z"}
{"log":"   Namespace: (string) (len=7) \"default\",\n","stream":"stderr","time":"2018-05-08T07:24:12.038318922Z"}
{"log":"   Attempt: (uint32) 0\n","stream":"stderr","time":"2018-05-08T07:24:12.038329656Z"}
{"log":"  }),\n","stream":"stderr","time":"2018-05-08T07:24:12.038340096Z"}
{"log":"  State: (runtime.PodSandboxState) 0,\n","stream":"stderr","time":"2018-05-08T07:24:12.038350627Z"}


=======7,8,9的CreateContainer失败了。这个错误不知道啥原因(可能是libvirt不支持并行refresh,并行create应该是支持的),libvirt返回的。此错误会触发删除volume,为啥没删掉???========
{"log":"E0508 07:24:17.211824   10024 manager.go:422] Error creating container cc-yy-8: failed to refresh the storage pool: virError(Code=1, Domain=18, Message='internal error: pool 'volumes' has asynchronous jobs running.')\n","stream":"stderr","time":"2018-05-08T07:24:17.212035961Z"}
{"log":"E0508 07:24:17.237308   10024 manager.go:422] Error creating container cc-yy-9: failed to refresh the storage pool: virError(Code=1, Domain=18, Message='internal error: pool 'volumes' has asynchronous jobs running.')\n","stream":"stderr","time":"2018-05-08T07:24:17.237493432Z"}
{"log":"E0508 07:24:17.294768   10024 manager.go:422] Error creating container cc-yy-7: failed to refresh the storage pool: virError(Code=1, Domain=18, Message='internal error: pool 'volumes' has asynchronous jobs running.')\n","stream":"stderr","time":"2018-05-08T07:24:17.294961047Z"}

==================开始删建yy-9了,没有删掉netns????============================
{"log":"I0508 07:24:42.994434   10024 manager.go:257] StopPodSandbox called for pod d7e60d8e-5290-11e8-8d7b-fa163e15c857\n","stream":"stderr","time":"2018-05-08T07:24:42.995376602Z"}
{"log":"I0508 07:24:42.994660   10024 manager.go:258] StopPodSandbox: (*runtime.StopPodSandboxRequest)(0xc42084adc0)({\n","stream":"stderr","time":"2018-05-08T07:24:42.995446169Z"}
{"log":" PodSandboxId: (string) (len=36) \"d7e60d8e-5290-11e8-8d7b-fa163e15c857\"\n","stream":"stderr","time":"2018-05-08T07:24:42.995464911Z"}
{"log":"})\n","stream":"stderr","time":"2018-05-08T07:24:42.995481847Z"}
{"log":"I0508 07:24:43.002064   10024 manager.go:178] RunPodSandbox called for pod yy-9 (d7e60d8e-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:24:43.003330473Z"}
{"log":"I0508 07:24:43.002925   10024 manager.go:192] RunPodSandbox: (*runtime.RunPodSandboxRequest)(0xc42000e290)({\n","stream":"stderr","time":"2018-05-08T07:24:43.00336614Z"}


=========================第2次建yy-7======================
{"log":"I0508 07:24:43.229942   10024 manager.go:257] StopPodSandbox called for pod d601d1c1-5290-11e8-8d7b-fa163e15c857\n","stream":"stderr","time":"2018-05-08T07:24:43.23022753Z"}
{"log":"I0508 07:24:43.230051   10024 manager.go:258] StopPodSandbox: (*runtime.StopPodSandboxRequest)(0xc420226a70)({\n","stream":"stderr","time":"2018-05-08T07:24:43.230275644Z"}
{"log":" PodSandboxId: (string) (len=36) \"d601d1c1-5290-11e8-8d7b-fa163e15c857\"\n","stream":"stderr","time":"2018-05-08T07:24:43.230293177Z"}
{"log":"})\n","stream":"stderr","time":"2018-05-08T07:24:43.230309322Z"}
{"log":"I0508 07:24:43.232652   10024 manager.go:178] RunPodSandbox called for pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857)\n","stream":"stderr","time":"2018-05-08T07:24:43.233551279Z"}
{"log":"I0508 07:24:43.233175   10024 manager.go:192] RunPodSandbox: (*runtime.RunPodSandboxRequest)(0xc420648a48)({\n","stream":"stderr","time":"2018-05-08T07:24:43.23358113Z"}

===========================估计因为kubelet还没发RemoveSandbox下来导致建立的时候发现已存在。=============
{"log":"E0508 07:24:43.239156   10033 ns.go:48] \"ip netns add d601d1c1-5290-11e8-8d7b-fa163e15c857\" failed with output: Cannot create namespace file \"/var/run/netns/d601d1c1-5290-11e8-8d7b-fa163e15c857\": File exists\n","stream":"stderr","time":"2018-05-08T07:24:43.2394407Z"}
{"log":"E0508 07:24:43.239329   10024 manager.go:220] Error when adding pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857) to CNI network: server returned error: error getting fd: error creating new netns for pod yy-7 (d601d1c1-5290-11e8-8d7b-fa163e15c857): exit status 1\n","stream":"stderr","time":"2018-05-08T07:24:43.239474293Z"}
{"log":"I0508 07:24:43.290802   10024 logger.go:33] Spawned new log writer. Log file:/kubernetes-log/d92df9a3-5290-11e8-8d7b-fa163e15c857/cc-yy-10_0.log\n","stream":"stderr","time":"2018-05-08T07:24:43.291869165Z"}
{"log":"I0508 07:24:43.290951   10024 listener.go:145] Spawned new stream reader for container 3cce4a2d-eacf-5926-48d9-9fe041e6f080\n","stream":"stderr","time":"2018-05-08T07:24:43.291916163Z"}

=========================建立container时也发现root volume已存在============================
{"log":"I0508 07:24:43.556662   10024 manager.go:381] CreateContainer called for name: cc-yy-7\n","stream":"stderr","time":"2018-05-08T07:24:43.559836967Z"}
{"log":"I0508 07:24:43.557418   10024 manager.go:382] CreateContainer: (*runtime.CreateContainerRequest)(0xc4207784a0)({\n","stream":"stderr","time":"2018-05-08T07:24:43.559865367Z"}
{"log":" PodSandboxId: (string) (len=36) \"d601d1c1-5290-11e8-8d7b-fa163e15c857\",\n","stream":"stderr","time":"2018-05-08T07:24:43.559879802Z"}

{"log":"E0508 07:24:43.603558   10024 manager.go:422] Error creating container cc-yy-7: virError(Code=90, Domain=18, Message='storage volume 'virtlet_root_a8941e21-b35f-54d0-7525-ae4d54f5b9e2' exists already')\n","stream":"stderr","time":"2018-05-08T07:24:43.604341563Z"}
{"log":"I0508 07:24:44.262034   10024 manager.go:257] StopPodSandbox called for pod d6f96ee4-5290-11e8-8d7b-fa163e15c857\n","stream":"stderr","time":"2018-05-08T07:24:44.262321075Z"}
{"log":"I0508 07:24:44.262139   10024 manager.go:258] StopPodSandbox: (*runtime.StopPodSandboxRequest)(0xc420787e40)({\n","stream":"stderr","time":"2018-05-08T07:24:44.262405221Z"}
{"log":" PodSandboxId: (string) (len=36) \"d6f96ee4-5290-11e8-8d7b-fa163e15c857\"\n","stream":"stderr","time":"2018-05-08T07:24:44.262439715Z"}
{"log":"})\n","stream":"stderr","time":"2018-05-08T07:24:44.262457815Z"}

@jellonek
Copy link
Contributor

The problem is that you are using an old version and some things are already fixed (like corrected cni handling, libvirt calls covered with locks, so there should bo no more errors like failed to refresh the storage pool).
Can you verify if same or similar issues are still visible when most recent version of Virtlet is used?

@yanxuean
Copy link
Contributor Author

Can you tell me the PR which we solved the issue on?

@jellonek
Copy link
Contributor

Issue with handling cni was fixed by #665. After that mentioned by you on the beginning call to CreateContainer after failed AddFd will not be done.
Later issue shown in pod events in your description - connected with asynchronous calls to libvirt was fixed in #677.

Still #719 imo fixes handling of calls to tapmanager from vmwrapper (ok, PR description is incorrect because it's not only about AddFd but all RPC using tapmanager client) which were not fixed by both of above mentioned PRs.

@yanxuean
Copy link
Contributor Author

Thanks, @jellonek I will try new version later. Hope to be solved:)

@yanxuean
Copy link
Contributor Author

The #665 #677 have not appeared again in virtlet v1.1.2.
Close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants