
HA Test: POST /v1.21/containers/create returned error: Server error from portlayer: Invalid configuration for device '1'. #3845

Closed
mhagen-vmware opened this issue Feb 3, 2017 · 29 comments
Labels: kind/defect (Behavior that is inconsistent with what's intended), priority/p0

Comments

@mhagen-vmware
Contributor

mhagen-vmware commented Feb 3, 2017

Saw the above error on the HA test after fixing the shared storage situation and making sure to grab the latest IP address after the HA event. The HA itself appears to have worked: afterwards docker info, docker pull, and docker images all worked, but creating the first container failed.

5-4-High-Availability-VCH-0-7423-container-logs.zip

Marking this as high priority, as it is a critical error and PM expected that we already supported HA fully.

From portlayer.log:

Feb  3 2017 17:56:22.042Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] 488ede8ed8fc8e227aaacdeb996dd14b0930677dc99be754199d6791d642da37
Feb  3 2017 17:56:22.907Z DEBUG unexpected fault on task retry : &types.InvalidDeviceSpec{InvalidVmConfig:types.InvalidVmConfig{VmConfigFault:types.VmConfigFault{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, Property:"virtualDeviceSpec.device.backing.parent.fileName"}, DeviceIndex:1}
Feb  3 2017 17:56:22.907Z ERROR Something failed. Spec was {DynamicData:{} ChangeVersion: Name:trusting_shockl-488ede8ed8fc8e227aaacdeb996dd14b0930677dc99be754199d6791d642da37 Version: Uuid:7fb20d7c-423a-5e55-a762-3d27f4feed77 InstanceUuid: NpivNodeWorldWideName:[] NpivPortWorldWideName:[] NpivWorldWideNameType: NpivDesiredNodeWwns:0 NpivDesiredPortWwns:0 NpivTemporaryDisabled:<nil> NpivOnNonRdmDisks:<nil> NpivWorldWideNameOp: LocationId: GuestId:other3xLinux64Guest AlternateGuestName: Annotation: Files:0xc420810f00 Tools:<nil> Flags:<nil> ConsolePreferences:<nil> PowerOpInfo:<nil> NumCPUs:2 NumCoresPerSocket:0 MemoryMB:2048 MemoryHotAddEnabled:0xc4209241c0 CpuHotAddEnabled:0xc4209241c0 CpuHotRemoveEnabled:<nil> VirtualICH7MPresent:<nil> VirtualSMCPresent:<nil> DeviceChange:[0xc420811130 0xc420811360 0xc420811590 0xc420811720 0xc420811900 0xc420811a40 0xc420811db0 0xc420811e00] CpuAllocation:<nil> MemoryAllocation:<nil> LatencySensitivity:<nil> CpuAffinity:<nil> MemoryAffinity:<nil> NetworkShaper:<nil> CpuFeatureMask:[] ExtraConfig:[0xc4207ddb60 0xc4207ddb80 0xc4207ddba0 0xc4207ddbc0 0xc4207ddbe0 0xc4207ddc00 0xc4207ddc20 0xc4207ddc40 0xc4207ddc60 0xc4207ddc80 0xc4206ccdc0 0xc4206ccde0 0xc4206cce00 0xc4206cce20 0xc4206cce60 0xc4206cce80 0xc4206ccea0 0xc4206ccec0 0xc4206ccf00 0xc4206ccf20 0xc4206ccf40 0xc4206ccf60 0xc4206ccfa0 0xc4206ccfe0 0xc4206cd000 0xc4206cd020 0xc4206cd040 0xc4206cd060 0xc4206cd0a0 0xc4206cd0c0 0xc4206cd0e0 0xc4206cd100 0xc4206cd140 0xc4206cd160 0xc4206cd180 0xc4206cd1a0 0xc4206cd1c0 0xc4206cd1e0 0xc4206cd240 0xc4206cd260 0xc4206cd280 0xc4206cd2a0 0xc4206cd2c0 0xc4206cd300 0xc4206cd320 0xc4206cd340 0xc4206cd360 0xc4206cd380 0xc4206cd3a0 0xc4206cd3c0 0xc4206cd3e0 0xc4209b4460 0xc4209b44c0 0xc4209b4520 0xc4209b4560 0xc4209b45c0 0xc4209b4620 0xc4209b47c0 0xc4209b4820 0xc4209b48a0 0xc4209b48e0 0xc4209b4920 0xc4209b4940 0xc4209b4a00 0xc4209b4a40 0xc4209b4a80 0xc4209b4aa0 0xc4209b4ac0 0xc4209b4ae0 0xc4209b4b00 0xc4209b4c40 0xc4209b4ca0 0xc4209b4dc0 0xc4209b4e20 0xc4209b4e80 0xc4209b4f00 0xc4209b4f80 0xc4209b4fc0 0xc4209b5000 0xc4209b5040 0xc4209b5080 0xc4209b50c0 0xc4209b5100 0xc4209b5120 0xc4209b5160 0xc4209b5180 0xc4209b51c0 0xc4209b51e0 0xc4209b5240 0xc4209b5260 0xc4209b52a0 0xc4209b52c0 0xc4209b5300 0xc4209b5320 0xc4209b5360 0xc4209b5380 0xc4209b53a0 0xc4209b5420 0xc4209b5540 0xc4209b5580 0xc4209b55c0 0xc4209b5620 0xc4209b5640 0xc4209b56c0 0xc4209b5700 0xc4209b5720 0xc4209b5760] SwapPlacement: BootOptions:<nil> VAppConfig:<nil> FtInfo:<nil> RepConfig:<nil> VAppConfigRemoved:<nil> VAssertsEnabled:<nil> ChangeTrackingEnabled:<nil> Firmware: MaxMksConnections:0 GuestAutoLockEnabled:<nil> ManagedBy:<nil> MemoryReservationLockedToMax:<nil> NestedHVEnabled:<nil> VPMCEnabled:<nil> ScheduledHardwareUpgradeInfo:<nil> VmProfile:[] MessageBusTunnelEnabled:<nil> Crypto:<nil> MigrateEncryption:}
Feb  3 2017 17:56:22.907Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] [864.959168ms] 488ede8ed8fc8e227aaacdeb996dd14b0930677dc99be754199d6791d642da37
Feb  3 2017 17:56:22.907Z ERROR CommitHandler error on handle(62c6442658e7e850b711dce42f379a8a) for 488ede8ed8fc8e227aaacdeb996dd14b0930677dc99be754199d6791d642da37: task.Error{LocalizedMethodFault:(*types.LocalizedMethodFault)(0xc42093b420)}
Feb  3 2017 17:56:22.907Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/portlayer/restapi/handlers.(*ContainersHandlersImpl).CommitHandler:224] [865.685252ms] handle(62c6442658e7e850b711dce42f379a8a)
Feb  3 2017 17:56:23.010Z DEBUG Found 3 subscribers to vsphere.VMEvent: Removed trusting_shockl-488ede8ed8fc8e227aaacdeb996dd14b0930677dc99be754199d6791d642da37 on 10.160.247.130 from ha-datacenter
Feb  3 2017 17:56:23.010Z DEBUG Event manager calling back to exec
Feb  3 2017 17:56:23.010Z DEBUG Event manager calling back to registeredVMEvent
Feb  3 2017 17:56:23.010Z DEBUG Event manager calling back to logging
Feb  3 2017 17:56:23.010Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/logging.eventCallback:50]
Feb  3 2017 17:56:23.010Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/logging.eventCallback:50] [8.942µs] 
@mhagen-vmware mhagen-vmware added the kind/defect Behavior that is inconsistent with what's intended label Feb 13, 2017
@corrieb corrieb mentioned this issue Feb 15, 2017
@hmahmood hmahmood self-assigned this Feb 15, 2017
@mhagen-vmware mhagen-vmware added this to the Sprint 3 milestone Feb 15, 2017
@hmahmood
Contributor

tl;dr: not HA related.

Reproduced the error; from portlayer.log:

Feb 22 2017 21:23:55.189Z DEBUG unexpected fault on task retry : &types.InvalidDeviceSpec{InvalidVmConfig:types.InvalidVmConfig{VmConfigFault:types.VmConfigFault{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage(nil)}}}, Property:"virtualDeviceSpec.device.backing.parent.fileName"}, DeviceIndex:1}
Feb 22 2017 21:23:55.189Z ERROR Something failed. Spec was {DynamicData:{} ChangeVersion: Name:admiring_easley-ed4e9bfa392fa73adef12e2faf0ae3506bde4459ee3e970db060130f7ddec86d Version: Uuid:26c29e5f-33bd-54a1-816f-c1f9e02e02d6 InstanceUuid: NpivNodeWorldWideName:[] NpivPortWorldWideName:[] NpivWorldWideNameType: NpivDesiredNodeWwns:0 NpivDesiredPortWwns:0 NpivTemporaryDisabled:<nil> NpivOnNonRdmDisks:<nil> NpivWorldWideNameOp: LocationId: GuestId:other3xLinux64Guest AlternateGuestName: Annotation: Files:0xc420822280 Tools:<nil> Flags:<nil> ConsolePreferences:<nil> PowerOpInfo:<nil> NumCPUs:2 NumCoresPerSocket:0 MemoryMB:2048 MemoryHotAddEnabled:0xc420dea7f0 CpuHotAddEnabled:0xc420dea7f0 CpuHotRemoveEnabled:<nil> VirtualICH7MPresent:<nil> VirtualSMCPresent:<nil> DeviceChange:[0xc4208227d0 0xc420822ff0 0xc420823450 0xc420823810 0xc420823b80 0xc420823e50 0xc420812460 0xc4208124b0] CpuAllocation:<nil> MemoryAllocation:<nil> LatencySensitivity:<nil> CpuAffinity:<nil> MemoryAffinity:<nil> NetworkShaper:<nil> CpuFeatureMask:[] ExtraConfig:[0xc420970180 0xc4209701a0 0xc4209701e0 0xc420970200 0xc420970240 0xc420970260 0xc4209702a0 0xc4209702e0 0xc420970300 0xc420970340 0xc420971cc0 0xc420971ce0 0xc420971d00 0xc420971d20 0xc420971d40 0xc420971d60 0xc420971d80 0xc420971da0 0xc420971dc0 0xc420971e00 0xc420971e20 0xc420971e40 0xc420971e60 0xc420971e80 0xc420971ea0 0xc420971ec0 0xc420971ee0 0xc420971f00 0xc420971f20 0xc420971f40 0xc420971f60 0xc420971f80 0xc420971fa0 0xc420971fc0 0xc420971fe0 0xc420946020 0xc420946040 0xc420946080 0xc4209460a0 0xc4209460e0 0xc420946100 0xc420946140 0xc420946180 0xc4209461a0 0xc4209461c0 0xc4209461e0 0xc420946200 0xc420946240 0xc420946260 0xc4209462a0 0xc4209462e0 0xc420946300 0xc420887740 0xc420887760 0xc420887780 0xc4208877c0 0xc4208877e0 0xc420887820 0xc420887840 0xc420887860 0xc4208878a0 0xc4208878c0 0xc4208878e0 0xc420887900 0xc420887920 0xc420887940 0xc420887980 0xc4208879a0 0xc4208879c0 0xc4208879e0 0xc420887a00 0xc420887a20 0xc420887a40 0xc420887a60 0xc420887a80 0xc420887ac0 0xc420887ae0 0xc420887b00 0xc420887b20 0xc420887b60 0xc420887ba0 0xc420887bc0 0xc420887be0 0xc420887c20 0xc420887c40 0xc420887c60 0xc420887c80 0xc420887ca0 0xc420887cc0 0xc420887ce0 0xc420887d00 0xc420887d40 0xc420887d60 0xc420887d80 0xc420887da0 0xc420887de0 0xc420887e20 0xc420887e40 0xc420887e60 0xc420887ea0 0xc420887f40 0xc420887f60 0xc420887fa0 0xc420887fc0 0xc420880000 0xc420880020 0xc420880060 0xc420880080 0xc4208800a0] SwapPlacement: BootOptions:<nil> VAppConfig:<nil> FtInfo:<nil> RepConfig:<nil> VAppConfigRemoved:<nil> VAssertsEnabled:<nil> ChangeTrackingEnabled:<nil> Firmware: MaxMksConnections:0 GuestAutoLockEnabled:<nil> ManagedBy:<nil> MemoryReservationLockedToMax:<nil> NestedHVEnabled:<nil> VPMCEnabled:<nil> ScheduledHardwareUpgradeInfo:<nil> VmProfile:[] MessageBusTunnelEnabled:<nil> Crypto:<nil> MigrateEncryption:}

From hostd.log:

-->       (vim.vm.device.VirtualDeviceSpec) {
-->          operation = "add",
-->          fileOperation = "create",
-->          device = (vim.vm.device.VirtualDisk) {
-->             key = -10,
-->             deviceInfo = (vim.Description) null,
-->             backing = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) {
-->                fileName = "[]/vmfs/volumes/5270fe3c-8a720327/admiring_easley-ed4e9bfa392fa73adef12e2faf0ae3506bde4459ee3e970db060130f7ddec86d/ed4e9bfa392fa73adef12e2faf0ae3506bde4459ee3e970db060130f7ddec86d.vmdk",
-->                datastore = 'vim.Datastore:datastore-22',
-->                backingObjectId = <unset>,
-->                diskMode = "persistent",
-->                split = <unset>,
-->                writeThrough = <unset>,
-->                thinProvisioned = true,
-->                eagerlyScrub = <unset>,
-->                uuid = <unset>,
-->                contentId = <unset>,
-->                changeId = <unset>,
-->                parent = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) {
-->                   fileName = "[]/vmfs/volumes/5270fe3c-8a720327/VCH-0-6993/VIC/42150f1d-1476-c32b-90eb-e5f3a5428868/images/c40e708042c6a113e5827705cf9ace0377fae60cf9ae7a80ea8502d6460d87c6/c40e708042c6a113e5827705cf9ace0377fae60cf9ae7a80ea8502d6460d87c6.vmdk",
-->                   datastore = <unset>,
-->                   backingObjectId = <unset>,
-->                   diskMode = "",
-->                   split = <unset>,
-->                   writeThrough = <unset>,
-->                   thinProvisioned = <unset>,
-->                   eagerlyScrub = <unset>,
-->                   uuid = <unset>,
-->                   contentId = <unset>,
-->                   changeId = <unset>,
-->                   parent = (vim.vm.device.VirtualDisk.FlatVer2BackingInfo) null,
-->                   deltaDiskFormat = <unset>,
-->                   digestEnabled = <unset>,
-->                   deltaGrainSize = <unset>,
-->                   deltaDiskFormatVariant = <unset>,
-->                   sharing = <unset>,
-->                   keyId = (vim.encryption.CryptoKeyId) null
-->                },
-->                deltaDiskFormat = <unset>,
-->                digestEnabled = <unset>,
-->                deltaGrainSize = <unset>,
-->                deltaDiskFormatVariant = <unset>,
-->                sharing = <unset>,
-->                keyId = (vim.encryption.CryptoKeyId) null
-->             },
-->             connectable = (vim.vm.device.VirtualDevice.ConnectInfo) null,
-->             slotInfo = (vim.vm.device.VirtualDevice.BusSlotInfo) null,
-->             controllerKey = 100,
-->             unitNumber = 0,
-->             capacityInKB = 8388608,
-->             capacityInBytes = <unset>,
-->             shares = (vim.SharesInfo) null,
-->             storageIOAllocation = (vim.StorageResourceManager.IOAllocationInfo) null,
-->             diskObjectId = <unset>,
-->             vFlashCacheConfigInfo = (vim.vm.device.VirtualDisk.VFlashCacheConfigInfo) null,
-->             iofilter = <unset>,
-->             vDiskId = (vim.vslm.ID) null
-->          },
-->          profile = (vim.vm.ProfileSpec) [
-->             (vim.vm.EmptyProfileSpec) {
-->             }
-->          ],
-->          backing = (vim.vm.device.VirtualDeviceSpec.BackingSpec) null
-->       },

/vmfs/volumes/5270fe3c-8a720327/VCH-0-6993/VIC/42150f1d-1476-c32b-90eb-e5f3a5428868/images/c40e708042c6a113e5827705cf9ace0377fae60cf9ae7a80ea8502d6460d87c6/c40e708042c6a113e5827705cf9ace0377fae60cf9ae7a80ea8502d6460d87c6.vmdk does not exist on the datastore, because the regression tests do a docker rmi busybox, which deletes that file. The problem is that the VCH still thinks it exists: the image cache in the docker personality is out of date. On docker-personality bring-up, the image cache gets its last known state from the kv store, and docker rmi does not currently save the image cache back to the kv store. So simply saving the image cache every time it changes should fix this.
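
A minimal sketch of that fix in Go, using hypothetical types and a stand-in kv-store interface rather than the actual VIC API: the idea is simply that every cache mutation, including the removal behind docker rmi, writes the serialized cache back to the kv store.

package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// KeyValueStore stands in for the kv store the docker personality already
// uses to persist state (illustrative interface, not the real VIC one).
type KeyValueStore interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// ImageCache tracks known images and writes itself back to the kv store
// after every change, not only after pulls.
type ImageCache struct {
	mu     sync.Mutex
	kv     KeyValueStore
	Images map[string]string // image ID -> backing vmdk path (illustrative)
}

const cacheKey = "images/cache"

// RemoveImage drops the entry and immediately persists the new state --
// the step that is currently missing on docker rmi.
func (c *ImageCache) RemoveImage(id string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.Images, id)
	return c.save()
}

func (c *ImageCache) save() error {
	data, err := json.Marshal(c.Images)
	if err != nil {
		return err
	}
	return c.kv.Put(cacheKey, data)
}

// memKV is an in-memory stand-in so the sketch runs on its own.
type memKV struct{ m map[string][]byte }

func (s *memKV) Put(k string, v []byte) error { s.m[k] = v; return nil }
func (s *memKV) Get(k string) ([]byte, error) { return s.m[k], nil }

func main() {
	kv := &memKV{m: map[string][]byte{}}
	cache := &ImageCache{kv: kv, Images: map[string]string{"busybox": "c40e7080...vmdk"}}
	_ = cache.RemoveImage("busybox")
	saved, _ := kv.Get(cacheKey)
	fmt.Printf("persisted cache after rmi: %s\n", saved) // {} -- no stale busybox entry survives a restart
}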

@cgtexmex
Contributor

@hmahmood I'm not sure how the cache could get out of sync -- I think this could be a configuration issue where the NFS datastore isn't available on the failed-over host.

Notice on line 2835 of the hostd.log from the host ending in 130 -- the HA target -- that vSphere has moved over an existing VM (which was on the powered-off host) and the backing files are not on the new host.

2017-02-03T17:52:45.332Z info hostd[CB44B70] [Originator@6876 sub=Vmsvc opID=411e8c-01-76-6a4c user=vpxuser:VSPHERE.LOCAL\Administrator] CreateVm: Path is not on object datastore ([]/vmfs/volumes/21bf4c6d-8882e282/cranky_lumiere-db6580aa6aab716aa94bd8806771065ad16e1faa09867c4c3237e69a23b4b075/cranky_lumiere-db6580aa6aab716aa94bd8806771065ad16e1faa09867c4c3237e69a23b4b075.vmx)

Another example is at line 3170 of the new host's hostd.log...

Thoughts?

@cgtexmex
Contributor

@jzt take a look at this if you have a second...

@cgtexmex
Contributor

@hmahmood I've deployed to Nimbus via the robot script and the NFS datastore is presented to all the hosts in the cluster. I'll look at it a bit more in the AM with my Nimbus environment.

@hmahmood
Contributor

@cgtexmex the cache is fine as long as the VM is not restarted. The problem happens when the VM is restarted and we rehydrate the cache from the kv store: since we never updated the kv store, the cache is outdated from the start.
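
A short illustration of that rehydration path, again with hypothetical names rather than the real VIC code: whatever was last written to the kv store becomes the personality's view of the world after a restart, so an image removed with docker rmi (but never removed from the kv store) comes back as a stale entry whose backing vmdk no longer exists on the datastore.

package main

import (
	"encoding/json"
	"fmt"
)

// restoreImageCache mimics bring-up: the last state persisted to the kv
// store becomes the restored cache, whether or not it is still true.
func restoreImageCache(persisted []byte) (map[string]string, error) {
	images := map[string]string{}
	if len(persisted) == 0 {
		return images, nil
	}
	err := json.Unmarshal(persisted, &images)
	return images, err
}

func main() {
	// State saved after the last pull; the subsequent docker rmi was never
	// written back, so the busybox entry survives the restart (paths here
	// are abbreviated placeholders).
	stale := []byte(`{"busybox":"[datastore1] VIC/.../images/c40e.../c40e....vmdk"}`)

	images, _ := restoreImageCache(stale)
	if parent, ok := images["busybox"]; ok {
		// Container create then builds a disk spec whose backing parent
		// points at this path, which hostd rejects with InvalidDeviceSpec.
		fmt.Println("stale parent disk:", parent)
	}
}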

@cgtexmex
Contributor

@hmahmood so the steps to reproduce would be the following?

  1. docker pull busybox
  2. docker rmi busybox
  3. restart appliance
  4. docker run -d busybox /bin/top

@hmahmood
Contributor

Yes that is what the test is doing.

@cgtexmex
Contributor

Yep..I'll give it a try -- if that's the problem then I'm surprised that the VCH Restart Test isn't failing...

@hmahmood
Contributor

The key is the docker rmi call. I just reproduced this sans NFS or HA on standalone ESX.

@cgtexmex
Contributor

ok..ok...I finally believe you. :-)

@rajanashok
Contributor

rajanashok commented Feb 27, 2017

Verified this by running the HA test on nightlies from Hasan's remote branch, and it passed. Filed #3956 to add the HA test back to the nightly runs.

rajanashok added a commit that referenced this issue Mar 1, 2017
Adding HA tests back to nightlies, as we now have #3845 fixed.
@mhagen-vmware
Contributor Author

Hit this again on the latest longevity run:

Mar  6 2017 10:45:10.760Z ERROR Handler for POST /v1.23/containers/create returned error: Server error from portlayer: Invalid configuration for device '1'.

container-logs.zip

@mhagen-vmware
Contributor Author

From docker-personality.log:

Mar  6 2017 10:45:10.760Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Container).containerCreate:322] [2.351510564s] Container.containerCreate
Mar  6 2017 10:45:10.760Z DEBUG [ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Container).ContainerCreate:260] [2.35175085s] 
Mar  6 2017 10:45:10.760Z ERROR Handler for POST /v1.23/containers/create returned error: Server error from portlayer: Invalid configuration for device '1'.
2017-03-06 18:33:59.352120 I | http: TLS handshake error from 10.129.223.201:33436: tls: first record does not look like a TLS handshake
2017-03-06 18:33:59.729335 I | http: TLS handshake error from 10.129.223.201:33438: tls: first record does not look like a TLS handshake
2017-03-06 18:34:00.067156 I | http: TLS handshake error from 10.129.223.201:33440: tls: first record does not look like a TLS handshake

@mhagen-vmware
Contributor Author

This was on build 3851.

@mhagen-vmware mhagen-vmware reopened this Mar 6, 2017
@mhagen-vmware mhagen-vmware removed this from the Sprint 3 milestone Mar 6, 2017
@hmahmood
Contributor

Logs from this build are not available, so I am not sure how to debug this. I would close this until we see this again in the longevity tests. @mhagen-vmware

@mhagen-vmware
Contributor Author

We have not re-run longevity since the last failure; in order to close this as no-repro or lacking logs, we need to run longevity at least once more.

@mhagen-vmware
Contributor Author

I will be starting a longevity run this afternoon on the latest build. If this doesn't repro, we can close this then.

@dougm dougm self-assigned this Mar 15, 2017
@dougm dougm added this to the Sprint 5 milestone Mar 15, 2017
@dougm
Member

dougm commented Mar 23, 2017

@mhagen-vmware any related results from longevity tests yet?

@mhagen-vmware
Contributor Author

I have not been able to repro yet across two different runs. I will start another, and if it fails to repro, we can close this as non-repro for now.

@mhagen-vmware
Contributor Author

Not able to repro currently. Will re-open if we see this again.

@mhagen-vmware mhagen-vmware reopened this May 10, 2017
@mhagen-vmware mhagen-vmware removed this from the Sprint 5 milestone May 10, 2017
@mdubya66 mdubya66 added this to the Sprint 10 milestone May 24, 2017
@anchal-agrawal
Contributor

Seen in https://ci.vcna.io/vmware/vic/10877 in the Group Add test in 3-04-Docker-Compose-Fixtures.

@mdubya66 mdubya66 removed this from the Sprint 10 milestone May 25, 2017
@anchal-agrawal
Contributor

Seen in Nightly 6.0 5-4-High-Availability test:
5-4-High-Availability.zip

@anchal-agrawal
Contributor

Seen in Nightly 6.0 5-4-High-Availability test:
5-4-High-Availability.zip

@anchal-agrawal
Contributor

Seen in Nightly 6.0 5-4-High-Availability:
5-4-High-Availability.zip

@anchal-agrawal
Contributor

Seen in Nightly 6.0 5-4-High-Availability:
5-4-High-Availability.zip

@anchal-agrawal
Contributor

Seen in Nightly 6.0:
5-4-High-Availability.zip

@anchal-agrawal
Contributor

Seen in Nightly 6.0:
5-4-High-Availability.zip

@sgairo sgairo self-assigned this Aug 2, 2017
@mhagen-vmware mhagen-vmware added this to the Sprint 14 milestone Aug 2, 2017
@sgairo
Contributor

sgairo commented Aug 2, 2017

Closing this as a dup of #4666; the fix is being tracked there.
