HA Test: POST /v1.21/containers/create returned error: Server error from portlayer: Invalid configuration for device '1'. #3845
Comments
tl;dr: not HA related. Reproduced the error; from port-layer.log:
From hostd.log:
@hmahmood I'm not sure how the cache could get out of sync -- I think this could be a configuration issue where the NFS datastore isn't available on the failed-over host. Notice line 2835 of hostd.log, ending in 130 -- the HA target: vSphere has moved over an existing VM (it was on the powered-off host) and the backing files are not on the new host.
Another example is at line 3170 of the new host's log... Thoughts?
@jzt take a look at this if you have a second...
@hmahmood I've deployed to nimbus via the robot script and the NFS datastore is presented to all the hosts in the cluster. I'll look at it a bit more in the AM with my Nimbus environment.
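For anyone who wants to check this by hand, here is a minimal govmomi sketch that lists which cluster hosts actually have the datastore attached and accessible. The vCenter URL, datastore name `nfs-ds`, and cluster name `cls` are placeholders, not values from this environment:

```go
// Sketch: verify that the NFS datastore backing the VCH is attached to
// every host in the cluster, so an HA failover target can see the VM's
// backing files. All names here are assumptions.
package main

import (
	"context"
	"fmt"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
)

func main() {
	ctx := context.Background()

	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk") // placeholder vCenter URL
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		log.Fatal(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, err := finder.DefaultDatacenter(ctx)
	if err != nil {
		log.Fatal(err)
	}
	finder.SetDatacenter(dc)

	ds, err := finder.Datastore(ctx, "nfs-ds") // assumed datastore name
	if err != nil {
		log.Fatal(err)
	}
	cluster, err := finder.ClusterComputeResource(ctx, "cls") // assumed cluster name
	if err != nil {
		log.Fatal(err)
	}
	clusterHosts, err := cluster.Hosts(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// AttachedHosts returns only hosts where the datastore is mounted and
	// accessible; any cluster host missing from this set is a candidate HA
	// target that cannot see the VM's backing files.
	attached, err := ds.AttachedHosts(ctx)
	if err != nil {
		log.Fatal(err)
	}
	ok := make(map[string]bool)
	for _, h := range attached {
		ok[h.Reference().Value] = true
	}

	for _, h := range clusterHosts {
		name, _ := h.ObjectName(ctx)
		if ok[h.Reference().Value] {
			fmt.Printf("OK: %s\n", name)
		} else {
			fmt.Printf("WARNING: %s cannot access nfs-ds\n", name)
		}
	}
}
```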
@cgtexmex the cache is fine as long the VM is not restarted. The problem happens if the VM is restarted, and we rehydrate the cache from the kv store. Since we never updated the kv store, the cache is outdated from the start. |
@hmahmood so the steps to reproduce would be to restart the VCH, so that the cache is rehydrated from the stale kv store?
Yes, that is what the test is doing.
Yep... I'll give it a try -- if that's the problem then I'm surprised that the VCH Restart Test isn't failing...
The key is the
ok...ok... I finally believe you. :-)
Verified this by running the HA test on nightlies from Hasan's remote branch, and it passed. Filed a new issue to add the HA test back to the nightly runs: #3956.
Adding HA tests back to nightlies, as we now have #3845 fixed.
Hit this again on the latest longevity run:
From docker-personality.log:
This was on build 3851.
Logs from this build are not available, so I am not sure how to debug this. I would close this until we see this again in the longevity tests. @mhagen-vmware
We have not re-run longevity since the last failure; in order to close this as no-repro or lacking logs, we need to run longevity at least once more.
I will be starting a longevity run this afternoon on the latest build. If this doesn't repro, we can close this then. |
@mhagen-vmware any related results from longevity tests yet? |
I have not been able to repro yet on two different runs. I will start another, and if that fails to repro then we can close this as non-repro for now.
Not able to repro currently. Will re-open if we see this again. |
Seen in https://ci.vcna.io/vmware/vic/10877 in the
Seen in Nightly 6.0
Seen in Nightly 6.0
Seen in Nightly 6.0
Seen in Nightly 6.0
Seen in Nightly 6.0:
Seen in Nightly 6.0:
Closing this as a dup of #4666; the fix is being tracked there.
Saw the above error on the HA test after fixing the shared-storage situation and making sure to grab the latest IP address after the HA event. The failover itself appears to have worked: afterwards docker info, docker pull, and docker images all succeeded, but creating the first container failed.
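For context, "Invalid configuration for device 'N'" is the text of vSphere's InvalidDeviceSpec fault, where N is the index of the offending entry in a reconfigure's DeviceChange list. Below is a hedged govmomi sketch of the kind of reconfigure that trips it when the disk's backing file sits on a datastore the current host cannot see (e.g. after an HA failover). The VM name, controller key, and datastore path are assumptions, not VIC's actual code:

```go
// Sketch: a VM reconfigure that adds a disk backed by a file on a
// datastore the host cannot access fails the resulting task with
// vSphere's InvalidDeviceSpec fault, whose message is
// "Invalid configuration for device 'N'." All names are assumptions.
package main

import (
	"context"
	"log"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/vim25/types"
)

func main() {
	ctx := context.Background()

	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk") // placeholder
	c, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		log.Fatal(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, _ := finder.DefaultDatacenter(ctx)
	finder.SetDatacenter(dc)

	vm, err := finder.VirtualMachine(ctx, "container-vm") // assumed VM name
	if err != nil {
		log.Fatal(err)
	}

	// Attach an existing disk whose backing file lives on the NFS datastore.
	disk := &types.VirtualDisk{
		VirtualDevice: types.VirtualDevice{
			Key:           -1,
			ControllerKey: 1000, // assumed SCSI controller key
			Backing: &types.VirtualDiskFlatVer2BackingInfo{
				VirtualDeviceFileBackingInfo: types.VirtualDeviceFileBackingInfo{
					FileName: "[nfs-ds] container-vm/container-vm.vmdk", // assumed path
				},
				DiskMode: string(types.VirtualDiskModePersistent),
			},
		},
	}

	spec := types.VirtualMachineConfigSpec{
		DeviceChange: []types.BaseVirtualDeviceConfigSpec{
			&types.VirtualDeviceConfigSpec{
				Operation: types.VirtualDeviceConfigSpecOperationAdd,
				Device:    disk,
			},
		},
	}

	task, err := vm.Reconfigure(ctx, spec)
	if err != nil {
		log.Fatal(err)
	}
	// If the host running the VM cannot see [nfs-ds], this wait surfaces
	// "Invalid configuration for device '0'." from the task fault.
	if err := task.Wait(ctx); err != nil {
		log.Fatal(err)
	}
}
```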
5-4-High-Availability-VCH-0-7423-container-logs.zip
Marking this high priority as it is a critical error, and PM expected that we already supported HA fully.
From portlayer.log: