This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Can't provision Hybrid cluster issues with cse provisioning #4027

Closed
sylus opened this issue Oct 15, 2018 · 4 comments · Fixed by #4058

Comments

Contributor

sylus commented Oct 15, 2018

Is this a request for help?: Yes


Is this an ISSUE or FEATURE REQUEST? (choose one): Bug


What version of acs-engine?: v0.23.1


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.11.3

What happened:

Running the following:

az group deployment create --name "k8s-acs-stc-mgmt" `
                           --resource-group "k8s-management-rg" `
                           --template-file "./_output/k8s-acs-stc-mgmt/azuredeploy.json" `
                           --parameters "./_output/k8s-acs-stc-mgmt/azuredeploy.parameters.json"

Where the initial cluster spec is:

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorRelease": "1.11",
      "kubernetesConfig": {
        "kubernetesImageBase": "k8s-gcrio.azureedge.net/",
        "useManagedIdentity": false,
        "privateCluster": {
          "enabled": true
        },
        "addons": [
          {
            "name": "tiller",
            "enabled" : true
          },
          {
            "name": "kubernetes-dashboard",
            "enabled" : true
          },
          {
            "name": "rescheduler",
            "enabled" : true
          }
        ]
      }
    },
    "masterProfile": {
      "count": 1,
      "dnsPrefix": "k8s-acs-stc-mgmt",
      "vmSize": "Standard_D4s_v3",
      "OSDiskSizeGB": 200,
      "vnetSubnetId": "/subscriptions/XXX/resourceGroups/network-management-rg/providers/Microsoft.Network/virtualNetworks/k8s-mgmt-vnet/subnets/k8sMasterSubnet",
      "firstConsecutiveStaticIP": "172.20.58.4",
      "vnetCidr": "172.20.58.0/23"
    },
    "agentPoolProfiles": [
      {
        "name": "linuxpool1",
        "count": 3,
        "customNodeLabels": {
          "os": "linux"
        },
        "vmSize": "Standard_D8s_v3",
        "OSDiskSizeGB": 200,
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "vnetSubnetId":  "/subscriptions/XXX/resourceGroups/network-management-rg/providers/Microsoft.Network/virtualNetworks/k8s-mgmt-vnet/subnets/k8sAgentSubnet"
      },
      {
        "name": "windowspool1",
        "count": 1,
        "customNodeLabels": {
          "os": "windows"
        },
        "osType": "Windows",
        "vmSize": "Standard_D8s_v3",
        "OSDiskSizeGB": 200,
        "storageProfile" : "ManagedDisks",
        "availabilityProfile": "AvailabilitySet",
        "vnetSubnetId":  "/subscriptions/XXX/resourceGroups/network-management-rg/providers/Microsoft.Network/virtualNetworks/k8s-mgmt-vnet/subnets/k8sAgentSubnet"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "ssh-rsa"
          }
        ]
      }
    },
    "windowsProfile": {
      "adminUsername": "",
      "adminPassword": ""
    },
    "servicePrincipalProfile": {
      "clientId": "",
      "secret": ""
    }
  }
}
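As a quick sanity check on the custom VNET values in a spec like this one, a minimal Python sketch could look as follows. The helper names `static_ip_in_vnet` and `pools_missing_subnet` are hypothetical, and the rule that every agent pool needs its own `vnetSubnetId` when a custom VNET is used is an assumption here, not something confirmed in this thread.

```python
import ipaddress

def static_ip_in_vnet(api_model: dict) -> bool:
    """True if masterProfile.firstConsecutiveStaticIP lies inside masterProfile.vnetCidr."""
    master = api_model["properties"]["masterProfile"]
    ip = ipaddress.ip_address(master["firstConsecutiveStaticIP"])
    vnet = ipaddress.ip_network(master["vnetCidr"])
    return ip in vnet

def pools_missing_subnet(api_model: dict) -> list:
    """Names of agent pools that have no vnetSubnetId set (assumed to be required)."""
    pools = api_model["properties"]["agentPoolProfiles"]
    return [pool["name"] for pool in pools if not pool.get("vnetSubnetId")]

# Values copied from the spec above, trimmed to the fields being checked.
SUBNETS = ("/subscriptions/XXX/resourceGroups/network-management-rg/providers/"
           "Microsoft.Network/virtualNetworks/k8s-mgmt-vnet/subnets/")
spec = {
    "properties": {
        "masterProfile": {
            "vnetSubnetId": SUBNETS + "k8sMasterSubnet",
            "firstConsecutiveStaticIP": "172.20.58.4",
            "vnetCidr": "172.20.58.0/23",
        },
        "agentPoolProfiles": [
            {"name": "linuxpool1", "vnetSubnetId": SUBNETS + "k8sAgentSubnet"},
            {"name": "windowspool1", "vnetSubnetId": SUBNETS + "k8sAgentSubnet"},
        ],
    }
}

print(static_ip_in_vnet(spec))
print(pools_missing_subnet(spec))
```

For the values in this issue both checks pass, so the network settings in the spec are at least internally consistent. The check only covers the VNET range, not the address range of the master subnet itself, which the spec does not state.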

The error I get is:

Deployment failed. Correlation ID: 706b8f5d-d0a5-42db-ae2f-a6f89b273182. {
  "status": "Failed",
  "error": {
    "code": "ResourceDeploymentFailure",
    "message": "The resource operation completed with terminal provisioning state 'Failed'.",
    "details": [
      {
        "code": "VMExtensionProvisioningError",
        "message": "VM has reported a failure when processing extension 'cse-agent-0'. Error message: \"Finished executing command\"."
      }
    ]
  }
}

What you expected to happen: The cluster to successfully provision

How to reproduce it (as minimally and precisely as possible): Given in summary

Anything else we need to know: N/A

Contributor Author

sylus commented Oct 15, 2018

I checked the custom script extension logs on the agent node where cse-agent-0 failed (k8s-linuxpool1-42020970-0), and I think my problem is related to the following:

root@k8s-linuxpool1-42020970-0:/var/log/azure/custom-script# cat handler.log
+ /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/bin/custom-script-extension install
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 event=start
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 status="not reported for operation (by design)"
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 event="migrate to mrseq" error="Can't find out seqnum from /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status, not enough files."
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 event="created data dir" path=/var/lib/waagent/custom-script
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 event=installed
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 status="not reported for operation (by design)"
time=2018-10-15T16:04:41Z version=v2.0.6/git@1008306-clean operation=install seq=0 event=end
Writing a placeholder status file indicating progress before forking: /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/status/0.status
+ nohup /var/lib/waagent/Microsoft.Azure.Extensions.CustomScript-2.0.6/bin/custom-script-extension enable
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event=start
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event=pre-check
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="comparing seqnum" path=mrseq
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="seqnum saved" path=mrseq
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="reading configuration"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="read configuration"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="validating json schema"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="json schema valid"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="parsing configuration json"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="parsed configuration json"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="validating configuration logically"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="validated configuration"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="creating output directory" path=/var/lib/waagent/custom-script/download/0
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="created output directory"
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 files=0
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing command" output=/var/lib/waagent/custom-script/download/0
time=2018-10-15T16:04:42Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="executing protected commandToExecute" output=/var/lib/waagent/custom-script/download/0
time=2018-10-15T16:24:43Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="failed to execute command" error="command terminated with exit status=100" output=/var/lib/waagent/custom-script/download/0
time=2018-10-15T16:24:43Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="enable failed"
time=2018-10-15T16:24:43Z version=v2.0.6/git@1008306-clean operation=enable seq=0 event="failed to handle" error="failed to execute command: command terminated with exit status=100"

Contributor Author

sylus commented Oct 15, 2018

Also, the first error I get is:

Deployment failed. Correlation ID: a49ac658-e5c4-441c-aec5-f2f1347c0f2a. {
  "error": {
    "code": "InvalidTemplate",
    "message": "Unable to process template language expressions for resource '/subscriptions/XXX/resourceGroups/k8s-management-rg/providers/Microsoft.Compute/virtualMachines/42020k8s9010' at line '1' and column '85623'. 'The template parameter 'masterSubnet' is not found. Please see https://aka.ms/arm-template/#parameters for usage details.'"
  }

But I manually added the missing masterSubnet parameter to get around that, and then the errors above happen. I am a bit confused, as I was told the CI tests were green for hybrid clusters. Not really sure what I am doing wrong. ^_^
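The InvalidTemplate error above means the generated template references a parameter that its own `parameters` section does not declare. One way to spot this before deploying is to compare the two sets. The sketch below is an illustration, not part of acs-engine: `undeclared_parameters` is a hypothetical helper, it matches names case-sensitively (an ARM deployment may be more lenient), and it ignores nested templates.

```python
import json
import re

# Matches ARM expressions such as parameters('masterSubnet').
PARAM_REF = re.compile(r"parameters\('([^']+)'\)")

def undeclared_parameters(template: dict) -> set:
    """Parameter names referenced anywhere in the template but never declared."""
    declared = set(template.get("parameters", {}))
    # Serialise the whole template so references in any nested string are found.
    referenced = set(PARAM_REF.findall(json.dumps(template)))
    return referenced - declared

# A tiny made-up template that reproduces the shape of the problem.
example = {
    "parameters": {"location": {"type": "string"}},
    "resources": [
        {
            "location": "[parameters('location')]",
            "properties": {"subnet": "[parameters('masterSubnet')]"},
        }
    ],
}

print(undeclared_parameters(example))  # {'masterSubnet'}
```

To run it against a real deployment, load the generated file with `json.load(open("./_output/k8s-acs-stc-mgmt/azuredeploy.json"))` and pass the result to `undeclared_parameters`. An empty set means every referenced parameter is declared.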

Contributor Author

sylus commented Oct 15, 2018

So it seems to be working, except for the master subnet variable, when I stop storing my values in Key Vault. I have to assume something is wrong with my configuration.

Sorry for the noise!

@tariq1890
Contributor

We are currently working on fixing this issue. Thanks for reporting, @sylus.
