(SURE-5437) AzureGov Cloud Credentials fail to create/provisioning broken #98

Closed
kkaempf opened this issue Feb 3, 2023 · 31 comments

Comments

@kkaempf

kkaempf commented Feb 3, 2023

Describe the bug

When creating a cloud credential for Azure with the AzureUSGovernmentCloud environment option, creation fails. The browser logs show an error referencing 'SubscriptionId not found'. The subscription ID is known to be valid: the service principal used in the credential was created against it.

This error does not occur with standard Azure. We suspect that the AzureGov endpoints are not set correctly inside a dependent tool.

To Reproduce

Create Service Principal in AzureGov using

    az ad sp create-for-rbac \
      --name="<Rancher ServicePrincipal name>" \
      --role="Contributor" \
      --scopes="/subscriptions/<subscription Id>"

Sign in to the Rancher UI, go to Cluster Management -> Cloud Credentials, and create a new Azure credential

Select AzureUSGovernmentCloud for the environment
Plug in generated appId into Client-Id field
Plug in generated password into Client-Secret field
Plug in subscription id into Subscription Id field
Click Create, see non-descriptive error
Open Console and inspect Network and fire command again
Inspect Response to see RESTful error describing the unknown subscription Id

Expected Result

Credential Created Successfully

Additional context

When the cloud credential is created through Terraform instead of the UI, the credential still fails at cluster creation with the same 'subscription id not found' error.
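
One way to picture the suspicion above: each Azure environment name has its own Resource Manager and login endpoints, and a lookup that silently falls back to the public cloud would ask the wrong service about a Gov subscription. The sketch below is illustrative only. The `endpoints` type and `endpointsFor` function are hypothetical names, not Rancher or Azure SDK code. The Gov URLs are the ones used later in this thread; the public ones are the standard Azure public cloud endpoints.

```go
package main

import "fmt"

// endpoints holds the two base URLs a client needs for one Azure environment.
// Hypothetical type for illustration; not part of Rancher or the Azure SDK.
type endpoints struct {
	ResourceManager string
	Login           string
}

// knownEnvironments maps environment names to their endpoints.
var knownEnvironments = map[string]endpoints{
	"AzurePublicCloud": {
		ResourceManager: "https://management.azure.com/",
		Login:           "https://login.microsoftonline.com",
	},
	"AzureUSGovernmentCloud": {
		ResourceManager: "https://management.usgovcloudapi.net/",
		Login:           "https://login.microsoftonline.us",
	},
}

// endpointsFor returns the endpoints for name, or an error for unknown names.
// Returning an error instead of defaulting to the public cloud avoids
// querying the wrong service for a subscription.
func endpointsFor(name string) (endpoints, error) {
	ep, ok := knownEnvironments[name]
	if !ok {
		return endpoints{}, fmt.Errorf("unknown Azure environment %q", name)
	}
	return ep, nil
}

func main() {
	ep, err := endpointsFor("AzureUSGovernmentCloud")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ep.ResourceManager)
}
```

Failing on an unknown name is a design choice of this sketch; a silent default would reproduce the kind of 'subscription not found' confusion described in this report.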

@mjura mjura self-assigned this Feb 10, 2023
@mjura
Contributor

mjura commented Feb 13, 2023

It seems that it is related to #62

The PR needs a rebase and more refactoring on our side; I am currently working on this.

mjura pushed a commit to mjura/aks-operator that referenced this issue Feb 21, 2023
Fixes: rancher#98

(cherry picked from commit 41a68e8)
@mjura
Contributor

mjura commented Feb 21, 2023

Fix #136

mjura pushed a commit to mjura/aks-operator that referenced this issue Feb 24, 2023
Fixes: rancher#98

(cherry picked from commit 41a68e8)
@mjura
Contributor

mjura commented Feb 24, 2023

The fix was submitted; the Rancher part also needs to be adjusted for it.

@kkaempf
Author

kkaempf commented Mar 15, 2023

Fix was submitted, there is also need to adjust Rancher part for it

Which part of Rancher ?!

@mjura
Contributor

mjura commented Mar 29, 2023

Fix was submitted, there is also need to adjust Rancher part for it

Which part of Rancher ?!

@kkaempf kkaempf added this to the 2023-Q3-v2.7x milestone Mar 29, 2023
@kkaempf kkaempf added the kind/enhancement New feature or request label Mar 29, 2023
mjura added a commit to mjura/rancher that referenced this issue Mar 30, 2023
@kkaempf
Author

kkaempf commented Apr 4, 2023

Waiting for access to AzureGov Cloud

@mjura
Contributor

mjura commented Sep 25, 2023

@atoy3731 Can you please help us test this change once again?

@atoy3731

Testing this against v2.8.0-alpha1. I'm seeing progress, but there still appears to be an issue on the backend:

  • The UI seems to be sending the right request to the backend now. This is from the JS console when I go to create my cloud credential in the UI:

    {
     "clientId": "REDACTED",
     "clientSecret": "REDACTED",
     "subscriptionId": "REDACTED",
     "environment": "AzureUSGovernmentCloud"
    }
    
  • The backend is still struggling to validate the subscription it seems. That request responds with:

    {
        "error": "invalid credentials: subscription.SubscriptionsClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code=\"SubscriptionNotFound\" Message=\"The subscription 'REDACTED' could not be found.\""
    }
    
  • I don't get much feedback from the Rancher pod logs, even in debug mode. It is essentially the same message as above.

  • I tried to replicate the underlying logic directly and it did look like I made it a step further. This code resulted in a 401 Unauthenticated instead of a 404 Subscription not found, and if I changed the subscription ID to some junk data, it failed back to 404, so I'm guessing this snippet is making it past the subscription look-up stage:

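     // Imports needed: context, fmt, time, and
     // github.com/Azure/go-autorest/autorest/azure.
     // subscriptionsClient is a helper that was not included in this snippet.
     func main() {
         // Gov
         subscriptionID := "REDACTED"
         env := azure.USGovernmentCloud
     
         ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
         defer cancel() // release the timeout's resources when main returns
         c := subscriptionsClient(env.ResourceManagerEndpoint)
         fmt.Println(env.ResourceManagerEndpoint)
         // Only the error matters here, so the response is discarded.
         if _, err := c.Get(ctx, subscriptionID); err != nil {
             fmt.Println(err.Error())
         }
     }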
    

    With valid subscriptionID (this same error occurs when using azure.PublicCloud and a valid public cloud Subscription ID):

     [11:44:23] /private/tmp/go-test $ go run main.go 
     https://management.usgovcloudapi.net/
     subscriptions.Client#Get: Failure responding to request: StatusCode=401 -- Original Error: autorest/azure: Service returned an error. Status=401 Code="AuthenticationFailed" Message="Authentication failed. The 'Authorization' header is missing."[11:44:32]
    

    With invalid, made-up subscriptionID:

     [11:46:26] /private/tmp/go-test $ go run main.go 
     https://management.usgovcloudapi.net/
     subscriptions.Client#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="SubscriptionNotFound" Message="The subscription '00000000-0000-0000-0000-000000000000' could not be found."[11:46:28]
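
The two outputs above differ only in the status and error code, which is what separates "this endpoint does not know the subscription" from "the subscription was found but the request carried no credentials". A small sketch that pulls those two fields out of an autorest-style error string follows. `parseAzureError` is a hypothetical helper, and the regular expression is written against the messages quoted in this thread, not against any documented format.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// azureErrorPattern matches the `Status=404 Code="SubscriptionNotFound"` part
// of the error strings quoted above. Assumption: the format stays as shown.
var azureErrorPattern = regexp.MustCompile(`Status=(\d{3}) Code="([^"]+)"`)

// parseAzureError extracts the HTTP status and service error code from msg.
// ok is false when msg does not contain the expected pattern.
func parseAzureError(msg string) (status int, code string, ok bool) {
	m := azureErrorPattern.FindStringSubmatch(msg)
	if m == nil {
		return 0, "", false
	}
	n, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, "", false
	}
	return n, m[2], true
}

func main() {
	msgs := []string{
		`subscriptions.Client#Get: Failure responding to request: StatusCode=401 -- Original Error: autorest/azure: Service returned an error. Status=401 Code="AuthenticationFailed" Message="Authentication failed. The 'Authorization' header is missing."`,
		`subscriptions.Client#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="SubscriptionNotFound" Message="The subscription '00000000-0000-0000-0000-000000000000' could not be found."`,
	}
	for _, msg := range msgs {
		if status, code, ok := parseAzureError(msg); ok {
			fmt.Println(status, code)
		}
	}
}
```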
    

@mjura
Contributor

mjura commented Sep 26, 2023

@atoy3731 Hi, could you please test the following procedure for us?

  1. Create a secret with the Azure cloud credentials:
kubectl create secret generic cc-azure-gov \
               --from-literal=azurecredentialConfig-clientId="AZURE_CLIENT_ID" \
               --from-literal=azurecredentialConfig-clientSecret="AZURE_CLIENT_SECRET" \
               --from-literal=azurecredentialConfig-environment="AzureUSGovernmentCloud" \
               --from-literal=azurecredentialConfig-subscriptionId="AZURE_SUBSCRIPTION_ID"
  2. Create a test AKS cluster configuration:
apiVersion: aks.cattle.io/v1
kind: AKSClusterConfig
metadata:
  name: test-azure-gov-cluester
spec:
  resourceLocation: "eastus"
  resourceGroup: "RESOURCE_GROUP_NAME"
  clusterName: "test-azure-gov-cluester"
  baseUrl: "https://management.usgovcloudapi.net/"
  authBaseUrl: "https://login.microsoftonline.us"
  azureCredentialSecret: "default:cc-azure-gov"
  dnsPrefix: "example-dns"
  privateCluster: false
  linuxAdminUsername: azureuser
  loadBalancerSku: "standard"
  kubernetesVersion: "1.26.6"
  nodePools:
  - name: "masters"
    count: 1
    vmSize: "Standard_DS2_v2"
    osDiskSizeGB: 128
    osDiskType: "Managed"
    maxPods: 110
    mode: "System"
    osType: "Linux"
  outboundType: "loadBalancer"
  3. Apply the configuration:
kubectl apply -f test-azure-gov-cluester.yaml
  4. Check the logs from the Rancher AKSv2 operator:
kubectl logs -n cattle-system aks-config-operator-<POD_ID> -f
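
Before applying the manifest, it can help to confirm that the secret carries all four keys passed to `kubectl create secret` above. The sketch below checks a plain map for those keys. `missingCredentialKeys` is a hypothetical helper, treating all four keys as required is an assumption, and reading the secret from the cluster is left out.

```go
package main

import "fmt"

// requiredCredentialKeys lists the keys set by the kubectl command above.
// Assumption: the operator needs every one of them to be non-empty.
var requiredCredentialKeys = []string{
	"azurecredentialConfig-clientId",
	"azurecredentialConfig-clientSecret",
	"azurecredentialConfig-environment",
	"azurecredentialConfig-subscriptionId",
}

// missingCredentialKeys returns the required keys that are absent or empty
// in data, in the order they appear in requiredCredentialKeys.
func missingCredentialKeys(data map[string]string) []string {
	var missing []string
	for _, key := range requiredCredentialKeys {
		if data[key] == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Placeholder values; the environment key is deliberately left out.
	secret := map[string]string{
		"azurecredentialConfig-clientId":       "AZURE_CLIENT_ID",
		"azurecredentialConfig-clientSecret":   "AZURE_CLIENT_SECRET",
		"azurecredentialConfig-subscriptionId": "AZURE_SUBSCRIPTION_ID",
	}
	for _, key := range missingCredentialKeys(secret) {
		fmt.Println("missing key:", key)
	}
}
```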

@atoy3731

I just had to change the resourceLocation from eastus to usgovvirginia. With that change, everything appears to work on the provisioning side (in case it matters, the valid options are usgovarizona, usgovtexas, usgovvirginia):

time="2023-09-26T15:11:36Z" level=warning msg="Cluster [test-azure-gov-cluester] never advanced to creating status, will not delete AKS cluster"
time="2023-09-26T15:11:36Z" level=warning msg="Cluster [test-azure-gov-cluster] never advanced to creating status, will not delete AKS cluster"
time="2023-09-26T15:12:06Z" level=info msg="Creating cluster [test-azure-gov-cluester]"
time="2023-09-26T15:12:06Z" level=info msg="Checking if cluster [test-azure-gov-cluester] exists"
time="2023-09-26T15:12:07Z" level=info msg="Checking if resource group [atoy-dev] exists"
time="2023-09-26T15:12:07Z" level=info msg="Creating AKS cluster [test-azure-gov-cluester]"
time="2023-09-26T15:12:12Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:12:12Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:12:13Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:12:43Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:13:13Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:13:44Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:14:15Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:14:45Z" level=info msg="Waiting for cluster [test-azure-gov-cluster] to finish creating, cluster state: Creating"
time="2023-09-26T15:15:16Z" level=info msg="Cluster [test-azure-gov-cluester] created successfully"
time="2023-09-26T15:15:17Z" level=info msg="Checking configuration for cluster [test-azure-gov-cluester]"
time="2023-09-26T15:15:17Z" level=info msg="Configuration for cluster [test-azure-gov-cluester] was verified"
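
The `eastus` mismatch could be caught before the operator contacts Azure. A minimal sketch, assuming only the three Gov locations reported in this comment are accepted. `validateLocation` and the lookup table are hypothetical, and the real list of Gov regions may be longer.

```go
package main

import "fmt"

// govLocations holds the locations reported as valid in this thread.
// Assumption: this list is incomplete and used for illustration only.
var govLocations = map[string]bool{
	"usgovarizona":  true,
	"usgovtexas":    true,
	"usgovvirginia": true,
}

// validateLocation rejects locations that are not known Gov regions when the
// environment is AzureUSGovernmentCloud. Other environments are not checked.
func validateLocation(environment, location string) error {
	if environment != "AzureUSGovernmentCloud" {
		return nil
	}
	if !govLocations[location] {
		return fmt.Errorf("location %q is not available in %s", location, environment)
	}
	return nil
}

func main() {
	for _, loc := range []string{"eastus", "usgovvirginia"} {
		if err := validateLocation("AzureUSGovernmentCloud", loc); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", loc)
	}
}
```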

@mjura
Contributor

mjura commented Sep 28, 2023

I have developed a new aks-client to test Azure Gov communication:
https://github.com/mjura/aks-client

@atoy3731

Testing using @mjura's client, and I think I found the root of the authentication issue.

I think this line needs to be updated to (maybe conditionally?) use NewSubscriptionsClientWithBaseURI instead.

When I update the code to use that and pass in the baseURL as an argument, it finds the subscription in GovCloud.

My updated function:

func NewSubscriptionServiceClient(cap *Capabilities, baseUrl string) (*subscription.SubscriptionsClient, error) {
	authorizer, err := NewAzureClientAuthorizer(cap)
	if err != nil {
		return nil, err
	}

	subscriptionService := subscription.NewSubscriptionsClientWithBaseURI(baseUrl)
	subscriptionService.Authorizer = authorizer

	return &subscriptionService, nil
}

And my implementation:

	client, err := NewSubscriptionServiceClient(&cred, cred.BaseURL)

Let me know if that helps. Thanks!
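
The pattern behind this change can be shown without the Azure SDK: a constructor that hard-codes the public endpoint versus one that accepts a base URI. The types below are stand-ins, not `subscription.SubscriptionsClient`, and the request path is a simplified assumption modeled on the Resource Manager URL shape (no API version, no authorizer). It only illustrates where the subscription lookup is sent.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultBaseURI is the public cloud Resource Manager endpoint.
const defaultBaseURI = "https://management.azure.com"

// subscriptionsClient is a stand-in for an SDK client; it only records
// which base URI requests would be sent to.
type subscriptionsClient struct {
	baseURI string
}

// newSubscriptionsClient always targets the public cloud, which is the
// behavior that breaks Gov subscriptions.
func newSubscriptionsClient() subscriptionsClient {
	return newSubscriptionsClientWithBaseURI(defaultBaseURI)
}

// newSubscriptionsClientWithBaseURI targets whichever endpoint the caller
// passes in, for example the Gov Resource Manager endpoint.
func newSubscriptionsClientWithBaseURI(baseURI string) subscriptionsClient {
	return subscriptionsClient{baseURI: strings.TrimRight(baseURI, "/")}
}

// subscriptionURL builds the URL a subscription lookup would be sent to.
func (c subscriptionsClient) subscriptionURL(id string) string {
	return c.baseURI + "/subscriptions/" + id
}

func main() {
	id := "00000000-0000-0000-0000-000000000000"
	fmt.Println(newSubscriptionsClient().subscriptionURL(id))
	fmt.Println(newSubscriptionsClientWithBaseURI("https://management.usgovcloudapi.net/").subscriptionURL(id))
}
```

With the default constructor, a Gov subscription ID is looked up in the public cloud, which is consistent with the 404 SubscriptionNotFound responses seen earlier in this thread.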

mjura added a commit to mjura/rancher that referenced this issue Sep 29, 2023
Issue: rancher/aks-operator#98

Signed-off-by: Michal Jura <mjura@suse.com>
@mjura mjura self-assigned this Sep 29, 2023
@mjura
Contributor

mjura commented Sep 29, 2023

I have created a fix for it: rancher/rancher#43009

@atoy3731

Tested the fix and it works! One note: if the client secret coming from the UI starts with a ".", there appears to be a parsing error (I unfortunately don't have the error message). If the secret doesn't start with a ".", the flow creates the credential secret as expected.

There are other issues in the Create Cluster screen, but I think that would deserve a new issue for tracking since I think it is in the UI.

@mjura
Contributor

mjura commented Oct 4, 2023

It is blocked by rancher/rancher#43024

@gaktive
Member

gaktive commented Oct 4, 2023

Related UI ticket: rancher/dashboard#9858

@kkaempf is this for 2.8.0? I'm being pinged on other fronts that this can wait to Q1 but I don't know the full scope of impact since there are now multiple tickets floating around. cc @nwmac

@gaktive
Member

gaktive commented Oct 5, 2023

UI ticket now Q1, which is noted elsewhere outside of Github.

rohitsakala pushed a commit to rohitsakala/rancher that referenced this issue Oct 6, 2023
Issue: rancher/aks-operator#98

Signed-off-by: Michal Jura <mjura@suse.com>
@kkaempf kkaempf modified the milestones: v2.8.0, 2024-Q1-v2.8x Oct 17, 2023
nickwsuse pushed a commit to nickwsuse/rancher that referenced this issue Nov 21, 2023
Issue: rancher/aks-operator#98

Signed-off-by: Michal Jura <mjura@suse.com>
@mjura
Contributor

mjura commented Nov 21, 2023

UI ticket is still in progress

@kkaempf
Author

kkaempf commented Dec 6, 2023

Moving to "to test" as UI ticket got closed with a v2.8.0 milestone 🤞🏻

@kkaempf
Author

kkaempf commented Dec 7, 2023

confirmed fixed per UI qa

@kkaempf kkaempf closed this as completed Dec 7, 2023