Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[auth] Allow use of cloud access tokens for hail batch authorization #13131

Merged
merged 14 commits into from
Sep 5, 2023

Conversation

daniel-goldstein
Copy link
Contributor

@daniel-goldstein daniel-goldstein commented May 31, 2023

Deprecate hail-minted API keys in favor of using access tokens from the identity providers already associated with user identities. For more context and a high-level overview of the implementation, see this RFC

@daniel-goldstein
Copy link
Contributor Author

@danking This is far from done, evidenced by the fact that for some reason I'm getting tons of QoB test failures (but some are passing! :/) but I would appreciate a first pass on this if you want to do a high-level review. I do have a rough RFC that is not up to date, so let me know if you'd prefer to start the discussion from such a doc instead of the code.

Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems broadly what I expected. I didn't carefully review the auth flow yet.

The idea is that we're either using:

  1. latent credentials like google application default or the azure managed identity.
  2. An OAuth2 flow to get authorization to use the user's email/openid.

In either case, we get an OAuth2 access token for email or openid, we send that to Hail which uses it to verify that the caller produced a valid email/openid access token. We accept that as proof of identity. The user sends that along with every request as proof it is still logged in. If the token expires, the client requests a new one and sends that along in the next request.

We are gonna use either a "hail" audience or create our own scopes to ensure that Google doesn't let arbitrary third parties steal these tokens and log into, say, Terra.

Does that all check out?

hail/python/hailtop/auth/auth.py Show resolved Hide resolved
hail/python/hailtop/auth/auth.py Outdated Show resolved Hide resolved
hail/python/hailtop/auth/auth.py Show resolved Hide resolved
hail/python/hailtop/hailctl/auth/login.py Fixed Show resolved Hide resolved
@@ -22,5 +22,6 @@ else
JAR_LOCATION="${TEST_STORAGE_URI}/${NAMESPACE}/jars/${TOKEN}/${REVISION}.jar"
fi

python3 -m hailtop.aiotools.copy -vvv 'null' '[{"from":"'${SHADOW_JAR}'", "to":"'${JAR_LOCATION}'"}]'
gcloud storage cp ${SHADOW_JAR} ${JAR_LOCATION}
# python3 -m hailtop.aiotools.copy -vvv 'null' '[{"from":"'${SHADOW_JAR}'", "to":"'${JAR_LOCATION}'"}]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:/. Timeout issues, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh ya.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm -rf and please PR something to back off the default timeout so we can feel confident in copy.

.createScoped(
"openid",
"https://www.googleapis.com/auth/userinfo.email",
"https://www.googleapis.com/auth/userinfo.profile"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need profile or openid?

Copy link
Contributor Author

@daniel-goldstein daniel-goldstein Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I can try to remove them.

override def accessToken(): String = {
val context = new TokenRequestContext()
// TODO I hope I dont have to give this scope
context.addScopes("https://management.azure.com/.default")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openid should be enough no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya that's why I have the TODO, going to make sure I can trim that down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually replaced by the hail-specific OAuth scope.

oauth2_permission_scope {
admin_consent_description = "Allow hailctl to access the Hail Batch service on behalf of the signed-in user."
admin_consent_display_name = "hailctl"
user_consent_description = "Allow hailctl to access the Hail Batch service on your behalf."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also the hail Python library right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm ya I can change the text here to the Hail Python Library.

build.yaml Outdated
if [ "$?" -ne 0 ]
then
kubectl delete secret -n {{ default_ns.name }} ssl-config-hail-root
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated updates of the root cert, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya this was merged in a different PR already. Dropped this commit.

'https://www.googleapis.com/auth/userinfo.email',
'https://www.googleapis.com/auth/cloud-platform',
'https://www.googleapis.com/auth/appengine.admin',
'https://www.googleapis.com/auth/compute',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop the last three, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change the defaults in this PR, as these are used for other purposes like the FS which definitely needs at least GCS scopes. I think as a follow-up we should get rid of default scopes in general and require use sites to specify the min set of scopes they ned.

Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dismiss if ready for another look

danking
danking previously requested changes Aug 7, 2023
Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent changes. Some conceptual (but, I guess, not RFC-level, more code-level concepts) things we need to iron out and get written down. A few nits. Overall feels good!

@@ -518,6 +518,12 @@ async def rest_login(request: web.Request) -> web.Response:
)


@routes.get('/api/v1alpha/oauth2-client')
async def hailctl_oauth_client(request): # pylint: disable=unused-argument
idp = 'Google' if CLOUD == 'gcp' else 'Microsoft'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I don't love that gcp is a magic string for Google, but it is what is for now.

Shouldn't you use IdentityProvider.GOOGLE.value and IdentityProvider.MICROSOFT.value for the idp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right, I'll use that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump, the literal strings are still here.

auth/auth/auth.py Show resolved Hide resolved
auth/auth/auth.py Show resolved Hide resolved
auth/auth/auth.py Show resolved Hide resolved
raise web.HTTPUnauthorized()


async def get_userinfo_from_login_id_or_hail_identity_id(request, login_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to only be used with a IdP's uid, right? I think I'm a bit confused on hail identity uid vs login id. The login id is the username from the email, right? And hail identity uid is the uid from the cloud provider?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

login_id is what we use to identify human users. In GCP it is currently an email, in Azure it is the actual sub of that user in AAD.

hail_identity is the email in GCP or common name in Azure of the robot identity assigned to the user. hail_identity_uid is the actual sub of the robot identity in both of those places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to only be used with a IdP's uid, right

This is not the case because we do not store UIDs for human users in GCP, only their email.

hail/src/main/scala/is/hail/services/Requester.scala Outdated Show resolved Hide resolved
infra/azure/modules/auth/main.tf Outdated Show resolved Hide resolved
infra/azure/modules/auth/main.tf Show resolved Hide resolved
@danking
Copy link
Contributor

danking commented Aug 21, 2023

review is in progress, sorry for the delay.

@@ -518,6 +518,12 @@ async def rest_login(request: web.Request) -> web.Response:
)


@routes.get('/api/v1alpha/oauth2-client')
async def hailctl_oauth_client(request): # pylint: disable=unused-argument
idp = 'Google' if CLOUD == 'gcp' else 'Microsoft'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump, the literal strings are still here.

auth/auth/auth.py Show resolved Hide resolved
auth/auth/auth.py Outdated Show resolved Hide resolved
Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not finished with review, but here's what I have

hail/python/hailtop/auth/flow.py Show resolved Hide resolved
hail/python/hailtop/aiocloud/aioazure/credentials.py Outdated Show resolved Hide resolved
batch/batch/cloud/azure/worker/worker_api.py Outdated Show resolved Hide resolved
batch/batch/worker/worker_api.py Outdated Show resolved Hide resolved
hail/python/hailtop/aiocloud/aioazure/credentials.py Outdated Show resolved Hide resolved
hail/python/hailtop/auth/flow.py Show resolved Hide resolved
hail/python/hailtop/auth/flow.py Show resolved Hide resolved
hail/python/hailtop/auth/flow.py Show resolved Hide resolved
hail/python/hailtop/batch_client/aioclient.py Outdated Show resolved Hide resolved
@@ -58,12 +58,53 @@ resource "azuread_application_password" "oauth2" {
application_object_id = azuread_application.oauth2.object_id
}

resource "random_uuid" "hailctl_oauth2_idenfier_uri_id" {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to apply this before we merge, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya. This is one of the annoying papercuts I was talking about. Unless we stack them, this PR and the ubuntu PR cannot be passing at the same time because they have different terraform changes.

@danking
Copy link
Contributor

danking commented Aug 24, 2023

I think we're very close; lots of little details of course, but we're close.

Copy link
Contributor

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big but important and necessary change.

Great work.

@@ -205,6 +206,7 @@ def create_vm_config(
DOCKER_ROOT_IMAGE=$(jq -r '.docker_root_image' userdata)
DOCKER_PREFIX=$(jq -r '.docker_prefix' userdata)
REGION=$(jq -r '.region' userdata)
HAIL_AZURE_OAUTH_SCOPE=$(jq -r '.hail_azure_oauth_scope' userdata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a reminder of this before my approval.

@danking danking added the WIP label Aug 31, 2023
@danking
Copy link
Contributor

danking commented Aug 31, 2023

I put WIP on in light of the infra ordering issue.

@danking danking merged commit dc1f086 into hail-is:main Sep 5, 2023
5 checks passed
@daniel-goldstein daniel-goldstein added terraform full-deploy Requires a full deployment at this commit before following commits can be deployed labels Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
full-deploy Requires a full deployment at this commit before following commits can be deployed NIST 800-53
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants