Commit
[auth] IDP access tokens over hail-minted tokens (#2)
* [auth] IDP access tokens over hail-minted tokens
* address comments
* change python to Python
* remove unused endorsement section
1 parent
504b2ff
commit 11e785d
Showing
1 changed file
with
342 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,342 @@ | ||
=========================================== | ||
OAuth 2.0 Authorization in the Hail Service | ||
=========================================== | ||
|
||
.. author:: Daniel Goldstein | ||
.. date-accepted:: 2023/08/02 | ||
.. implemented:: Leave blank. This will be filled in with the first Hail version which | ||
implements the described feature. | ||
.. header:: This proposal is `discussed at this pull request <https://github.com/hail-is/hail-rfc/pull/2>`_. | ||
.. sectnum:: | ||
.. contents:: | ||
.. role:: Python(code) | ||
|
||
Motivation | ||
========== | ||
|
||
This proposal focuses on the way by which users of Hail services | ||
authorize programmatic access to the Hail API. | ||
|
||
The Hail Service authenticates users using the OAuth2 protocol, relying on either | ||
GCP IAM or Azure AD as the identity providers. However, while the Hail Service | ||
relies on these identity providers for authentication, it currently does *not* use them | ||
to authorize access to Hail APIs. The Hail ``auth`` service acts as an Authorization | ||
Server for the Hail API, minting long-lived tokens after the OAuth2 flow that are persisted | ||
on user machines. Minting our own tokens imposes a maintenance and security burden | ||
on the Hail team and any operators of a Hail Service. | ||
|
||
This proposal deprecates the use of Hail-minted tokens in favor of using | ||
access tokens from the identity providers listed above to authorize API access. | ||
This removes the security burden of minting and protecting our own authorization | ||
tokens while reducing code complexity since cloud access tokens are already | ||
used within the Hail codebase to access cloud APIs. | ||
|
||
Proposed Change Specification | ||
============================= | ||
|
||
Currently, requests to the Hail APIs send one of the aforementioned Hail-minted tokens in the | ||
``Authorization`` header of HTTP requests. This token is stored in a well-known | ||
location on the user's disk. | ||
For user machines, this file is persisted during the login flow ``hailctl auth login``. | ||
For use in Batch jobs, the tokens are stored in Kubernetes secrets and delivered | ||
to the Batch Worker as part of the job spec. | ||
|
||
This proposal adds the ability for HTTP requests from Hail clients to send | ||
OAuth2 access tokens in the ``Authorization`` header instead of Hail-minted | ||
tokens. The ``auth`` service will: | ||
|
||
- Assert the validity, expiration and audience of access tokens and associate | ||
them with users of the system. | ||
- Support Hail-minted tokens for backwards compatibility with old clients | ||
for a limited time. Eventually, support for Hail-minted tokens will be dropped. | ||
|
||
Hail clients will be updated to use access tokens in requests to Hail APIs. How | ||
they do so is described in the following subsections. | ||
|
||
|
||
Overview of Relevant OAuth2 Background | ||
-------------------------------------- | ||
|
||
Prior to discussing the details of the implementation, it is worth covering some | ||
background on OAuth2. Note that much of this functionality is encapsulated in the | ||
Google OAuth and AAD client libraries that we use, but a thorough understanding | ||
is valuable to ensure that we are using them properly. | ||
|
||
We'll consider four primary entities in an OAuth2 interaction: | ||
|
||
- The user/identity | ||
- The client (e.g. the Hail Python library) | ||
- The Authorization Server (Google IAM or AAD) | ||
- The API/Resource Server (the Hail service) | ||
|
||
For clients operated by a human user, the client must obtain credentials to act | ||
on behalf of the user before it can perform any further operations. | ||
The client uses an `OAuth2 client secret <https://developers.google.com/identity/protocols/oauth2/native-app>`_ | ||
to initiate a web-based flow with the Authorization Server. During this flow, the | ||
user must authenticate and authorize the client to act on the user's behalf with | ||
a given set of capabilities (scopes). | ||
|
||
From this point forward, the client can perform operations without manual intervention, | ||
using the credentials granted from the flow in the human case, and using a robot identity's | ||
key or password in the robot case. | ||
|
||
When the client wants to perform some operation against the Resource Server, it must | ||
first request an access token from the Authorization Server. | ||
Three important factors to note about the access token are: | ||
|
||
- The scopes the token is granted. These specify to the API server the purposes | ||
for which the token may be used. It is the responsibility of the API server to | ||
respect the scopes. | ||
- The identity represented by that token. This is either the user or robot identity. | ||
In JWTs, the identity is uniquely identified by the | ||
`sub <https://www.rfc-editor.org/rfc/rfc7519#section-4.1.2>`_ (Subject) claim. This prevents | ||
the token from being used to act on a different identity's behalf. Note that the | ||
sub need not be globally unique, but it must be unique amongst all subs at this | ||
identity provider. | ||
- The "intended audience" of the token. What this means exactly varies between | ||
Google and Azure, but in both cases is represented by the | ||
`aud <https://www.rfc-editor.org/rfc/rfc7519#section-4.1.3>`_ (Audience) claim. | ||
It is the responsibility of the resource server to respect this so that it does | ||
not accept tokens intended for other APIs. | ||
|
||
The client should then request a token with the minimal set of scopes required to | ||
perform the desired operation (in our case just enough to identify the user) and with | ||
an audience that will be accepted by the Resource Server. It then sends this token | ||
in the ``Authorization`` header of requests to the Resource Server. | ||
|
||
When the Resource Server receives the request, it can verify the validity and | ||
expiration of the token, identify the user through the ``sub`` claim, and finally | ||
accept the token only if its ``aud`` claim is one that the Resource Server recognizes | ||
and permits. This way tokens from that user that were generated and intended | ||
for other systems cannot be replayed against this Resource Server. | ||
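
The checks described above can be sketched as follows. This is a minimal illustration over claims that have already been decoded and verified, not the ``auth`` service's actual code: the names ``validate_claims``, ``InvalidToken`` and ``TRUSTED_AUDIENCES``, as well as the audience value itself, are hypothetical.

```python
import time
from typing import Any, Mapping, Optional

# Hypothetical audience value; a real deployment would configure its own.
TRUSTED_AUDIENCES = frozenset({"hail-python-client-id"})


class InvalidToken(Exception):
    """Raised when a token's claims fail validation."""


def validate_claims(claims: Mapping[str, Any], now: Optional[float] = None) -> str:
    """Return the subject of a token whose claims pass the checks above.

    Assumes the claims came from a token whose authenticity was already
    established (signature check or introspection by the identity provider).
    """
    now = time.time() if now is None else now

    # Reject expired tokens (exp is expressed in seconds since the epoch).
    exp = claims.get("exp")
    if exp is None or float(exp) <= now:
        raise InvalidToken("token is expired or has no exp claim")

    # Reject tokens intended for some other API or client.
    # The aud claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if not any(a in TRUSTED_AUDIENCES for a in audiences):
        raise InvalidToken("token audience is not trusted")

    # The sub claim identifies the user or robot identity.
    sub = claims.get("sub")
    if not sub:
        raise InvalidToken("token has no sub claim")
    return str(sub)
```

The returned subject would then be mapped to a registered user of the system; that lookup is outside the scope of this sketch.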
|
||
Unfortunately Google and Azure have slightly different approaches to this interaction. | ||
Both scenarios will involve installing an OAuth2 client credential on the user's machine | ||
to be used by the Hail Python library, and they will involve similar changes to the ``auth`` | ||
service. However, their implementations vary slightly when it comes to the audience | ||
claim, so the process to obtain access tokens will look slightly different. | ||
The following sections detail how that process would work with those two identity providers. | ||
|
||
|
||
Google Implementation | ||
--------------------- | ||
|
||
When a client application requests an access token from Google IAM, the ``aud`` | ||
claim is always set to the unique ID of the client. On a user's machine, ``aud`` | ||
would be the client ID of the OAuth2 Client used to obtain that credential. For | ||
service accounts, it would be the unique ID of the service account in IAM. Note | ||
that in the service account case ``aud == sub``, but not in the case of the Hail | ||
Python library acting on behalf of a user. | ||
|
||
I find this unintuitive, but I suppose this can be interpreted as "the intended | ||
recipient of this token is the application that requested it, and Resource Servers | ||
should maintain a list of trusted applications". | ||
|
||
Thus, when the ``auth`` service validates an access token, it must assert that | ||
the ``aud`` claim is *either* the Client ID for the Python library OAuth2 Client | ||
or the unique ID of a Hail-owned service account in the system. Doing so protects | ||
against client applications that we don't control impersonating human users to our | ||
system. | ||
|
||
Another detail of note is that Google IAM access tokens are *opaque*, so in order | ||
to decode them the ``auth`` service must submit them to a Google API. The ``auth`` | ||
service should cache the responses, for no more than one minute each, | ||
to avoid being rate-limited by Google IAM. Requests to Google IAM scale linearly with | ||
concurrent users, but that is not a concern at time of writing since | ||
Hail services receive single to double digit concurrent users. | ||
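
A minimal sketch of the one-minute cache described above, assuming the call to the Google API is wrapped in a ``lookup`` function supplied by the caller. The class name ``TokenInfoCache`` and the shape of the returned dictionary are hypothetical, and the sketch omits eviction of old entries.

```python
import time
from typing import Callable, Dict, Tuple

MAX_CACHE_SECONDS = 60.0  # "no more than one minute", per the text above


class TokenInfoCache:
    """Caches token lookups so repeated requests do not hit the IDP each time."""

    def __init__(
        self,
        lookup: Callable[[str], dict],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # hypothetical wrapper around the IDP's API
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, token: str) -> dict:
        now = self._clock()
        cached = self._entries.get(token)
        # Serve from the cache only while the entry is younger than the limit.
        if cached is not None and now - cached[0] < MAX_CACHE_SECONDS:
            return cached[1]
        # Cache miss or stale entry: ask the identity provider again.
        info = self._lookup(token)
        self._entries[token] = (now, info)
        return info
```

A production version would presumably also cap an entry's lifetime at the token's own expiration, so that a token is never accepted from the cache after it has expired.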
|
||
|
||
Azure Implementation | ||
--------------------- | ||
|
||
Azure, however, interprets "intended recipient" as the Resource Server for which | ||
a token is destined, and infers that recipient based on the scopes requested | ||
by the client. For example, requesting the scope ``https://management.azure.com/.default`` | ||
results in tokens whose ``aud`` claim is the ID of the Azure Resource Manager API. In order to use | ||
non-Azure Resource Servers, AAD allows you to create custom scopes. We register | ||
a custom scope like ``api://<SOME_UNIQUE_ID>`` with the AAD OAuth2 Client application | ||
and then any code that requests that scope will receive a token whose ``aud`` | ||
claim is the ID of that OAuth2 Client application. | ||
|
||
This simplifies the work of the ``auth`` service, as there is a single audience | ||
it must trust. However, it means that we must communicate this custom scope to | ||
all our environments. | ||
|
||
As opposed to the opaque access tokens in Google, Azure access tokens are JWTs. | ||
That means they can be decoded and cryptographically validated by the ``auth`` | ||
service without making a network request. | ||
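
To illustrate why no network request is needed to read a JWT, the sketch below decodes the claims segment locally using only the standard library. It deliberately does *not* verify the signature, so it is suitable for inspection only: the ``auth`` service would additionally need to validate the signature against the identity provider's published signing keys, typically through a JWT library. The function name is hypothetical.

```python
import base64
import json


def decode_jwt_payload_unverified(token: str) -> dict:
    """Decode the claims segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

The resulting dictionary exposes claims such as ``aud``, ``sub`` and ``exp``, which must only be trusted once the signature has been checked.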
|
||
|
||
User Machine Configuration Changes | ||
---------------------------------- | ||
|
||
If we remove Hail-minted tokens, the Hail Python client needs a mechanism | ||
for requesting access tokens on behalf of the user. The way to do this is to have | ||
a Desktop OAuth2 client credential that lives on the user's machine that administers | ||
the OAuth2 flow and is later used to request tokens. | ||
|
||
Instead of depositing a ``tokens.json`` file during the login flow, | ||
``hailctl auth login`` will instead result in the following file placed in the | ||
user's configuration directory at ``$XDG_CONFIG_HOME/hail/identity.json``. | ||
|
||
.. code-block:: json | ||

    {
        "idp": "Google" | "Microsoft",
        ... Optional IDP-Specific OAuth2 client secret ...
    }

This file contains the identity provider the user used to log into the Hail | ||
Service and an OAuth2 client credential file issued by the Hail Service | ||
for that identity provider along with the refresh token. This client credential | ||
will be used in future requests by the client to obtain scoped access tokens | ||
from the identity provider that are intended for the Hail Service. In Azure, | ||
this will include the custom scope that the client needs for requests. | ||
|
||
For further information on the details of the OAuth2 flow, see the User Login | ||
Flow Changes section. | ||
|
||
If a user does not reauthenticate after updating their Hail version, | ||
the client will continue to use the extant ``tokens.json`` file. | ||
|
||
|
||
Batch Job Configuration Changes | ||
------------------------------- | ||
Batch jobs do not authenticate through an OAuth2 flow in the way that human users do. | ||
The service account keys or metadata server available in batch jobs both provide | ||
ways to easily obtain access tokens. All that the job needs to know is which identity | ||
provider it should use, so it will be provided with the following | ||
identity config: ``{"idp": "Google" | "Microsoft"}``. Instead of writing this to the | ||
filesystem on every job, Batch can provide this through a ``HAIL_IDENTITY_JSON`` environment | ||
variable. Without the presence of a specific OAuth2 client to use for generating tokens, | ||
the Hail library will fall back to the latent credentials in the environment, | ||
e.g. ``GOOGLE_APPLICATION_CREDENTIALS`` or the metadata server. | ||
|
||
In Azure, there will be another environment variable ``HAIL_AZURE_OAUTH_SCOPE`` | ||
that clients must use to obtain an appropriate audience claim. | ||
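
A minimal sketch of how a client might locate its identity config under this scheme. The function name ``load_identity_config``, the precedence of the environment variable over the file, and the ``~/.config`` default for ``XDG_CONFIG_HOME`` are assumptions made for illustration, not part of the proposal.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def load_identity_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Load the identity config from the environment, else from the config file."""
    env = os.environ if env is None else env

    # Batch jobs: the config is passed inline through an environment variable.
    inline = env.get("HAIL_IDENTITY_JSON")
    if inline is not None:
        config = json.loads(inline)
    else:
        # User machines: fall back to the file written by `hailctl auth login`.
        # Defaulting to ~/.config follows the XDG convention (assumed here).
        config_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
        path = Path(config_home) / "hail" / "identity.json"
        config = json.loads(path.read_text())

    if config.get("idp") not in ("Google", "Microsoft"):
        raise ValueError(f"unknown identity provider: {config.get('idp')!r}")
    return config
```

When the config carries no OAuth2 client credential, as in a Batch job, the caller would fall back to the latent credentials in the environment as described above.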
|
||
|
||
User Login Flow Changes | ||
----------------------- | ||
|
||
Currently, ``hailctl auth login`` performs a sort of mixed desktop and server | ||
OAuth2 login flow, which occurs in the following sequence: | ||
|
||
1. User executes ``hailctl auth login`` via the command line | ||
2. The user's machine prompts the Hail ``auth`` service to initiate a login flow | ||
by making a request to ``/api/v1alpha/login``. The ``auth`` service responds | ||
with an authorization URL that ``hailctl`` then opens in a browser. | ||
3. The user authenticates and provides user consent | ||
4. The OAuth2 provider authenticates the user and sends a callback to ``localhost`` | ||
with an authorization code. | ||
5. ``hailctl`` sends that authorization code to the ``auth`` service, which uses | ||
it to complete the OAuth flow, receiving an ID token, an access token and a refresh token. | ||
6. The ``auth`` service uses the ID token to identify the user and assert that the | ||
user has an account with the Service. | ||
7. The ``auth`` service mints a token that it sends in the response to ``hailctl``. | ||
8. ``hailctl`` persists the token for future authorization of API calls to the Service. | ||
|
||
|
||
The proposed ``hailctl auth login`` flow is as follows: | ||
|
||
1. User executes ``hailctl auth login`` via the command line | ||
2. ``hailctl`` obtains the OAuth2 client credentials from a well-known, public | ||
endpoint on the ``auth`` API. Note that it is OK to make this resource public | ||
as Desktop OAuth2 Client Secrets `are not considered secret <https://developers.google.com/identity/protocols/oauth2/native-app>`_ | ||
as desktop applications cannot necessarily store data confidentially on the user's machine. | ||
3. ``hailctl`` performs the full Desktop OAuth flow on the user's machine, | ||
persisting the ``refresh_token`` it receives at the end of the flow along with | ||
the OAuth2 client credentials. | ||
4. ``hailctl`` attempts to access the ``/userinfo`` endpoint on the ``auth`` service | ||
to confirm that the logged in user is registered with the Hail service. | ||
|
||
|
||
The programmatic OAuth2 flow will use a different OAuth2 client than that used | ||
in the typical Web flow. When conducting a web-based flow, the OAuth2 client credentials | ||
can be kept secret by the server and Google can verify that the request to initiate a | ||
login flow is coming from a source that owns the OAuth2 client. As such, it is valuable to | ||
keep the web client's secret genuinely secret. However, no such guarantee exists for | ||
desktop applications, as client secrets stored on user devices *cannot be considered secret*. | ||
In order to preserve the integrity of the web-based login, it is best to maintain a separate | ||
OAuth2 client that is issued specifically for desktop applications. There is also an intuitive | ||
argument for why we should generate two OAuth clients, as the Hail Python library and the Hail | ||
web service are two distinct applications, and we could in the future want different scopes | ||
in those two environments. | ||
|
||
It is worth noting that attackers with access to the user's filesystem can use the | ||
``refresh_token`` to create access tokens. That being said, the access tokens | ||
that an attacker could obtain from this OAuth2 secret can only be used outside of the Hail | ||
Service to obtain the user's email. If an attacker wanted additional scopes they would need | ||
to initiate an OAuth2 flow which would require manual user consent for the elevated permissions. | ||
More realistically, an attacker can just as easily obtain ``gcloud`` access tokens that are likely | ||
to be far more privileged. So it is reasonable to say that we are not introducing new | ||
vulnerabilities to the user's machine. | ||
|
||
|
||
Effect and Interactions | ||
----------------------- | ||
|
||
It is worth comparing the privileges obtained in both the current and proposed scenario | ||
to determine if there are any increased risks under the new regime. | ||
|
||
For Hail-minted access tokens: | ||
|
||
- An attacker who obtains a token can fully impersonate a user to the Hail Service | ||
- The token is *only* authorized to access the Hail Service | ||
- Tokens can be explicitly revoked by the user by executing ``hailctl auth logout`` but | ||
are otherwise long-lived. | ||
|
||
For Hail-audience client secret: | ||
|
||
- An attacker can just as easily access the client secret as they can the Hail tokens. | ||
The attacker can then generate access tokens if the user has previously logged in | ||
and the refresh token is still valid. | ||
- The audience claim of these access tokens will be the Hail Python package, so these | ||
tokens can only be used against the Hail Service. | ||
- Unlike the Hail-minted tokens, the Bearer tokens sent in requests are short-lived | ||
access tokens. So any access tokens that might be leaked are unlikely to pose | ||
a security risk. | ||
- The client can dynamically configure the validity period for access tokens it | ||
generates. | ||
- The refresh token is also a long-lived credential, but can be invalidated by | ||
the user revoking it through ``hailctl auth logout``. | ||
|
||
|
||
Alternatives | ||
------------ | ||
|
||
An alternative to persisting a Hail-owned client secret on the user's machine | ||
is to use the latent credentials from ``gcloud`` Application Default Credentials. | ||
However, this is seen as an abuse of the OAuth2 model. Using Application Default | ||
Credentials would require that the ``auth`` service accept tokens with the | ||
``gcloud`` audience claim. It would obviate the need to authenticate with the | ||
Hail Service and any entity with a gcloud-generated user access token | ||
would be able to impersonate the user to the Hail Service. Additionally, the | ||
Hail Service, if compromised, could impersonate the user to other APIs that | ||
accept the ``gcloud`` audience claim. | ||
|
||
Another alternative is simply to not change our authorization model. Doing nothing | ||
would leave Hail Service operators with the management of token secrets. It would | ||
also make it more difficult to integrate Hail services into other | ||
environments that use access-token-based authentication, such as the Terra platform. | ||
|
||
Not an alternative, but an extension to this model could be encrypting and protecting | ||
access to the OAuth2 client secret using something like Apple Keychain or equivalent | ||
on other operating systems. The user would then be prompted to enter their password | ||
when ``hailctl`` attempts to access the file and would therefore make it obvious to | ||
the user if other applications try to do the same. Given that even ``gcloud`` does | ||
not do this, we are leaving it out of this initial proposal. | ||
|
||
|
||
Unresolved Questions | ||
-------------------- | ||
|
||
It is as yet unclear whether client secrets stored on | ||
client devices should be rotated regularly. If so, we could rotate them | ||
without much effort because the Hail Service distributes the client secrets in | ||
the first place. We would simply need to configure the ``hailctl`` client to reinitiate | ||
a login flow when the credential expires or is revoked. | ||
|
||
It is also unclear whether there is any way to somehow restrict the audience of | ||
service account access tokens in Google as you can in Azure. I think this is a minor | ||
concern as the tokens we'll generate for Hail auth will be strictly scoped. |