-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support redirection through webbrowser for OAuth authentication #162
Conversation
Adding the keyring feature instead of the threadlocal made the test fail. IMHO the threadlocal seems not so useful, this works different from the Trino CLI and JDBC driver. However maybe from security perspective it's a good default to not cache the token over multiple threads. In trino CLI: Further queries in the CLI session do not require additional logins while the authentication token remains valid. Token expiration depends on the external authentication type configuration. Expired tokens force you to log in again. in trino JDBC: @lukasz-walkiewicz : what do you think? |
4483b8c
to
367eb45
Compare
Open items:
|
trino/auth.py
Outdated
@@ -251,13 +262,34 @@ def _get_token(self, token_server, response, **kwargs): | |||
|
|||
raise exceptions.TrinoAuthError("Exceeded max attempts while getting the token") | |||
|
|||
def _get_token_from_cache(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be thread locked, the same for storing a token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, i see what you mean, but thread locking just reads and writes will not prevent us from retrieving the token multiple times in parallel threads when we get 40X errors. Ideally we will only launch the webbrowser flow only once, even in a multithread case. Still need to test this scenario more in depth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread locking has been implemented, covering reads/writes to cache and launching the OIDC flow only once per process and per Connection object. I think this is how it should work. The default is now class instance based caching, if keyring is installed within venv, use keyring caching per username and host. If webbrowser is not installed, for example in a headless environment (CLI), the url is printed in the console as before.
I think i can add another test to proof that the caching of the token is per connection and not per cursor as before.
trino/auth.py
Outdated
self._thread_local.token = self._get_token(token_server, response, **kwargs) | ||
return self._retry_request(response, **kwargs) | ||
token = self._get_token(token_server, response, **kwargs) | ||
self._store_token_to_cache(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, let's not cache a token like in JDBC driver to not introduce a potential breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current behaviour is printing the OIDC url in the console? Could you elaborate on how the current OIDC is used in production?
I actually agree that caching the token in the keychain shouldn't be the default from a security perspective as it will persist even outside of the process context. However within the same process, given the same connection (host and username), I think it can be cached. Woud you agree?
Actually the unit test test_multithreaded_oauth2_authentication_flow
makes you think that it only retrieves the token once per thread. However that's not actually happening because the _OAuth2TokenBearer is recreated for each http_session (every time we create a cursor a separate TrinoRequest is being created). Which makes the threadlocal storage only work per cursor. That's why i opted to make the _OAuth2TokenBearer an instance property instead of being recreated for every http_session, IMHO it should be linked to the lifetime of the connection instead of a cursor.
def set_http_session(self, http_session):
http_session.auth = _OAuth2TokenBearer(http_session, self._redirect_auth_url)
return http_session
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, are you suggesting caching a token by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default before was already caching the token per cursor and per thread, which to me seems illogical. The newly default introduced in this PR is a thread-safe per connection cache, giving the user the choice to either create a connection once and share it over multiple threads or create a connection per thread. If you look at the test (test_multithreaded_oauth2_authentication_flow) you can see that before this change the token is already cached: the token is only retrieved once per thread, while there are many more requests. However the test didn't take into account that the Bearer class is instantiated for each cursor, so from a user perspective it seemed like no caching is performed.
I think for most users the keyring will provide the most seamless experience. they can either do pip install keyring
or pip install 'trino-python-client[local-secure-storage]'
to even get less browser popups as they will solely depend on the lifetime of the token.
367eb45
to
8edb8e4
Compare
README.md
Outdated
@@ -169,7 +169,7 @@ the [`JWT` authentication type](https://trino.io/docs/current/security/jwt.html) | |||
|
|||
- `OAuth2Authentication` class can be used to connect to a Trino cluster configured with | |||
the [OAuth2 authentication type](https://trino.io/docs/current/security/oauth2.html). | |||
- A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it just outputs the redirect url to stdout. | |||
- A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it will try to launch a web browser to go through the authentication flow, if that fails it just outputs the redirect url to stdout. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add info about caching to README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more details about the caching in the README.
trino/auth.py
Outdated
self._thread_local.token = self._get_token(token_server, response, **kwargs) | ||
return self._retry_request(response, **kwargs) | ||
token = self._get_token(token_server, response, **kwargs) | ||
self._store_token_to_cache(token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, are you suggesting caching a token by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i can add another test to proof that the caching of the token is per connection and not per cursor as before.
That would be great.
First of all, thanks for the contribution @mdesmet.
Environment:
Which platform you tested it on? |
Tested on following platform:
Checking on https://pypi.org/project/keyring/ Compatibility - macOS |
8edb8e4
to
63222f7
Compare
So it's possible that even if you have installed keyring, it doesn't work as there is no suitable backend. I adapted the code that it will fallback to just store the token as if keyring is not installed. |
I added the test: test_token_only_retrieved_once_per_connection |
63222f7
to
d26e575
Compare
@lukasz-walkiewicz, @hovaesco : Any other remarks or concerns? FYI launching the webbrowser and storing credentials in keyring is also what Snowflake is doing in their python connector, I think it will greatly improve the user experience of dbt in combination with oauth authentication. Am I missing any usage patterns that this behavior for OAuth authentication won't be compatible with? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One question, if keyring is installed then a token is cached by default? Basically, having keyring installed is like an enabler to cache a token?
trino/auth.py
Outdated
|
||
@staticmethod | ||
def _determine_user_host(url, headers) -> Tuple[str, str]: | ||
user = headers.get("X-Trino-User") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a default value for user
if there is no X-Trino-User
(just in case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a point here: the user is not really required for using OAuth. Maybe we should just get rid of trying to determine the user and only use the host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it used to create a key for saving a password by keyring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's it. Now given that the lifetime of a token isn't generally that long, I would think it's probably OK to get rid of the user. Also the keyring is not installed by default, so it means the user opts in on this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we use a different user between connections? The token in keyring will be overwritten by the newest one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you don't use keyring, the token is stored as instance variable within the Authentication class. So it's not reused unless you would reuse that same instance over multiple connections.
With keyring it will just try to reuse the existing token, only after it expires it will prompt again. Now the earlier code was dependent on the user set in the trino.dbapi.Connection
object, this user is straight from user input. It is not required to set it up for OAuth connections.
The whole point about using keyring is mostly about providing a default user-friendly experience for user ran CLI applications like dbt, I think it would be quite uncommon if they would have multiple users for the same host and also using OAuth. The workaround would be then to not use keyring in this case. Maybe we need to describe this better in the docs?
I also don't think a lot of server applications will use this webbrowser and keyring feature. They can support redirection by adding their own handle_redirect_auth_url
.
I also would never install keyring by default, it's the users them self that opt in by installing keyring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to describe this better in the docs?
We did something similar for JDBC driver: https://trino.io/docs/current/installation/jdbc.html -> externalAuthenticationTokenCache
README.md
Outdated
* DBAPI | ||
A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it will try to launch a web browser to go through the authentication flow, if that fails it just outputs the redirect url to stdout. | ||
|
||
The OAuth2 token will be cached either per `trino.auth.OAuth2Authentication` instance or, when keyring is installed, it will be cached within a secure backend (macos keychain, windows credential locker, ...) under a key including host and user of the Trino connection. Keyring can be installed using `pip install trino-python-client['local-secure-storage']`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pip install 'trino[secure-local-storage]'
or pip install -e '.[secure-local-storage]'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what three dots mean in this sentence, maybe it's better to include here all storages across all OS or just add etc
(macos keychain, windows credential locker, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pip install 'trino[secure-local-storage]'
orpip install -e '.[secure-local-storage]'
I would only add the first one as the second one is only for local development (might confuse normal users).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's ok.
Even without keyring installed the token is always cached per connection (actually per instance of OAuth2Authentication), compared to the old code where the token was cached per cursor. When you install keyring, the token will be reused even when you restart or rerun your application (eg multiple dbt runs won't prompt authentication again if your token is still valid). |
d26e575
to
f7baeec
Compare
trino/auth.py
Outdated
@@ -157,8 +161,12 @@ def __eq__(self, other): | |||
|
|||
|
|||
def handle_redirect_auth_url(auth_url): | |||
print("Open the following URL in browser for the external authentication:") | |||
print(auth_url) | |||
print("Opening your browser for the external authentication:") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there 2 unrelated changes:
- opening web browser on authentication,
- keyring to store the obtained token.
Please use seperate commits for them.
tests/unit/test_dbapi.py
Outdated
|
||
@httprettified | ||
def test_token_only_retrieved_once_per_connection(sample_post_response_data): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pylint didn't pick this up?
tests/unit/test_dbapi.py
Outdated
|
||
|
||
@httprettified | ||
def test_token_only_retrieved_once_per_connection(sample_post_response_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it once per connection or once per OAuth2Authentication
instance? Does having a test in which multiple connections reuse the same OAuth2Authentication
instance make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's like you said, we can add a test to proof this is actually working.
trino/auth.py
Outdated
|
||
@staticmethod | ||
def _determine_host(url) -> Tuple[str, str]: | ||
host = urlparse(url).hostname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return urlparse(url).hostname
self._keyring = None | ||
print("Warning: keyring module not found. OAuth2 token will not be stored in keyring.") | ||
self._token = None | ||
self._token_lock = threading.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there are 2 locks needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The token lock is for reads and writes to the token storage.
The token attempt lock is to only have one oauth attempt at a time and block the other responses to be retried. As part of the oauth attempt the token is written, hence we need two locks.
To be honest I wasn't entirely sure about this but apparently my intuition was right as following code shows
import threading
test = threading.Lock()
with test:
print("first lock acquired")
print(test.locked())
with test:
# not getting here as above line is waiting until the lock is released
print("second lock acquired")
print(test.locked())
print("second lock released")
print(test.locked())
print(test.locked())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. So there are two critical sections:
- get/set token from the cache,
- obtain and replace the token.
For the latter_inside_oauth_attempt_lock
does guarantee that only one thread tries to get a new token at a given time but I believe other threads will still try to get a new token after obtaining this lock even though the token was just refreshed and is still valid:
with self._inside_oauth_attempt_lock:
if not self._inside_oauth_attempt:
self._inside_oauth_attempt = True
self._attempt_oauth(response, **kwargs)
self._inside_oauth_attempt = False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're indeed right, it's not correct. I will make the change and add a test for this behaviour.
trino/auth.py
Outdated
self._keyring = importlib.import_module("keyring") | ||
except ImportError: | ||
self._keyring = None | ||
print("Warning: keyring module not found. OAuth2 token will not be stored in keyring.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be printed for all clients which are not using keyring
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as the _OAuth2TokenBearer
is instantiated as part of the init of the OAuth2Authentication
class, it will be shown only once. I thought it's a good way to remind the user that they can reduce the amount of browser oauth popups by installing keyring. It will indeed be shown if you don't install keyring and use the OAuth2Authentication
class.
Maybe we can remove this and just write good docs. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the motiviation but it should use a logger and maybe not on warning. It's a hint, not an indication of a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will use a logger. Not sure if anyone runs this with logging set to INFO, but I agree to your point that it's not really a warning from a logging perspective. Anyway it's explained in the docs already.
trino/auth.py
Outdated
except self._keyring.errors.NoKeyringError: | ||
pass | ||
|
||
# keyring is not installed, so we fall back to a per connection cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per instance of OAuth2Authentication
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's indeed correct, also mentioned in earlier reviews. I will update the comment.
trino/auth.py
Outdated
print(auth_url) | ||
print("Opening your browser for the external authentication:") | ||
result = webbrowser.open_new(auth_url) | ||
if not result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having console and browser handlers seperately? This would allow users to use either
connect(
"coordinator",
auth=OAuth2Authentication(redirect_auth_url_handler=WebBrowserHandler())
or
connect(
"coordinator",
auth=OAuth2Authentication(redirect_auth_url_handler=ConsoleHandler())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection against having both handlers, but what's the use of the ConsoleHandler
? If needed, a user can provide their own handler and override the default behaviour of launching the webbrowser. They would probably do something with the url instead of just printing it to the console?
The behaviour is actually pretty similar to how the JDBC driver works. See https://trino.io/docs/current/installation/jdbc.html?highlight=externalAuthentication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The jdbc driver actually defines an interface: https://github.com/trinodb/trino/blob/master/client/trino-client/src/main/java/io/trino/client/auth/external/RedirectHandler.java with multiple implementations:
DESKTOP_OPEN(new DesktopBrowserRedirectHandler()),
SYSTEM_OPEN(new SystemOpenRedirectHandler()),
PRINT(new SystemOutPrintRedirectHandler()),
OPEN(new CompositeRedirectHandler(ImmutableList.of(SYSTEM_OPEN, DESKTOP_OPEN))),
ALL(new CompositeRedirectHandler(ImmutableList.of(OPEN, PRINT)))
and even though the default is ALL
users can also use PRINT
which does not try to open up the browser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will adapt the implementation to provide a ConsoleHandler
and WebBrowserHandler
.
c92323e
to
6c24cdd
Compare
Awesome job 🎉 One general question. Have you fully tested it with dbt-trino? |
Haven't tried that. Will try it during next week. On first sight we might have to ensure that only one instance of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks much much better now 👍
One last comment, could you please change the name of the PR to something more descriptive?
trino/auth.py
Outdated
principal: Optional[str] = None, | ||
delegate: bool = False, | ||
ca_bundle: Optional[str] = None, | ||
self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undo or move to a different commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick skim, still need to look at the _OAuth2TokenBearer
changes for correctness.
trino/auth.py
Outdated
print(auth_url) | ||
class RedirectHandler(metaclass=abc.ABCMeta): | ||
""" | ||
Abstract class for Oauth redirect handlers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: improve docstrings here and for classes below, like OAuth
, remove extra spaces, keep dots etc.
1f551ea
to
ba27735
Compare
I have succesfully tested the feature with dbt and also added the changes in the dbt PR. We still will need to update the trino-python-client version there, after release. If you want to try it yourself, you can run following commands
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impl looks good to me. I'd love to be able to integration test these instead of doing manually but that's separate effort.
Some comments on README.
There was still some issue with the locking that i fixed yesterday. The integration tests actually lock on each query, which means that every request including retry is queued instead of executed in parallel. The locking is required to make httpretty threadsafe. So the existing tests are not completely covering the real nature of a multiple requests blocking until one finishes the oauth authentication process. trino-python-client/tests/unit/test_client.py Lines 589 to 590 in 0430df3
|
I don't think Or do you mean we lock per-instance of connection? |
It's not that we want to share a cursor over multiple threads. It's about the requests that happen within the cursor execution. The cursor is the first place where we actually talk to the coordinator, as the connection is just a cursor factory. So within a mulithreaded execution (each thread with their own cursor), multiple requests receive a 401 error and should wait until the first lock goes through the OAuth authentication flow. When that one thread completes and stores the token, the other threads should wake up and retry the request. Within the integration tests, as we put a lock on every cursor execution, this effectively lets only one thread go through the request, 401, retrieve token and retry request flow. while all other threads are waiting and then execute with the token provided by the first thread. This doesn't match the real behaviour where actually multiple threads fail on 401 (as mentioned above). I tried without the lock but then the httpretty request counts not match up anymore. So the issue lies within the test framework as mentioned in the comment in the code. trino-python-client/tests/unit/test_client.py Lines 587 to 591 in 0430df3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it with dbt-trino and it works flawlessly, thank you for great work!
ba27735
to
18fdee8
Compare
We can't add add |
18fdee8
to
1b52f62
Compare
Thanks, please add that as a comment to |
1b52f62
to
342f862
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % better name if anyone has any
342f862
to
045c2f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience @mdesmet and contributing this useful feature.
Fixes #161