-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11004: [FlightRPC][Python] Header-based auth in clients #8959
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
See also: |
return "" | ||
|
||
|
||
class HeaderAuthServerMiddlewareFactory(ServerMiddlewareFactory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should not be part of test_flight.py to be used in production. It should also be changed to have some API for checking username/password/bearer rather than being hard coded.
Unless this PR is really for header-based auth in just clients, in which case the description is not really correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is only for the client implementation of header based auth in Python, same as the C++ implementation done here: #8724. I've created a new subtask in ARROW-10486 that reflects this scope of work: https://issues.apache.org/jira/browse/ARROW-11004.
python/pyarrow/_flight.pyx
Outdated
@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable): | |||
self.client.get().Authenticate(deref(c_options), | |||
move(handler))) | |||
|
|||
def authenticateBasicToken(self, username, password, | |||
options: FlightCallOptions = None): | |||
"""Authenticate to the server with header token authentication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really header token authentication, it's HTTP basic authentication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in new commit.
@@ -307,6 +308,12 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil: | |||
CStatus Authenticate(CFlightCallOptions& options, | |||
unique_ptr[CClientAuthHandler] auth_handler) | |||
|
|||
# TODO: Add AuthenticateBasicToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in new commit.
|
||
def is_valid(self, token): | ||
"""Do nothing.""" | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to return None than empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning None gives the error Flight RPC failed with Python exception "TypeError: expected bytes, NoneType found"
. Based on the method's documentation here, returning an empty string is allowed, I added a comment to clarify the behaviour.
python/pyarrow/tests/test_flight.py
Outdated
"""Validates incoming username and password.""" | ||
|
||
def start_call(self, info, headers): | ||
auth_header = headers.get('authorization') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is test code, but this should be a case insensitive lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in new commit.
self.token = token | ||
|
||
def sending_headers(self): | ||
return {'authorization': 'Bearer ' + self.token} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had been using Authorization (title case) in other languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seem the C++ grpc impl does not allow capitals in key, when using capital A, the header was rejected due to https://github.com/grpc/grpc/blob/b1df40104c1720bde3b22ded451a037f11b5dc54/src/core/lib/surface/validate_metadata.cc#L78
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be a comment for the headers parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case
auth_header = headers.get('authorization') | ||
values = auth_header.split(' ') | ||
return [values[1].encode("utf-8")] | ||
raise flight.FlightUnauthenticatedError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's an internal server error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in new commit.
c_string pw = tobytes(password) | ||
|
||
with nogil: | ||
result = self.client.get().AuthenticateBasicToken(deref(c_options), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't there be a FlightCallOption implementation to do this? How do you use HTTP basic auth without using authenticateBasicToken? (eg in Java you can directly call getFlightInfo without calling an authenticate function and supply a FlightCallOption with the basic auth info).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on how the C++ code was implemented , the Python wrapper can only wrap the AuthenticateBasicToken method in the C++ client. Seems like we would need to expand the C++ client implementation if we were to have C++/Python impl match the Java client implementation. #8724 -> C++ impl only exposed authenticateBasicToken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jduo are the changes in this PR sufficient?
I added a test example implementation called ClientHeaderAuthMiddleware, it intercepts headers from a HTTP header auth enabled server and make it accessible to the code using the FlightClient. A new test also demonstrates how to use the ClientMiddleware to intercept an Authorization header returned from the server and to use it in subsequent calls. |
python/pyarrow/_flight.pyx
Outdated
@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable): | |||
self.client.get().Authenticate(deref(c_options), | |||
move(handler))) | |||
|
|||
def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to add a description for self in the list of parameters below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self is never documented - it's implicitly passed by Python (it's equivalent to this
in Java et al)
@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
reader.poison() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a typo "poison"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.
@@ -307,6 +308,11 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil: | |||
CStatus Authenticate(CFlightCallOptions& options, | |||
unique_ptr[CClientAuthHandler] auth_handler) | |||
|
|||
CResult[pair[c_string, c_string]] AuthenticateBasicToken( | |||
CFlightCallOptions& options, | |||
const c_string& username, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the uninitiated, what are the coding standards on Class & variable names? I see a mixture of camel case and underscore separated names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)
self.factory = factory | ||
|
||
def received_headers(self, headers): | ||
auth_header = case_insensitive_header_lookup(headers, 'Authorization') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a mix of single and double quotes for strings. Should these be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two are interchangeable.
python/pyarrow/tests/test_flight.py
Outdated
decoded = base64.b64decode(values[1]) | ||
pair = decoded.decode("utf-8").split(':') | ||
if not (pair[0] == 'test' and pair[1] == 'password'): | ||
raise flight.FlightUnauthenticatedError('Invalid credentials') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 uses of the string 'Invalid credentials'. Should this be made a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
python/pyarrow/_flight.pyx
Outdated
@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable): | |||
self.client.get().Authenticate(deref(c_options), | |||
move(handler))) | |||
|
|||
def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's follow Python naming conventions - this should use snake_case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
python/pyarrow/_flight.pyx
Outdated
@@ -118,12 +118,18 @@ cdef class FlightCallOptions(_Weakrefable): | |||
write_options : pyarrow.ipc.IpcWriteOptions, optional | |||
IPC write options. The default options can be controlled | |||
by environment variables (see pyarrow.ipc). | |||
|
|||
headers : vector[pair[c_string, c_string]], optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type hint should use Python conventions (e.g. List[Tuple[str, str]]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
python/pyarrow/_flight.pyx
Outdated
|
||
Returns | ||
------- | ||
pair : pair[string, string] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the type hints here - use str
and Tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
@@ -307,6 +308,11 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil: | |||
CStatus Authenticate(CFlightCallOptions& options, | |||
unique_ptr[CClientAuthHandler] auth_handler) | |||
|
|||
CResult[pair[c_string, c_string]] AuthenticateBasicToken( | |||
CFlightCallOptions& options, | |||
const c_string& username, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)
return "" | ||
|
||
|
||
def case_insensitive_header_lookup(headers, lookup_key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically to extract an authentication header based n the exception it raises, let's make sure the name reflects that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the function to not raise an exception when the header is not found. The method is used in a test not related to authentication as well.
self.token = token | ||
|
||
def sending_headers(self): | ||
return {'authorization': 'Bearer ' + self.token} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case
python/pyarrow/tests/test_flight.py
Outdated
client = FlightClient(('localhost', server.port)) | ||
token_pair = client.authenticateBasicToken(b'test', b'password') | ||
assert token_pair[0] == b'authorization' | ||
assert token_pair[1] == b'Bearer ' + b'token1234' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why concat here? (and above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
reader.poison() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a typo.
@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing, | |||
reader.poison() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.
python/pyarrow/_flight.pyx
Outdated
@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable): | |||
self.client.get().Authenticate(deref(c_options), | |||
move(handler))) | |||
|
|||
def authenticateBasicToken(self, username, password, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self is never documented - it's implicitly passed by Python (it's equivalent to this
in Java et al)
c_string pw = tobytes(password) | ||
|
||
with nogil: | ||
result = self.client.get().AuthenticateBasicToken(deref(c_options), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jduo are the changes in this PR sufficient?
@lidavidm yes these changes are sufficient. |
Closes apache#8959 from tifflhl/ARROW-10486 Authored-by: tifflhl <tiffanylamhl@gmail.com> Signed-off-by: David Li <li.davidm96@gmail.com>
No description provided.