Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11004: [FlightRPC][Python] Header-based auth in clients #8959

Closed
wants to merge 14 commits into from

Conversation

tifflhl
Copy link
Contributor

@tifflhl tifflhl commented Dec 18, 2020

No description provided.

@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@tifflhl tifflhl changed the title Arrow 10486: [Python] Header-based auth in servers Arrow-10486: [Python] Header-based auth in servers Dec 18, 2020
@tifflhl tifflhl changed the title Arrow-10486: [Python] Header-based auth in servers Arrow-10486: [Python] Header-based auth in client Dec 18, 2020
return ""


class HeaderAuthServerMiddlewareFactory(ServerMiddlewareFactory):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should not be part of test_flight.py to be used in production. It should also be changed to have some API for checking username/password/bearer rather than being hard coded.

Unless this PR is really for header-based auth in just clients, in which case the description is not really correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is only for the client implementation of header based auth in Python, same as the C++ implementation done here: #8724. I've created a new subtask in ARROW-10486 that reflects this scope of work: https://issues.apache.org/jira/browse/ARROW-11004.

@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable):
self.client.get().Authenticate(deref(c_options),
move(handler)))

def authenticateBasicToken(self, username, password,
options: FlightCallOptions = None):
"""Authenticate to the server with header token authentication.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really header token authentication, it's HTTP basic authentication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new commit.

@@ -307,6 +308,12 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
CStatus Authenticate(CFlightCallOptions& options,
unique_ptr[CClientAuthHandler] auth_handler)

# TODO: Add AuthenticateBasicToken
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove TODO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new commit.


def is_valid(self, token):
"""Do nothing."""
return ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to return None than empty string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning None gives the error Flight RPC failed with Python exception "TypeError: expected bytes, NoneType found". Based on the method's documentation here, returning an empty string is allowed, I added a comment to clarify the behaviour.

"""Validates incoming username and password."""

def start_call(self, info, headers):
auth_header = headers.get('authorization')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test code, but this should be a case insensitive lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new commit.

self.token = token

def sending_headers(self):
return {'authorization': 'Bearer ' + self.token}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had been using Authorization (title case) in other languages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem the C++ grpc impl does not allow capitals in key, when using capital A, the header was rejected due to https://github.com/grpc/grpc/blob/b1df40104c1720bde3b22ded451a037f11b5dc54/src/core/lib/surface/validate_metadata.cc#L78

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be a comment for the headers parameter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case

auth_header = headers.get('authorization')
values = auth_header.split(' ')
return [values[1].encode("utf-8")]
raise flight.FlightUnauthenticatedError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's an internal server error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new commit.

c_string pw = tobytes(password)

with nogil:
result = self.client.get().AuthenticateBasicToken(deref(c_options),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be a FlightCallOption implementation to do this? How do you use HTTP basic auth without using authenticateBasicToken? (eg in Java you can directly call getFlightInfo without calling an authenticate function and supply a FlightCallOption with the basic auth info).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on how the C++ code was implemented , the Python wrapper can only wrap the AuthenticateBasicToken method in the C++ client. Seems like we would need to expand the C++ client implementation if we were to have C++/Python impl match the Java client implementation. #8724 -> C++ impl only exposed authenticateBasicToken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jduo are the changes in this PR sufficient?

@tifflhl tifflhl changed the title Arrow-10486: [Python] Header-based auth in client ARROW-10486: [Python] Header-based auth in client Dec 21, 2020
@tifflhl tifflhl changed the title ARROW-10486: [Python] Header-based auth in client ARROW-11004: [Python] Header-based auth in client Dec 21, 2020
@tifflhl tifflhl changed the title ARROW-11004: [Python] Header-based auth in client ARROW-11004: [FlightRPC][Python] Header-based auth in clients Dec 21, 2020
@github-actions
Copy link

@tifflhl
Copy link
Contributor Author

tifflhl commented Dec 22, 2020

I added a test example implementation called ClientHeaderAuthMiddleware, it intercepts headers from a HTTP header auth enabled server and make it accessible to the code using the FlightClient. A new test also demonstrates how to use the ClientMiddleware to intercept an Authorization header returned from the server and to use it in subsequent calls.
I don't think it makes sense to include the example ClientMiddleware to the .pyx pr .pxd, since they are not callable objects in C++. Also seems like a convention to not define explicit middleware objects in python, but rather, provide examples?

@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable):
self.client.get().Authenticate(deref(c_options),
move(handler)))

def authenticateBasicToken(self, username, password,
Copy link
Contributor

@stevelorddremio stevelorddremio Dec 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to add a description for self in the list of parameters below?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is never documented - it's implicitly passed by Python (it's equivalent to this in Java et al)

@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing,
reader.poison()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a typo "poison"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a typo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.

@@ -307,6 +308,11 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
CStatus Authenticate(CFlightCallOptions& options,
unique_ptr[CClientAuthHandler] auth_handler)

CResult[pair[c_string, c_string]] AuthenticateBasicToken(
CFlightCallOptions& options,
const c_string& username,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the uninitiated, what are the coding standards on Class & variable names? I see a mixture of camel case and underscore separated names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)

self.factory = factory

def received_headers(self, headers):
auth_header = case_insensitive_header_lookup(headers, 'Authorization')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a mix of single and double quotes for strings. Should these be consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two are interchangeable.

decoded = base64.b64decode(values[1])
pair = decoded.decode("utf-8").split(':')
if not (pair[0] == 'test' and pair[1] == 'password'):
raise flight.FlightUnauthenticatedError('Invalid credentials')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 uses of the string 'Invalid credentials'. Should this be made a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@lidavidm lidavidm self-requested a review December 23, 2020 19:49
@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable):
self.client.get().Authenticate(deref(c_options),
move(handler)))

def authenticateBasicToken(self, username, password,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's follow Python naming conventions - this should use snake_case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -118,12 +118,18 @@ cdef class FlightCallOptions(_Weakrefable):
write_options : pyarrow.ipc.IpcWriteOptions, optional
IPC write options. The default options can be controlled
by environment variables (see pyarrow.ipc).

headers : vector[pair[c_string, c_string]], optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type hint should use Python conventions (e.g. List[Tuple[str, str]])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.


Returns
-------
pair : pair[string, string]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the type hints here - use str and Tuple

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -307,6 +308,11 @@ cdef extern from "arrow/flight/api.h" namespace "arrow" nogil:
CStatus Authenticate(CFlightCallOptions& options,
unique_ptr[CClientAuthHandler] auth_handler)

CResult[pair[c_string, c_string]] AuthenticateBasicToken(
CFlightCallOptions& options,
const c_string& username,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixture here is because this is a set of Cython type definitions for C++ code, which uses the C++ conventions (e.g. CResult), but the overall project is in Python, which uses Python conventions (e.g. c_string)

return ""


def case_insensitive_header_lookup(headers, lookup_key):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically to extract an authentication header based n the exception it raises, let's make sure the name reflects that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the function to not raise an exception when the header is not found. The method is used in a test not related to authentication as well.

self.token = token

def sending_headers(self):
return {'authorization': 'Bearer ' + self.token}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Headers are supposed to be treated case-insensitively so even though other languages may use Authorization, it will all get folded to the same case

client = FlightClient(('localhost', server.port))
token_pair = client.authenticateBasicToken(b'test', b'password')
assert token_pair[0] == b'authorization'
assert token_pair[1] == b'Bearer ' + b'token1234'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why concat here? (and above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing,
reader.poison()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a typo.

@@ -1871,7 +1909,6 @@ cdef CStatus _server_authenticate(void* self, CServerAuthSender* outgoing,
reader.poison()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to ensure a server doesn't use the Python reader beyond the lifetime of the C++ reader it wraps.

@@ -1150,6 +1156,38 @@ cdef class FlightClient(_Weakrefable):
self.client.get().Authenticate(deref(c_options),
move(handler)))

def authenticateBasicToken(self, username, password,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is never documented - it's implicitly passed by Python (it's equivalent to this in Java et al)

c_string pw = tobytes(password)

with nogil:
result = self.client.get().AuthenticateBasicToken(deref(c_options),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jduo are the changes in this PR sufficient?

@jduo
Copy link
Member

jduo commented Dec 26, 2020

@lidavidm yes these changes are sufficient.

@lidavidm lidavidm closed this in 4d97d83 Dec 26, 2020
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Closes apache#8959 from tifflhl/ARROW-10486

Authored-by: tifflhl <tiffanylamhl@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants