Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to reuse the HttpClient between requests by default #526

Conversation

jameshageman-stripe
Copy link
Contributor

@jameshageman-stripe jameshageman-stripe commented Jan 25, 2019

Previously, if the user has not set stripe.default_http_client, a new client would be created for every request. This commit changes the APIRequestor to populate stripe.default_http_client if it had not been set by the user, so the client can be reused between requests.

r? @brandur-stripe @ob-stripe
cc @stripe/api-libraries

@jameshageman-stripe
Copy link
Contributor Author

Only the Python 3.7 tests are complaining about formatting:

would reformat /home/travis/build/stripe/stripe-python/stripe/version.py

@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch from 827159c to 56aa6b5 Compare January 25, 2019 18:32
Copy link
Contributor

@brandur-stripe brandur-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added minor ask below. Going to leave final say on this one to @ob-stripe, but LGTM.

Only the Python 3.7 tests are complaining about formatting:

Oops, we recently switched to double-quoted strings for Black-powered formatting in the project, but our release scripts were still improperly producing single quoted strings. I just patched that in #527. You should be able to get a mostly-passing build (all except Pypy3) if you rebase on master.

tests/test_api_requestor.py Show resolved Hide resolved
@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch from b67211f to 5dee6b1 Compare January 25, 2019 23:50
@brandur-stripe
Copy link
Contributor

@jameshageman-stripe Just a note that there was an internal JIRA ticket opened contemporaneously with this PR where Alex expressed some usability concern regarding setting proxy after an HTTP client's been initialized. I mentioned you on it, but just double-posting here to make sure you see it before we action too much further here.

@brandur-stripe
Copy link
Contributor

And one more note that we've temporarily removed Pypy3 from the build matrix until we can get a new release of Pipenv to fix that Pypy3 problem: #528 You should be able to get a fully passing build on a rebase.

Previously, if the user has not configured a default_http_client, a new
client would be created for every request. This commit changes the
APIRequestor to populate the default_http_client if it had not been set
by the user, so the client could be reused between requests.
@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch 3 times, most recently from bb2c043 to 8f37b1e Compare January 28, 2019 20:19
@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch from 8f37b1e to 3b86e58 Compare January 28, 2019 20:41
@jameshageman-stripe
Copy link
Contributor Author

@brandur-stripe @ob-stripe I have written a few integration tests that spin up a real HTTPServer on a random port.

One of the tests exposed an interesting find: the global stripe.proxy doesn't seem to get used. The only way to configure the proxy is to pass it to the client's constructor like this:

stripe.default_http_client = stripe.http_client.new_default_http_client(proxy="http://localhost:1234")

My integration tests show that stripe.proxy does not work, as this test fails:

def test_hits_stripe_proxy(self):
class MockServerRequestHandler(BaseHTTPRequestHandler):
num_requests = 0
def do_GET(self):
self.__class__.num_requests += 1
self.send_response(200)
self.send_header(
"Content-Type", "application/json; charset=utf-8"
)
self.end_headers()
self.wfile.write(json.dumps({}).encode("utf-8"))
return
self.setup_mock_server(MockServerRequestHandler)
stripe.proxy = "http://localhost:%s" % self.mock_server_port
stripe.Balance.retrieve()
assert MockServerRequestHandler.num_requests == 0

but this test passes:

def test_hits_client_proxy(self):
class MockServerRequestHandler(BaseHTTPRequestHandler):
num_requests = 0
def do_GET(self):
self.__class__.num_requests += 1
self.send_response(200)
self.send_header(
"Content-Type", "application/json; charset=utf-8"
)
self.end_headers()
self.wfile.write(json.dumps({}).encode("utf-8"))
return
self.setup_mock_server(MockServerRequestHandler)
stripe.default_http_client = stripe.http_client.new_default_http_client(
proxy="http://localhost:%s" % self.mock_server_port
)
stripe.Balance.retrieve()
assert MockServerRequestHandler.num_requests == 1

I think there are two paths forward. Either we fix the client to respect stripe.proxy, or we completely remove stripe.proxy from the code and documentation since it never worked anyway.

@jameshageman-stripe
Copy link
Contributor Author

My mistake, I messed up the test. stripe.proxy is used, only when a no custom client is created. Here is the usage: https://github.com/stripe/stripe-python/pull/526/files#diff-06e8e6a0633cc19f1fef8fbc334012cfR87.

Back to the discussion around what to do about changing the proxy between requests, I think a warning will suffice. I'll try to work that into this PR.

@brandur-stripe
Copy link
Contributor

My mistake, I messed up the test. stripe.proxy is used, only when a no custom client is created. Here is the usage: https://github.com/stripe/stripe-python/pull/526/files#diff-06e8e6a0633cc19f1fef8fbc334012cfR87.

Phew, I'm glad to see that we haven't been broken for all these years! That makes sense because hopefully if a feature like that had been broken for so long, someone would have noticed and told us. (It'd be quite concerning if they hadn't ;)

Back to the discussion around what to do about changing the proxy between requests, I think a warning will suffice. I'll try to work that into this PR.

SG and +1. Thank you!

@jameshageman-stripe
Copy link
Contributor Author

@brandur-stripe I've added the warning with appropriate tests.

Aside: I'd like to keep the tests/test_integration.py tests in here, although I think it could use a better name. This library has a whole bunch of abstractions, making it difficult to know which layer to mock out when writing tests. The integration tests file actually spins up a real HTTPServer and points the client at it, to allow us to assert that requests actually hit the expected endpoint. It should also come in handy when I go to fix the client telemetry stuff. It's a little different than then rest of our unit tests, so let me know what you think of them.

Copy link
Contributor

@brandur-stripe brandur-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jameshageman-stripe! Left a few comments.

Aside: I'd like to keep the tests/test_integration.py tests in here, although I think it could use a better name. This library has a whole bunch of abstractions, making it difficult to know which layer to mock out when writing tests. The integration tests file actually spins up a real HTTPServer and points the client at it, to allow us to assert that requests actually hit the expected endpoint. It should also come in handy when I go to fix the client telemetry stuff. It's a little different than then rest of our unit tests, so let me know what you think of them.

Cool. Since you wrote them already, it can't hurt to have them in here.

cc @ob-stripe You may want to take a look as well.

self.mock_server_port = get_free_port()
self.mock_server = HTTPServer(
("localhost", self.mock_server_port), handler
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeesh, I was surprised to find that Python's docs for this are astoundingly awful, but passing 0 here should have the server select a port, which allows you to remove get_free_port and a possible race condition.

Here's an example with ThreadedTCPServer:

if __name__ == "__main__":
    # Port 0 means to select an arbitrary unused port
    HOST, PORT = "localhost", 0

    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    ip, port = server.server_address

    # Start a thread with the server -- that thread will then start one
    # more thread for each request
    server_thread = threading.Thread(target=server.serve_forever)
    # Exit the server thread when the main thread terminates
    server_thread.daemon = True
    server_thread.start()
    print "Server loop running in thread:", server_thread.name

    client(ip, port, "Hello World 1")
    client(ip, port, "Hello World 2")
    client(ip, port, "Hello World 3")

    server.shutdown()
    server.server_close()

README.md Outdated
@@ -113,7 +113,8 @@ production use, but `debug` is also available for more verbosity.

There are a few options for enabling it:

1. Set the environment variable `STRIPE_LOG` to the value `debug` or `info`
1. Set the environment variable `STRIPE_LOG` to the value `debug`, `info` or
`warning`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't tell that easily from the changes whether this already happens, but warnings should be emitted by default (if STRIPE_LOG isn't set), because otherwise no one will ever see them. Does the code do that right now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have it respect either warn or warning? Especially in logging libraries these two are often interchanged that it'd be sort of a nice feature to support both. If we're supporting only one, I'd say that we might want to do warn instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandur-stripe My code currently requires a log level to be set in order to show warnings. I just did this to follow the existing pattern from info and debug - I'm fine with changing it.

We could simplify the interface and not allow warn (or warning) at all, and just always print warnings. Is there a case (STRIPE_LOG/stripe.log value) where you wouldn't expect warnings to be printed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be preferable to keep the STRIPE_LOG interface unchanged and use the warnings module to output warnings, like here. It's more idiomatic and has a better change of being noticed by users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds a whole lot simpler - I'll do that!

)
assert MockServerRequestHandler.num_requests == 2

def test_hits_client_proxy(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this one something like test_hits_proxy_through_default_http_client?

@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch from 6f80cc5 to f38bb7e Compare January 29, 2019 01:10
@jameshageman-stripe jameshageman-stripe force-pushed the jameshageman/persist-connections-by-default branch from e48f0e6 to 652ef29 Compare January 29, 2019 16:57
@jameshageman-stripe
Copy link
Contributor Author

@brandur-stripe @ob-stripe re-review?

@brandur-stripe
Copy link
Contributor

LGTM. Thanks James!

@ob-stripe Excellent call on using the built-in warnings module — that one was a TIL for me.

@brandur-stripe brandur-stripe merged commit 744a88b into stripe:master Jan 29, 2019
@jameshageman-stripe jameshageman-stripe deleted the jameshageman/persist-connections-by-default branch January 29, 2019 18:49
@brandur-stripe
Copy link
Contributor

Released as 2.20.0.

@ob-stripe
Copy link
Contributor

ob-stripe commented Jan 29, 2019

Er, sorry for the late feedback, but I wonder if this change is thread safe, since now all threads will share the same client.

Should we do something similar to stripe-ruby and use thread-local storage to store a client per thread instead?

@jameshageman-stripe
Copy link
Contributor Author

@ob-stripe Good catch. I can look into it.

@brandur-stripe
Copy link
Contributor

Should we do something similar to stripe-ruby and use thread-local storage to store a client per thread instead?

@ob-stripe @jameshageman-stripe Just taking a closer look at this, I think that reusing the clients between threads is actually pretty close to thread-safe, and may have been meant originally to be thread safe. The only one that doesn't seem to be is RequestsClient because we're sharing a session object.

One thing that occurs to me is that using the pre-existing stripe.default_http_client wasn't really thread-safe either, because if you instantiated it with RequestsClient, it would have the same problem of a session shared across threads.

As an alternative to a client-per-thread, what do you guys about dictating that all HTTP clients should be thread-safe, and have RequestsClient put sessions in thread-local storage instead?

@jameshageman-stripe
Copy link
Contributor Author

That sounds good to me - I came across the same issue while working on #530. We can't avoid sharing clients between threads, so we should ensure that the clients are thread safe.

brandur-stripe pushed a commit that referenced this pull request Jan 29, 2019
It came up in #526 that our implementation isn't quite thread-safe.
*Most* HTTP clients are by virtue of the fact that their underlying
libraries are, but the recommended client, `RequestsClient` for the
Requests library, creates and reuses only a single session for all the
requests it makes.

The Requests maintainers are non-committal when it comes to the thread
safety of a session, but the general sentiment of its userbase is that
sessions are not thread safe.

Here we tweak the `RequestsClient` implementation so that we create one
session per thread by leveraging Python's thread-local storage.

Unfortunately, this implementation still leaves a pitfall in that if you
explicitly pass in a session in, it will get used between all threads.
The ideal thing to do here I think would be to probably not accept a
session object anymore, but that'd be a breaking change. One possibility
is that we could start warning about the use of the session parameter,
but for the time being I've opted to just leave this as a known problem
(given that hopefully this patch will still be a strict improvement).
@brandur-stripe
Copy link
Contributor

Took a shot at fixing the thread safety problem for RequestsClient in #531.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants