Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative simplified HTTP client API #4346

Closed
ksamuel opened this issue Nov 14, 2019 · 10 comments
Closed

Alternative simplified HTTP client API #4346

ksamuel opened this issue Nov 14, 2019 · 10 comments
Assignees

Comments

@ksamuel
Copy link
Contributor

ksamuel commented Nov 14, 2019

As discussed with @asvetlov, here is a proposal for an additional API to make HTTP requests, in two parts:

  • aiohttp.client.HttpClientSession: an API with reasonable performances, and simpler semantics than aiohttp.ClientSession. This is what most small to average programs will want to use. The most important feature being the preloading of the request body, but it doesn't support WebSocket.
  • aiohttp.client.requests: an API with weak performances, but very easy to play with, that will remind the hello world from the requests library. This is what we can put in the README to ease people into the transition into the async world. It's also very nice to use in a shell or quick and dirty scripts.

The existing ClientSession API is not going to be deprecated, just presented in the documentation as a more advanced use case for when you want to get the best performances and control possible. The two other API in fact uses ClientSession under the hood.

The README and the documentation will reflect this, and will clearly mention the pros and cons of each API, with links to the other. In order to avoid overwhelming the new comer, and to also avoid confusion ("why 3 ways of doing the same thing ?"), we will not present them at the same time.

It should be progressive:

1 - README:

Demonstrates aiohttp.client.requests as a hello world, then leads with a phrase like:

"Learn more about those simple requests<doc>. Want better performances than this ? Read about about the more efficent alternatives<doc>."

2 - The page "Learn more about those simple requests" leads to the requests API tutorial. At the end of this tutorial, a mention is made about the pros and cons of this API, and a link to the aiohttp.client.HttpClientSession API tutorial is offered.

3 - At the end of the aiohttp.client.HttpClientSession API tutorial, a mention is made about the pros and cons of this API, and a link to the aiohttp.ClientSession API tutorial is offered.

4 - Read about about the more efficent alternatives<doc>_. explains the pros and cons of each API, and link to both the aiohttp.client.HttpClientSession and the aiohttp.ClientSession API tutorial. It will recommand to give a try to aiohttp.client.HttpClientSession, and to choose aiohttp.ClientSession only if performances, control or WebSocket are important to the user.

API design

Look and feel

Current ClientSession API

import aiohttp

async with aiohttp.ClientSession() as session:
    async with session.get('http://httpbin.org/get') as resp:
        print(resp.status)
        print(await resp.text())

New aiohttp.client.requests API

from aiohttp.client import requests

resp = await requests.get('http://httpbin.org/get') # open and close everything
print(resp.status)
print(resp.text) # response body is preloaded

New aiohttp.client.HTTPClientSession API

from aiohttp.client import HTTPClientSession

async with HTTPClientSession() as session: # Same params as ClientSession
    resp = await session.get('http://httpbin.org/get') # client keeps a ref on  each response object
    print(resp.status)
    print(resp.text)  # response body is preloaded
# close all referenced response objects

Differences with the ClientSession API

HTTPClientSession

HTTPClientSession is similar to ClientSession, except:

  • remove connector_owner param: sharing a connector is an advance use case, leave it to the advanced API.
  • add _responses_history attribute: this will hold a reference to all created responses and will be looped over for cleaning on HTTPClientSession.close().
  • remove ws_response_class param: HTTPClientSession targets HTTP only. For websocket, use the advanced API.
  • calling ws_connect() raises ClientTypeError("HTTPClientSession does not support WebSocket. Use ClientSession instead.") as defined by:
class ClientTypeError(TypeError, aiohttp.ClientError):
    pass

HTTPClientSession.request/get/post/... and requests.get/post/...

Same as ClientSession.request/get/post..., except:

  • remove chunked param: body will be preloaded, so it doesn't make sense.
  • add content_length_limit param: if content length is set too high, read only up to this limit as a safety mechanism. This is necesarry since read() is automatically called. Do you think I should set it to None by default or just a very high value ? Should it raise an exception is limit is reached ?
  • removed read_until_eof param: always read up to the "Content-Length" header or content_length_limit
  • return a PreloadedClientResponse object instead of a ClientResponse.
  • PreloadedClientResponse inherits from ClientResponse.
  • ClientResponse.read() immediatly called on PreloadedClientResponse prior return.
  • HTTPClientSession.content is not a aiohttp.StreamReader but bytes.
  • PreloadedClientResponse.read() raises ResponseTypeError("PreloadedClientResponse content is automatically preloaded, don't call read(), access the 'content' and 'text' attributes.") as defined by:
class ResponseTypeError(TypeError, aiohttp.ClientResponseError):
    pass
  • PreloadedClientResponse.text is not a coroutine, but a @property that lazily decodes content on first read, then is replaced by a regular attribute containing the result for subsequent reads. Do you have utils to do that in the code base ?
  • add a PreloadedClientResponse.encoding attribute, preloaded at __init__ after .read(). Can be set manually by the user to compensate for the fact text is not a regular function anymore and you can't pass it params.
  • add encoding_errors param: accept "errors", "replace", etc., stored in encoding_errors attribute. Compensate for the fact text is not a regular function anymore and you can't pass param. (This could be a good addition to ClientSession as well, in case one wants the same strat for all requests instead of passing it to each call)

HTTPClientSession inherits from ClientSession.

requests will in fact be a singleton (just instanciated at the root of the module, no fancy __new__ tricks), and an instance of BasicHTTPClient. BasicHTTPClient will not inherit from ClientSession, but will use composition to delegate the work to HTTPClientSession and only expose .get/post/etc

Testing

I'll scan test_client_response.py and test_client_session.py for inspiration (espacially for the mocking), but I don't intend to duplicate the tests in there.

For HTTPClientSession, I'll create:

  • test_preloaded_client_response.py
  • test_http_client_session.py

And only tests the API differences with their parent, plus the signature, the behavior of overriden methods and ensure closing of the response objects.

For requests, I'll create test_requests.py and test the entire high level API, as well as use mocks to ensure we are calling the cleaning code correctly.

@asvetlov
Copy link
Member

A few things.
Sorry for possible misleading, but the aiohttp.client already exists (while it is not promoted as a public API).
We need another name.

Since it is for simplified usage please consider aiohttp.nutshell. I'm not an English speaker though. Maybe @mjpieters can suggest something? Looks like he is good in a words juggling :)

If we pin the aiohttp.nutshell namespace, I see all new APIs in this subpackage. HTTPClientSession is cumbersome, from aiohttp import nutshell; nutshell.Client looks better.
There is no need for nested requests subname, resp = await nutshell.get() looks good enough as well.

from aiohttp import nutshell

resp = await nutshell.get('http://httpbin.org/get') # open and close everything
print(resp.status)
print(resp.text) # response body is preloaded

and

from aiohttp import nutshell

async with nutshell.Client() as client:
    resp = await client.get('http://httpbin.org/get')
    print(resp.status)
    print(resp.text)  # response body is preloaded

Since nutshell.Response is a pure dataclass (attrs class actually because we use attrs already and should support Python 3.6) without IO there is no need for client._responses_history; all responses are released before an exit from await client.get(...) and family.

Websockets are not supported, at least for the first version -- I agree.

I suggest starting from the very minimalistic API, extend the list of supported features incrementally.
Say, I'm not happy with the proposal of changing the type of resp.content attribute. It looks very confusing. Bytes mode needs more attention. Maybe the simplified mode doesn't require it at all.
resp.read() should go, let's stick with resp.text property.

Our intention should be not requests API copy, it is a not realistic target. Let's design the simplistic API from scratch.

BasicHTTPClient is not a thing. Singletons don't work with asyncio well. I can imagine nutshell.get() as a simple shortcut for async with nutshell.Client() as client; return await client.get(...).
Simple and elegant for the performance price :)

Regarding testing: please don't use mocking at all, it is not required for the discussed API. Take ./tests/test_client_functional.py for inspiration.

@ksamuel
Copy link
Contributor Author

ksamuel commented Nov 16, 2019

A few things.
Sorry for possible misleading, but the aiohttp.client already exists (while it is not promoted as a public API).
We need another name.

Since it is for simplified usage please consider aiohttp.nutshell. I'm not an English speaker though. Maybe @mjpieters can suggest something? Looks like he is good in a words juggling :)

Got it. My preference usually goes to something close to the domain, though. So I'd suggest "requests" or "http":

from aiohttp import http
resp = await http.get('http://httpbin.org/get') # open and close everything
async with http.Client() as client:

Or:

from aiohttp import requests
resp = await requests.get('http://httpbin.org/get') # open and close everything
async with requests.Client() as client:

But honestly I can work with any name, just chose the one you like.

Since nutshell.Response is a pure dataclass (attrs class actually because we use attrs already and should support Python 3.6) without IO there is no need for client._responses_history; all responses are released before an exit from await client.get(...) and family.

I was thinking that ".text" could lazily read and close the connection, as to mitigate the perf cost we already pays. But in retrospect, this will cost more memory and make the code more complex and harder to test. attrs class it is.

I suggest starting from the very minimalistic API, extend the list of supported features incrementally.
Say, I'm not happy with the proposal of changing the type of resp.content attribute. It looks very confusing. Bytes mode needs more attention. Maybe the simplified mode doesn't require it at all.
resp.read() should go, let's stick with resp.text property.

Alright, but I need to store the bytes somewhere: any request to an image, a zip or an excel file is a legit case for the simplified API, but we won't decode that as text. Also, I expect json() will run faster if it is fed from bytes, not str. Given I'll have it anyway, can I just publicly expose it, as a "payload" or a "bytes" attribute ?

Our intention should be not requests API copy, it is a not realistic target. Let's design the simplistic API from scratch.

Yes.

BasicHTTPClient is not a thing. Singletons don't work with asyncio well. I can imagine nutshell.get() as a simple shortcut for async with nutshell.Client() as client; return await client.get(...).
Simple and elegant for the performance price :)

Ok. I'll make it a functional API a the root of the module. No need for a class.

Regarding testing: please don't use mocking at all, it is not required for the discussed API. Take ./tests/test_client_functional.py for inspiration.

Thank you.

@asvetlov
Copy link
Member

Let's wait for a while, maybe the best name will come up.
aiohttp.http module already exists as aiohttp.client.
aiohttp.requests sounds like requests library in aiohttp which is not true. This fact embarasses me a little.

I just publicly expose it, as a "payload" or a "bytes" attribute ?

Let's use bytes, resp.text and resp.bytes look natural. Plus resp.json() function, right?

@mjpieters
Copy link
Contributor

I do like nutshell; async HTTP requests in a nutshell. It neatly conveys that this is the digest reader version of the full API. The downside is that those building a server may ask for a nutshell simplification of the server APIs too.

My preference usually goes to something close to the domain, though. So I'd suggest "requests" or "http":

But the top-level namespace is already aiohttp. aiohttp.http feels rather redundant, aiohttp.requests both feels like either a copy of the requests project and at the same time not clearly defining why it is different from aiohttp.client (most developers don't know nor care that requests is a friendly wrapper around urllib3). If you want a literal name, go for aiohttp.simple.

But I have a soft spot for more cutesy names like nutshell.

@asvetlov
Copy link
Member

aiohttp.simple was number two on my list.
I proposed nutshell exactly for the reason @mjpieters mentioned; async HTTP requests in a nutshell sounds like a motto :)

@ksamuel
Copy link
Contributor Author

ksamuel commented Nov 16, 2019 via email

@mjpieters
Copy link
Contributor

Cutesie names have given us Flask and Click and other household names.

But if the name aiohttp.simple was used and the documentation for it starts with a tagline: http requests in a nutshell, that might also work.

@asvetlov
Copy link
Member

Aren't fork or socket little playful names for such serious things?

From my understanding, simple is mostly used as an adjective.
I believe the namespace should be named by something that is a noun.
Is there a deadly official and serious name that fits our needs?

Also, I recall very many books about programming that are called in a nutshell.
By this, I think that nutshell is a pretty common word in the software development world.

Anyway, the name can be changed easily, it is not a show-stopper for starting simplified API. The thing is very useful and important regardless of the naming.

@Dreamsorcerer
Copy link
Member

Just to link these issues, I think the new proposed API above is a bad idea, but maybe some shortcut methods for simple cases could be included (e.g. ClientSession.get_text()): #5385 (comment)

@Dreamsorcerer
Copy link
Member

Let's close this one in favour of #5385.

@Dreamsorcerer Dreamsorcerer closed this as not planned Won't fix, can't repro, duplicate, stale Aug 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants