ClientSession signals to track events related to each request. #2313

Closed
pfreixes opened this issue Oct 11, 2017 · 28 comments

@pfreixes
Contributor

The current implementation of ClientSession does not provide a clean and straightforward way to track events related to each request.

Right now in our organization, we are tracking the time spent and the status code by overriding the _request private method in a customized ClientSession subclass. The solution that we have implemented is far from the best.
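
A minimal sketch of that kind of workaround, just for illustration (the report_metric helper is a placeholder for a metrics backend, not our actual code):

import time
import aiohttp

def report_metric(method, url, status, elapsed):
    # Placeholder for whatever metrics backend is in use.
    print("{} {} -> {} in {:.3f}s".format(method, url, status, elapsed))

class InstrumentedClientSession(aiohttp.ClientSession):
    """Times each request and records the response status code."""

    async def _request(self, method, url, **kwargs):
        started = time.monotonic()
        resp = await super()._request(method, url, **kwargs)
        report_metric(method, url, resp.status, time.monotonic() - started)
        return resp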

To address this situation, ClientSession could implement a new set of methods that would allow the user to instrument different metrics related to each request. The following is a list of the new methods that would be used for this purpose:

  • on_pre_request Executed before the request is sent.
  • on_post_request Executed after the request is sent.
  • on_queued Executed when the request is queued.
  • on_unqueued Executed when the request is unqueued.
  • on_create_conn Executed when a new connection is created.

We are planning to support three new metrics: the number of TCP connections created, the number of operations queued because the connection limit has been reached, and the time spent in the queue. The previous methods would allow us to track all of these metrics.
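
A rough sketch of a consumer of those hooks, just for illustration (the hook names come from the list above; how listeners get registered is left out, and the callback arguments are an assumption):

import time

class RequestMetrics:
    """Collects the three metrics from the proposed hooks."""

    def __init__(self):
        self.connections_created = 0
        self.requests_queued = 0
        self.queue_time_total = 0.0
        self._queued_at = {}

    def on_create_conn(self, request):
        self.connections_created += 1

    def on_queued(self, request):
        self.requests_queued += 1
        self._queued_at[id(request)] = time.monotonic()

    def on_unqueued(self, request):
        started = self._queued_at.pop(id(request), None)
        if started is not None:
            self.queue_time_total += time.monotonic() - started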

Any comments are appreciated.

@asvetlov
Member

Not methods but signals I guess.
Also we need to track request/response headers and bodies.

@pfreixes
Contributor Author

Yes, sorry, I meant signals.

Can you expand on that, please:

track request/response headers and bodies.

@asvetlov
Member

something like on_pre_response, on_post_response, on_request_data, on_response_data.

@pfreixes
Contributor Author

The current Signal implementation is coupled to the App interface [1]. I was wondering whether renaming the class to AppSignal and implementing a more naive Signal class without the app dependency - and probably without the pre and post signals - might break the API.

First thoughts?

[1] https://github.com/aio-libs/aiohttp/blob/master/aiohttp/signals.py#L20

@jettify
Member

jettify commented Oct 11, 2017

I was going to ask for something similar for my tracing project https://github.com/aio-libs/aiozipkin; other tools like DataDog, NewRelic, etc. will benefit too.

Just FYI, the popular Java HTTP client okhttp has two types of hooks: interceptors (can modify the request, add retries and caching) and events (do not change anything). Signals look like events:
https://github.com/square/okhttp/wiki/Events

So should we also add similar hooks, like on_dns_start/stop, on_headers_send?

@asvetlov
Member

There are at least three signal sets:

  1. Request: send headers -> [ send body part ] -> finish.
  2. Response: receive headers -> [ receive body part ] -> finish.
  3. Connection: request connection -> connection acquired -> connection released. Maybe we need additional signals for DNS resolving and proxy tunneling.

@pfreixes
Contributor Author

I will compile all of this information and try to make a first draft of the events that ClientSession should support.

My most immediate concern regarding the implementation and consistency is how signals must be exposed by the Connector and the DnsResolver, taking into account that these objects can be shared across different ClientSession instances.

Having all signals handled by the ClientSession can end up with some inconsistencies, where signals raised by one specific source are broadcast to other sources. On the other hand, having the signals decoupled won't help third-party services to have a clean interface to identify an event related to a specific request.

Therefore, IMHO, the solution must find an implementation that guarantees consistency and makes it easy for the developer to connect all of the events for a specific request.

@asvetlov
Member

I think signals should be in the client session instance only.
The resolver is called by the connector explicitly; there is no need to send anything from the resolver -- the connector could do it.
The connector could call the session's signals.

@pfreixes
Contributor Author

The following is the list of proposed signals that would be implemented, taking into account your comments. All of them have at least the same first two arguments: the client_session and request instance objects related to the specific request being traced. Later, other divergent arguments can be considered for each specific event.

  • on_request_start(client_session, request)
  • on_dns_start(client_session, request)
  • on_dns_end(client_session, request)
  • on_request_queued(client_session, request)
  • on_request_unqueued(client_session, request)
  • on_createconn_start(client_session, request)
  • on_createconn_end(client_session, request)
  • on_connection_acquired(client_session, request)
  • on_connection_released(client_session, request)
  • on_request_headers_sent(client_session, request)
  • on_request_body_sent(client_session, request)
  • on_request_headers_received(client_session, request)
  • on_request_content_received(client_session, request)
  • on_request_end(client_session, request)

The use case is represented by the following snippet:

async def request_start(client_session, request):
    print("Request {} started".format(request))

async def request_end(client_session, request):
    print("Request {} finished".format(request))

request_tracing = RequestTracing()
request_tracing.on_request_start.add(request_start)
request_tracing.on_request_end.add(request_end)
session = ClientSession(request_tracing=request_tracing)

Under the hood

Since PEP 550 [1] is not yet available, I will go for an ad-hoc implementation that stores, at the task level,
the object that allows sending a signal from any piece of code involved in making a request, placed at almost any part of the codebase. It gives us the following pros:

  • Request tracing is explicitly activated at instantiation time using the ClientSession object.
  • Underlying objects such as BaseConnector and others do not pollute their signatures; they introspect at runtime whether request tracing is enabled.
  • Each trace, the unique code path executed while making a request, is accompanied by the ClientSession and the Request instance.

The following snippet shows how the signals related to the on_request_queued and on_request_unqueued operations might be implemented inside the BaseConnector code:

class BaseConnector:
    .....

    @asyncio.coroutine
    def connect(self, req):
        ....

        # Wait if there are no available connections.
        if available <= 0:
            fut = helpers.create_future(self._loop)

            # This connection will now count towards the limit.
            waiters = self._waiters[key]
            waiters.append(fut)

            if RequestTracing.trace():
                yield from RequestTracing.trace().on_request_queued.send()

            yield from fut

            if RequestTracing.trace():
                yield from RequestTracing.trace().on_request_unqueued.send()

            waiters.remove(fut)

        ....

The trace method returns an object that was created and stored at the task level, pointing to the ClientSession and the Request object. The following code shows how this object/trace might be created when the ClientSession._request method is called:

class ClientSession:

    @asyncio.coroutine
    def _request(....):
        ....
        req = self._request_class(..........)
        if request_tracing:
            trace = request_tracing.new_trace(self, req)
            yield from trace.on_request_start.send()
        ...

At the moment request_tracing.new_trace is executed, a new trace is stored at the task level, bound to the ClientSession and the Request instance, making it available to the underlying objects.

[1] https://www.python.org/dev/peps/pep-0550/
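
A minimal sketch of how such a task-level store could look on pre-PEP 550 Python (the class layout and the attribute name are assumptions, not the actual implementation; the asyncio API is the 3.5/3.6-era spelling):

import asyncio

class Trace:
    """Per-request trace: binds the session/request pair so any code running
    on the same task can reach it and send the related signals."""

    def __init__(self, tracing, client_session, request):
        self._tracing = tracing
        self.client_session = client_session
        self.request = request

class RequestTracing:
    """Task-local trace registry."""

    _TASK_ATTR = '_aiohttp_trace'  # hypothetical attribute name

    def new_trace(self, client_session, request):
        trace = Trace(self, client_session, request)
        task = asyncio.Task.current_task()
        if task is not None:
            setattr(task, self._TASK_ATTR, trace)
        return trace

    @classmethod
    def trace(cls):
        # Returns the trace bound to the currently running task, if any.
        task = asyncio.Task.current_task()
        return getattr(task, cls._TASK_ATTR, None)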

@kxepal
Member

kxepal commented Oct 13, 2017

@pfreixes
Quite a large list of signals. Why so many?

@pfreixes
Contributor Author

@kxepal basically it's the outcome of compiling the events mentioned here. I would say that this list almost matches the events supported by the okhttp Java client [1].

[1] https://github.com/square/okhttp/wiki/Events

@asvetlov
Member

I don't like the request parameter.

  1. on_request_body_sent should pass body (might be called several times).
  2. ClientRequest is not documented; it is not a first-class citizen in the library.
  3. After implementing "Introduce session.make_request() API" #2019 (I'd love to do it in the next release), request will be gone (or at least change very much).

on_request_headers_received is most likely on_response_headers_received. The same goes for on_request_content_received.

I suggest not using context locals at all, at least for the initial implementation.
Let's pass request_tracing to the session explicitly.
After PEP 550 we could provide a tracer implementation which is based on implicit context.
For older Pythons there is a possibility to implement the logic with a custom task factory.
But please keep the initial tracer simple.
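
For reference, the custom-task-factory approach mentioned above could look roughly like this on the Python versions current at the time (3.5/3.6); the _request_trace attribute is hypothetical:

import asyncio

def tracing_task_factory(loop, coro):
    # Copy the per-request trace from the parent task to the child task,
    # so code spawned while handling a request can still find the active trace.
    parent = asyncio.Task.current_task(loop=loop)
    task = asyncio.Task(coro, loop=loop)
    if parent is not None and hasattr(parent, '_request_trace'):
        task._request_trace = parent._request_trace
    return task

loop = asyncio.get_event_loop()
loop.set_task_factory(tracing_task_factory)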

@pfreixes
Contributor Author

pfreixes commented Oct 18, 2017

The idea of the request parameter was to give some parameter that the developer could use to "connect" all the signals that happen for a specific request.

Another way to do it, and maybe cleaner, is giving a context param to each signal call which would be the same for each unique request. By default, it might be a dictionary, but the user could replace the factory, perhaps:

async def on_request_start(session, context):
    context.started = loop.time()

async def on_request_end(session, context):
    context.total = loop.time() - context.started

TraceContext = NamedTuple("TraceContext", ['started', 'total'])
tracer = RequestTracing(ctx_factory=TraceContext)
tracer.on_request_start(on_request_start)
tracer.on_request_end(on_request_end)
session = ClientSession(request_tracing=tracer)

That's just a sketch; does it sound reasonable to you @asvetlov?

Just realized that the TCPConnector is in fact very coupled to the ClientSession; indeed, the destructor of the ClientSession closes all the open connections held by the TCPConnector, regardless of whether it was instantiated by the Session or given as a parameter. Therefore, the rationale that I exposed about sharing the TCPConnector and the DNSResolver is after all not valid. The context won't be necessary for that case, let's do it explicitly.

@asvetlov
Member

I like the context idea. Let's make it types.SimpleNamespace by default.
But we definitely need to carefully define a list of accepted params for every hook.
About connectors and DNSResolver: I'm OK with breaking the API by passing the tracer to connector.connect().
We can do wild changes; I very much doubt that people create their own connectors. Maybe only a few.

@pfreixes I suggest starting the PR by implementing very few hooks and adding more later.

BTW @jettify and @thehesiod might be interested in the feature.

@kxepal
Member

kxepal commented Oct 18, 2017

Btw, what kind of actions could be done with such signals other than time measurement?

@kxepal
Member

kxepal commented Oct 18, 2017

@pfreixes
I believe this should be context.started, right?

await on_request_end(session, context):
    contex.total = loop.time() - session.started

@jettify
Member

jettify commented Oct 18, 2017

One other use case is context propagation: say you need to trace how many services were involved in serving a particular request. The best way to do this is to add 3 headers and pass them to downstream services. Most distributed tracers work this way, so as a result you can aggregate and build a full tree of chained calls.

okhttp also has interceptors, so the user can implement caching and retries inside the client.
https://github.com/square/okhttp/wiki/Interceptors
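
For illustration, such a propagation hook would essentially do the following; the header names follow the Zipkin B3 convention, while the hook signature and the id helper are assumptions (it also presumes a hook that is allowed to mutate the outgoing request, i.e. an interceptor in okhttp terms):

import uuid

def _new_id():
    # 64-bit hex identifier, as used by the B3 propagation headers.
    return uuid.uuid4().hex[:16]

async def propagate_trace_headers(client_session, request):
    # Reuse the incoming trace id if one is already set, otherwise start a new
    # trace, and always create a fresh span id for this outgoing call.
    request.headers.setdefault('X-B3-TraceId', _new_id())
    request.headers['X-B3-SpanId'] = _new_id()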

@pfreixes
Contributor Author

@kxepal yes my mistake.

@jettify Yes, you are right, forwarding headers is the way to chain different calls that belong to the same "segment" - using AWS X-Ray terminology [1]. To implement that pattern we should change the non-invasive signals to something like the interceptors you mentioned.

Indeed, this is the pattern that we are following right now: aside from tracing the request by overriding the ClientSession._request method, we are modifying the request headers, automatically adding the proper header to propagate the request_id.

Therefore, IMHO, the final implementation should have the interceptor pattern. This brings us back to the previous discussion about the particularity of exposing the request class. In any case, on_request_end must give the response object as a param, otherwise it is not possible to trace the request using the status code.

My 2 cents here: let's make a first POC and comment on it.

[1] https://github.com/aws/aws-xray-sdk-python

@asvetlov asvetlov added this to the 3.0 milestone Oct 19, 2017
@pfreixes
Contributor Author

Just a couple of examples of how the developer could use the new request tracing feature. I prefer a top-to-bottom implementation, which helps me make sure that we will meet all of the requirements that we are exposing here.

From my point of view the main requirements that we should meet are:

  • A request tracking system that links all of the signals triggered by the same request for a specific ClientSession instance.
  • The scope of the context == the request scope.
  • Follow the same signals exposure as the web.Application class.

The first example [1] shows how these requirements are exposed to the user:

  • Signals are finally attached to the ClientSession class; there is no RequestTrace object.
  • Enabling or disabling means attaching or not attaching listeners.
  • trace_context is related to each request; therefore it is instantiated at each post, get, or whatever method.
  • The developer can give an alternative trace_context instance as a new keyword argument to the post, get, or whatever method.

The second example [2] is a more complicated one which subclasses ClientSession, resulting in a new class that allows the user to store some variables that won't mutate across all of the requests made with a specific class instance.

Leaving aside the signals/interceptors supported and the signature of these methods, I believe that this way of exposing the request tracing feature might be a good candidate, IMHO, for building the final implementation.

[1] https://gist.github.com/pfreixes/847a18b973266950c2cf8279b2435da6
[2] https://gist.github.com/pfreixes/e0e3b3f98339d6689b031027ed6fa4b8

@asvetlov
Member

#1309 is superseded by this issue

This was referenced Oct 19, 2017
@thehesiod
Contributor

sorry guys will be out for awhile on all fronts, just had our first 👶 :)

pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 24, 2017
This commit's main purpose is to start a discussion about whether this
is what was expected and to clear up some of the uncertainties related
to the implementation. The following list enumerates each point that
could be discussed, aside from the general implementation.

1) Break the `Signal` object to accommodate the old implementation
as `AppSignal`, keeping `Signal` for generic signals.
2) Coroutine signals are reserved only for the start, end and exception
signals. Others are implemented as `FuncSignals`.
3) The final list of supported signals is the one in this
commit, although not all of them are implemented yet. Just follow the new
`ClientSession` properties that match the `on_*` pattern.
4) Redirects have their own signal, `on_request_redirect`, that is triggered
at each redirect response.
5) There is no clear way to pass the `Session` and the `trace_context`
to the underlying objects used by the `ClientSession`. If you have
better ideas please shout.
6) Signals related to the progress of the upload/download will be
implemented at the streamers/protocols.
@pfreixes
Contributor Author

Just a POC but I would appreciate your feedback

pfreixes@a8da61d

pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 25, 2017
For now we are not going to implement the signals related to the
chunks uploaded. Once the MVP of the request tracing is implemented,
we will improve it by adding the proper signals to follow the upload
and download chunk by chunk.
pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 25, 2017
pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 25, 2017
pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 25, 2017
@pfreixes
Contributor Author

pfreixes commented Oct 25, 2017

The work related to this feature is almost done, sorry for the many changes. If you want to take a look at the current changes compared to the master branch, please see:

master...pfreixes:clientsession_signals

What is missing?

  • Documentation
  • on_request_content_chunk_sent and on_request_content_chunk_received. I'm inclined to leave these two missing signals for another MR.

My idea is to synchronize with the current master, rebase my branch, and squash all of the commits into just one commit with all of the changes. I hope this can be done by tomorrow.

@asvetlov
Member

@pfreixes don't worry too much about squashing -- we usually squash on merging.

@asvetlov
Member

Please make a PR -- I'd like to have a proper place for review.

pfreixes added a commit to pfreixes/aiohttp that referenced this issue Oct 29, 2017
Implementation of the client signals exposed by the `ClientSession`
class; for the list of all implemented signals please visit
the documentation.
@pfreixes pfreixes mentioned this issue Oct 29, 2017
asvetlov pushed a commit that referenced this issue Nov 18, 2017
* Client signals #2313

Implementation of the client signals exposed by the `ClientSession`
class; for the list of all implemented signals please visit
the documentation.

* Removed not already implemented signal

* Removed support for on_content and on_headers signals

* fixed small stuff

* Get rid of no longer used helpers.create_future

* Update 2313.feature

* on_request_start receives the whole URL object

* on_request_end and error flavor receive the method, URL and headers

* TraceConfig as a way to configure the ClientSession tracing

* Fix flake import issues

* Increase dns tracing coverage

* Tracing signals are explicit

* Removed not used session kw argument

* Accept multiple TraceConfig objects for the ClientSession class

* Renamed trace context vars

* Fix invalid test func definition

* Fixed docstring params codification
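
For reference, a minimal usage sketch of the TraceConfig API that this work introduced, as it looks in released aiohttp 3.x (see the client tracing documentation for the full list of signals and their parameter objects):

import asyncio
import time

import aiohttp

async def on_request_start(session, trace_config_ctx, params):
    # trace_config_ctx is a fresh SimpleNamespace per request by default.
    trace_config_ctx.started = time.monotonic()

async def on_request_end(session, trace_config_ctx, params):
    elapsed = time.monotonic() - trace_config_ctx.started
    print("{} {} -> {} in {:.3f}s".format(
        params.method, params.url, params.response.status, elapsed))

async def main():
    trace_config = aiohttp.TraceConfig()
    trace_config.on_request_start.append(on_request_start)
    trace_config.on_request_end.append(on_request_end)
    async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
        async with session.get('https://example.com') as resp:
            await resp.text()

asyncio.get_event_loop().run_until_complete(main())
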
@pfreixes
Contributor Author

Can we close this? IMHO the MVP is done. Surely it can be improved by adding the signals related to header and content upload and download, but we can create the proper issues and PRs for new improvements.

@asvetlov
Member

Agree

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel there are important points made in this discussion, please include those excerpts in that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019