
Documentation request - safe async usage #531

Open
hsghori opened this issue Sep 18, 2023 · 10 comments
hsghori commented Sep 18, 2023

I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported.

For example, since I can't use bulk publish to send multiple messages in a single payload if those messages are from different channels, I've been looking into using python coroutines to reduce the amount of time I need to wait on the ably servers.

But it's not clear to me if this usage is actually safe from a state / resources perspective.

client = AblyRest(get_api_key())
async with client:
    responses = await asyncio.gather(*[
        client.channels.get(channel_name).publish(name, message)
        for channel_name, name, message in messages
    ])

Under the hood it looks like Channel.publish calls back into the shared ably.http client, which can set attributes like self.__host and self.__host_expires. That feels fairly unsafe, since different coroutines could be setting those attributes on the same client. But since Channel.publish is an explicitly async method, I'd expect it to be safe to parallelize.

So I'd love to understand to what extent async usage of this sdk is safe / how y'all would recommend we use the async capabilities of the sdk.

Issue is synchronized with this Jira Task by Unito


sync-by-unito bot commented Sep 18, 2023

➤ Automation for Jira commented:

The link to the corresponding Jira issue is https://ably.atlassian.net/browse/SDK-3856

sacOO7 (Collaborator) commented Sep 18, 2023

@hsghori I have notified the author of the asyncio part of the library; we will get back to you once they're available 👍

owenpearson (Member) commented

Hey @hsghori, thanks for reaching out!

I understand your concern with asynchronous operations changing internal state, however we have aimed to design the library so that it 'just works' without needing to worry too much about concurrency, so I don't have any specific advice on how to use the async methods.

In this particular case, when the internal HTTP client encounters an error that necessitates the use of a fallback host, it iterates through the fallback hosts and, if it finds a working one, temporarily saves it as a preference. I suppose there are cases where several requests fail at once, the fallback mechanism runs for each failure, and each overwrites the host preference when finished. But it's quite rare for this to happen, and there are only 5 fallback hosts to try, so while it's not perfect behaviour, I don't consider it unsafe or otherwise unacceptable.

If this behaviour is unacceptable for you, an alternative and more efficient way to publish multiple messages to multiple channels is to use our batch API. We don't have first-class support for it in ably-python just yet, but you can still make use of the SDK's auth and retry mechanisms by querying the API with the generic AblyRest.request method. If you do go down this route, I'm happy to answer any questions or share an example if needed.

hsghori (Author) commented Sep 19, 2023

@owenpearson thanks for the response! I would love an example if you have time. Also is there a limit to the total number of messages that can be sent via the bulk API at once? And is the 10s timeout sufficient for using the bulk API?

owenpearson (Member) commented

Hey @hsghori, here's an example of using the batch API. Note that while this example publishes a single 'batch spec', you can also publish an array of batch specs in order to send different messages to different collections of channels. The results for each batch spec contain a successCount, a failureCount, and a results array. The recommended way to use these is to check whether failureCount is nonzero, and if so, loop through the results array to find which channels have an error field.

There is no limit to the number of messages, however there is a limit of 100 channels and 2MiB request body per request. And yes, the 10s timeout is fine for this endpoint.

import asyncio
from ably import AblyRest

async def main():
    client = AblyRest(key="your_api_key")

    batch_spec = {
        "channels": ["channel1", "channel2", "channel3"],
        "messages": [{"data": "message1"}, {"data": "message2"}],
    }
    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

    # an array of BatchResults, one for each BatchSpec, each containing a 
    # successCount and a failureCount indicating the number of channels to 
    # which the batch of messages was published successfully
    print(res.items)

asyncio.run(main())
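To make the error-checking approach above concrete, here is a small helper sketch. It is not part of the SDK, and the exact response field names (`failureCount`, `results`, `channel`, `error`) are assumptions based on the description above, so verify them against the actual response:

```python
def failed_channels(batch_results):
    """Return the channels whose per-channel result carries an error field."""
    failed = []
    for batch_result in batch_results:
        if batch_result.get("failureCount", 0) == 0:
            continue  # every channel in this BatchSpec succeeded
        for result in batch_result.get("results", []):
            if "error" in result:
                failed.append(result["channel"])
    return failed

# Hypothetical response shape, for illustration only.
sample = [{
    "successCount": 1,
    "failureCount": 1,
    "results": [
        {"channel": "channel1", "messageId": "abc123"},
        {"channel": "channel2", "error": {"code": 40160, "message": "denied"}},
    ],
}]
print(failed_channels(sample))  # → ['channel2']
```

You could then retry publishing only to the channels this returns, rather than re-sending the whole batch.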

hsghori (Author) commented Sep 19, 2023

To clarify, is the channel limit per BatchSpec or per request in total? In the API docs I see that we can have a request like:

[
  {
    channels: ['channel1', 'channel2'],
    messages: {data: 'My message'}
  },
  {
    channels: 'channel3',
    messages: [
      {data: 'My message'},
      {name: 'An event', data: 'My event message contents'},
    ]
  }
]

Is that entire array limited to 100 channels, or is each BatchSpec in the array limited to 100 channels? If it's the latter, is there a limit on the number of BatchSpecs that can be sent at once?

Edit: I see that's explained in the docs. It's 100 channels per BatchSpec and the entire request body must be <2MB.
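Given that per-BatchSpec limit, one way to stay under it is a small helper (hypothetical, not part of the SDK) that splits a long channel list into BatchSpecs of at most 100 channels each:

```python
def chunk_batch_specs(channels, messages, limit=100):
    """Split one logical publish into BatchSpecs of at most `limit` channels."""
    return [
        {"channels": channels[i:i + limit], "messages": messages}
        for i in range(0, len(channels), limit)
    ]

specs = chunk_batch_specs([f"channel{i}" for i in range(250)],
                          [{"data": "hello"}])
print([len(s["channels"]) for s in specs])  # → [100, 100, 50]
```

The resulting list can be posted as the request body; note the 2MiB total body limit still applies to the request as a whole.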

hsghori (Author) commented Sep 19, 2023

In the example you provided though:

    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

It looks like you're passing the batch_spec into the headers arg instead of the body.

hsghori (Author) commented Sep 19, 2023

Also, do y'all have a P50 for performance of the bulk endpoint (especially related to input size)? One of my goals here is to ensure that customers aren't waiting too long on message publishing. My application supports a lot of bulk operations which send one message per item being operated on, so I want to make sure that a bulk op that takes ~1-5s does not end up getting significantly slower on average.

owenpearson (Member) commented

> It looks like you're passing the batch_spec into the headers arg instead of the body.

The body arg comes before headers, see:

async def request(self, method: str, path: str, version: str,
                  params: Optional[dict] = None, body=None, headers=None):
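To illustrate the parameter order, here is a stub coroutine with the same signature shape as AblyRest.request (the stub is not the real implementation, just a sketch): a positional fifth argument lands in body, and passing body= by keyword avoids the ambiguity entirely.

```python
import asyncio
from typing import Optional

# Stub with the same parameter order as AblyRest.request, to show that
# the fifth positional argument is `body`, not `headers`.
async def request(method: str, path: str, version: str,
                  params: Optional[dict] = None, body=None, headers=None):
    return {"params": params, "body": body, "headers": headers}

async def main():
    batch_spec = {"channels": ["channel1"], "messages": [{"data": "hi"}]}
    positional = await request("POST", "/messages", "3", None, batch_spec)
    keyword = await request("POST", "/messages", "3", body=batch_spec)
    # Both calls put batch_spec in the body slot; headers stays None.
    assert positional["body"] is batch_spec and positional["headers"] is None
    assert keyword["body"] is batch_spec

asyncio.run(main())
```

In application code, the keyword form (`body=batch_spec`) is the easier one to read at a glance.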

As for the batch API performance guarantees, I'll need to check with another team so will let you know when I have something to share. If possible, it might be worth testing some large queries yourself to see how it performs with large payloads?

sacOO7 (Collaborator) commented Sep 28, 2023

> I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported. […]

@hsghori currently, I am in a position to answer this question. ably-python doesn't use multithreading, so asyncio's default single-threaded event loop won't cause any synchronization issues 👍
You can check https://www.pythontutorial.net/python-concurrency/python-event-loop/
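A minimal demonstration of that point: every coroutine scheduled with asyncio.gather runs on the same OS thread, so the concurrent publishes in the original snippet never race at the thread level.

```python
import asyncio
import threading

thread_ids = []

async def record():
    # Each coroutine notes the OS thread it is running on.
    thread_ids.append(threading.get_ident())
    await asyncio.sleep(0)  # yield control back to the event loop

async def main():
    # Run five coroutines "concurrently" on the default event loop.
    await asyncio.gather(*(record() for _ in range(5)))

asyncio.run(main())

# All five coroutines executed on one and the same thread.
print(len(set(thread_ids)))  # → 1
```

Coroutines only hand over control at an await point, so plain attribute writes (like the host preference discussed above) are not preempted mid-statement; interleaving can still reorder whole operations across requests, which is the overwrite behaviour owenpearson described.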
