
Documentation request - safe async usage #531

Open
hsghori opened this issue Sep 18, 2023 · 10 comments
hsghori commented Sep 18, 2023

I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported.

For example, since I can't use bulk publish to send multiple messages in a single payload if those messages are from different channels, I've been looking into using python coroutines to reduce the amount of time I need to wait on the ably servers.

But it's not clear to me if this usage is actually safe from a state / resources perspective.

client = AblyRest(get_api_key())
async with client:
    responses = await asyncio.gather(*[
        client.channels.get(channel_name).publish(name, message)
        for channel_name, name, message in messages
    ])

Under the hood it looks like Channel.publish calls back into the shared ably.http client, which can set attributes like self.__host and self.__host_expires. That feels fairly unsafe, since different coroutines could be setting those attributes on the same client. But since Channel.publish is an explicitly async method, I'd expect it to be safe to parallelize.

So I'd love to understand to what extent async usage of this sdk is safe / how y'all would recommend we use the async capabilities of the sdk.

Issue is synchronized with this Jira Task by Unito


sync-by-unito bot commented Sep 18, 2023

➤ Automation for Jira commented:

The link to the corresponding Jira issue is https://ably.atlassian.net/browse/SDK-3856

sacOO7 (Collaborator) commented Sep 18, 2023

@hsghori I have notified the author of the asyncio part of the library; we will get back to you once they're available 👍

owenpearson (Member) commented

Hey @hsghori, thanks for reaching out!

I understand your concern with asynchronous operations changing internal state, however we have aimed to design the library so that it 'just works' without needing to worry too much about concurrency, so I don't have any specific advice on how to use the async methods.

In this particular case, when the internal HTTP client encounters an error that necessitates the use of a fallback host, it iterates through the fallback hosts and, if it finds a working one, temporarily saves it as a preference. I suppose there are cases where several requests fail at once, the fallback mechanism runs for each failure, and each overwrites the host preference when finished. But it's quite rare for this to happen, and there are only 5 fallback hosts to try, so while it's not perfect behaviour, I don't consider it unsafe or otherwise unacceptable.

If this behaviour is unacceptable for you, an alternative and more efficient way to publish multiple messages to multiple channels is to use our batch API. We don't have first-class support for it in ably-python just yet, but you can still make use of the SDK's auth and retry mechanisms by querying the API with the generic AblyRest.request method. If you do go down this route, I'm happy to answer any questions or share an example if needed.

hsghori (Author) commented Sep 19, 2023

@owenpearson thanks for the response! I would love an example if you have time. Also is there a limit to the total number of messages that can be sent via the bulk API at once? And is the 10s timeout sufficient for using the bulk API?

owenpearson (Member) commented

Hey @hsghori, here's an example of using the batch API. Note that while this example publishes a single 'batch spec', you can also publish an array of batch specs in order to send different messages to different collections of channels. The results for each batch spec contain a successCount, a failureCount, and a results array. The recommended way to use these is to check whether failureCount is nonzero, and if so, loop through the results array to find which channels have an error field.

There is no limit to the number of messages, however there is a limit of 100 channels and 2MiB request body per request. And yes, the 10s timeout is fine for this endpoint.

import asyncio
from ably import AblyRest

async def main():
    client = AblyRest(key="your_api_key")

    batch_spec = {
        "channels": ["channel1", "channel2", "channel3"],
        "messages": [{"data": "message1"}, {"data": "message2"}],
    }
    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

    # an array of BatchResults, one for each BatchSpec, each containing a 
    # successCount and a failureCount indicating the number of channels to 
    # which the batch of messages was published successfully
    print(res.items)

asyncio.run(main())
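To make the error-checking approach above concrete, here is a small helper sketch. It is not part of the SDK, and the exact response field names (`failureCount`, `results`, `channel`, `error`) are assumptions based on the description above, so verify them against the actual response:

```python
def failed_channels(batch_results):
    """Return the channels whose per-channel result carries an error field."""
    failed = []
    for batch_result in batch_results:
        if batch_result.get("failureCount", 0) == 0:
            continue  # every channel in this BatchSpec succeeded
        for result in batch_result.get("results", []):
            if "error" in result:
                failed.append(result["channel"])
    return failed

# Hypothetical response shape, for illustration only.
sample = [{
    "successCount": 1,
    "failureCount": 1,
    "results": [
        {"channel": "channel1", "messageId": "abc123"},
        {"channel": "channel2", "error": {"code": 40160, "message": "denied"}},
    ],
}]
print(failed_channels(sample))  # → ['channel2']
```

You could then retry publishing only to the channels this returns, rather than re-sending the whole batch.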

hsghori (Author) commented Sep 19, 2023

To clarify, is the channel limit per BatchSpec or per request in total? In the API docs I see that we can have a request like:

[
  {
    channels: ['channel1', 'channel2'],
    messages: {data: 'My message'}
  },
  {
    channels: 'channel3',
    messages: [
      {data: 'My message'},
      {name: 'An event', data: 'My event message contents'},
    ]
  }
]

Is that entire array limited to 100 channels, or is each BatchSpec in the array limited to 100 channels? If it's the latter, is there a limit on the number of BatchSpecs that can be sent at once?

Edit: I see that's explained in the docs. It's 100 channels per BatchSpec and the entire request body must be <2MB.
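Given that per-BatchSpec limit, one way to stay under it is a small helper (hypothetical, not part of the SDK) that splits a long channel list into BatchSpecs of at most 100 channels each:

```python
def chunk_batch_specs(channels, messages, limit=100):
    """Split one logical publish into BatchSpecs of at most `limit` channels."""
    return [
        {"channels": channels[i:i + limit], "messages": messages}
        for i in range(0, len(channels), limit)
    ]

specs = chunk_batch_specs([f"channel{i}" for i in range(250)],
                          [{"data": "hello"}])
print([len(s["channels"]) for s in specs])  # → [100, 100, 50]
```

The resulting list can be posted as the request body; note the 2MiB total body limit still applies to the request as a whole.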

hsghori (Author) commented Sep 19, 2023

In the example you provided though:

    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

It looks like you're passing the batch_spec into the headers arg instead of the body.

hsghori (Author) commented Sep 19, 2023

Also, do y'all have a P50 for performance of the bulk endpoint (especially related to input size)? One of my goals here is to ensure that customers aren't waiting too long on message publishing. My application supports a lot of bulk operations which send one message per item being operated on, so I want to make sure that a bulk op that takes ~1-5s does not end up getting significantly slower on average.

owenpearson (Member) commented

> It looks like you're passing the batch_spec into the headers arg instead of the body.

The body arg comes before headers, see:

async def request(self, method: str, path: str, version: str,
                  params: Optional[dict] = None, body=None, headers=None):
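To illustrate the parameter order, here is a stub coroutine with the same signature shape as AblyRest.request (the stub is not the real implementation, just a sketch): a positional fifth argument lands in body, and passing body= by keyword avoids the ambiguity entirely.

```python
import asyncio
from typing import Optional

# Stub with the same parameter order as AblyRest.request, to show that
# the fifth positional argument is `body`, not `headers`.
async def request(method: str, path: str, version: str,
                  params: Optional[dict] = None, body=None, headers=None):
    return {"params": params, "body": body, "headers": headers}

async def main():
    batch_spec = {"channels": ["channel1"], "messages": [{"data": "hi"}]}
    positional = await request("POST", "/messages", "3", None, batch_spec)
    keyword = await request("POST", "/messages", "3", body=batch_spec)
    # Both calls put batch_spec in the body slot; headers stays None.
    assert positional["body"] is batch_spec and positional["headers"] is None
    assert keyword["body"] is batch_spec

asyncio.run(main())
```

In application code, the keyword form (`body=batch_spec`) is the easier one to read at a glance.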

As for the batch API performance guarantees, I'll need to check with another team so will let you know when I have something to share. If possible, it might be worth testing some large queries yourself to see how it performs with large payloads?

sacOO7 (Collaborator) commented Sep 28, 2023

> I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported. […]

@hsghori currently, I am in a position to answer this question. ably-python doesn't use multithreading, so asyncio's default single-threaded event loop won't cause any synchronization issues 👍
You can check https://www.pythontutorial.net/python-concurrency/python-event-loop/
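A minimal demonstration of that point: every coroutine scheduled with asyncio.gather runs on the same OS thread, so the concurrent publishes in the original snippet never race at the thread level.

```python
import asyncio
import threading

thread_ids = []

async def record():
    # Each coroutine notes the OS thread it is running on.
    thread_ids.append(threading.get_ident())
    await asyncio.sleep(0)  # yield control back to the event loop

async def main():
    # Run five coroutines "concurrently" on the default event loop.
    await asyncio.gather(*(record() for _ in range(5)))

asyncio.run(main())

# All five coroutines executed on one and the same thread.
print(len(set(thread_ids)))  # → 1
```

Coroutines only hand over control at an await point, so plain attribute writes (like the host preference discussed above) are not preempted mid-statement; interleaving can still reorder whole operations across requests, which is the overwrite behaviour owenpearson described.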
