
Fix detected blocking call inside the event loop #1120

Closed
wants to merge 1 commit

Conversation

chemelli74

Description of Change

Fix detected blocking call inside the event loop. Reference HomeAssistant latest build:

2024-06-12 09:40:41.237 WARNING (MainThread) [homeassistant.util.loop] Detected blocking call to listdir inside the event loop by integration 'aws' at homeassistant/components/aws/notify.py, line 38: return await session.get_available_regions(service) (offender: /usr/local/lib/python3.12/site-packages/botocore/loaders.py, line 311: api_versions = os.listdir(full_dirname)), please create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+aws%22
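The warning shows the offending call path: `session.get_available_regions()` ends up in botocore's `os.listdir()`. A minimal sketch of the workaround this PR proposes, deferring the blocking call to the default executor (`get_regions_blocking` and the fake session are hypothetical stand-ins, not the integration's actual code):

```python
import asyncio


def get_regions_blocking(session, service):
    # Hypothetical stand-in for the botocore call that scans the model
    # data directory with os.listdir() the first time it runs.
    return session.get_available_regions(service)


async def get_regions(session, service):
    # Defer the blocking file I/O to the default thread pool so the
    # event loop stays responsive during the first model load.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, get_regions_blocking, session, service)
```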

Assumptions

none

Checklist for All Submissions

  • I have added change info to CHANGES.rst
  • If this is resolving an issue (needed so future developers can determine if change is still necessary and under what conditions) (can be provided via link to issue with these details):
    • Detailed description of issue
    • Alternative methods considered (if any)
    • How issue is being resolved
    • How issue can be reproduced
  • If this is providing a new feature (can be provided via link to issue with these details):
    • Detailed description of new feature
    • Why needed
    • Alternative methods considered (if any)

Checklist when updating botocore and/or aiohttp versions

  • I have read and followed CONTRIBUTING.rst
  • I have updated test_patches.py where/if appropriate (also check if no changes necessary)
  • I have ensured that the awscli/boto3 versions match the updated botocore version

@thehesiod
Collaborator

Sessions should be cached for the life of use, so if this is a major issue, the session should be kept around for a longer time. Really, botocore needs to speed this up on their end, because it's way too slow.

@chemelli74
Author

Until then, we have a fix anyway.
Maybe we should add a comment about that.

@thehesiod
Collaborator

I'm hesitant to do this, as run_in_executor triggers a thread pool to be created (if you haven't called it before), which can cause overhead. Further, the executor should be configurable... this really opens a can of worms.

@bdraco
Member

bdraco commented Jun 14, 2024

It does look like load_service_model calls are cached, but it does do blocking I/O in the event loop the first time it's created. Ideally we would only call it in the executor the first time it needs to be created.

run_in_executor being called with None (the default executor) seems fine to me, as we do it all over the place in aiohttp, and it is the prescribed way to run blocking code: https://docs.python.org/3/library/asyncio-dev.html#running-blocking-code
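The pattern the asyncio docs prescribe, in miniature (the `blocking_io` function is a placeholder simulating a slow disk read, not botocore code):

```python
import asyncio
import time


def blocking_io():
    # Simulates blocking work, such as reading model JSON from disk.
    time.sleep(0.05)
    return "loaded"


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor, which is the
    # documented way to run blocking code without stalling the loop.
    return await loop.run_in_executor(None, blocking_io)
```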

@chemelli74
Author

Any feedback, @thehesiod? Thank you in advance.

@thehesiod
Collaborator

So this seems like an issue with the aws component: why doesn't it do the session/client creation during async_setup instead? That seems like the correct fix, no?

@thehesiod
Collaborator

Like I was saying, it should cache the session/client there, or if you don't want to hold onto the client, perhaps just pre-warm the botocore cache.

@thehesiod
Collaborator

Actually, this is invalid, as botocore is not thread-safe: with this you could have two requests, running in two different threads, trying to initialize the cache at the same time. Closing.

@thehesiod thehesiod closed this Jun 25, 2024
@bdraco
Member

bdraco commented Jun 25, 2024

@chemelli74 Thanks for trying to fix this. This problem must be solved eventually (unless the intent is WONTFIX). Please open an issue instead to track this in the long term. Thanks.

@thehesiod
Collaborator

Yeah, let me know if I missed something; I truly believe this is a botocore issue. The fact that they're reading a ton of JSON files is unsustainable. It probably should be done on the fly, per service, when you use it... and even then it should be in a post-processed state; e.g. their build should generate Python code based on those JSON files.

@chemelli74 chemelli74 deleted the chemelli74-fix-io branch June 25, 2024 07:11
@thehesiod
Collaborator

So I looked a little into this. It seems like they use an instance_cache, which basically just sets an attribute on the class, so presumably the worst-case scenario with this fix is that you'd initialize the cache multiple times. However, I really want to avoid adding threads to this library if possible. Looks like there are some async file alternatives, like https://github.com/qweeze/uring_file for Debian; I'd be open to something like that.
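The attribute-caching pattern described above can be sketched like this (a simplified illustration in the spirit of botocore's `instance_cache`, not its actual implementation): a race between two threads at worst computes the value twice, and the last write wins, which is harmless for deterministic loads.

```python
import functools


def instance_cache(func):
    # Cache the result on the instance; not thread-safe by design, so
    # the worst case under a race is computing the same value twice.
    @functools.wraps(func)
    def wrapper(self, *args):
        cache = self.__dict__.setdefault("_instance_cache", {})
        key = (func.__name__, args)
        if key not in cache:
            cache[key] = func(self, *args)
        return cache[key]
    return wrapper


class Loader:
    calls = 0

    @instance_cache
    def load_data(self, name):
        # Stand-in for the expensive JSON read; counts invocations.
        Loader.calls += 1
        return {"name": name}
```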

@chemelli74
Author

What about pathlib?
It's used all over.

@thehesiod
Collaborator

That uses aiofiles, which just uses a thread pool again: https://github.com/Tinche/aiofiles/tree/main/src/aiofiles

@thehesiod
Collaborator

I guess at the aiofiles level, though, it doesn't matter. But it sounds like there's a better solution.

@chemelli74
Author

chemelli74 commented Jul 17, 2024

> i guess at the aiofiles level though it doesn't matter. But it sounds like there's a better solution

If you point me towards the solution you prefer, I'll try to work on it.
I really would like to fix this issue and its effect on Home Assistant.

EDIT: Opened issue on botocore repo

@chemelli74
Author

Here's the official answer from the botocore repo: boto/botocore#3222

What can we do to progress, then?

@jakob-keller
Collaborator

jakob-keller commented Jul 24, 2024

edit: sorry, this is basically what @thehesiod suggested in #1120 (comment). I also think it's the obvious solution for that HomeAssistant-specific issue.

This might be a crazy idea, but if

  1. botocore caches the stuff it loads from disk in global state (not thread-local)
  2. aiobotocore uses the same cache
  3. HomeAssistant supports set-up hooks for the AWS integration

then

  1. The setup code in HomeAssistant could pre-populate the botocore cache in a non-blocking fashion, i.e. via asyncio.to_thread()
  2. Eventual calls to aiobotocore would no longer block and leverage the pre-warmed cache

This would be a fix over at HomeAssistant. What do you think?
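The two-step idea above could be sketched as follows (`prewarm_cache` and `async_prewarm` are hypothetical names; there is no such helper in botocore or HomeAssistant today):

```python
import asyncio


def prewarm_cache(loader, services):
    # Hypothetical synchronous helper: touch every model once so the
    # shared (global, non-thread-local) cache is populated up front.
    for service in services:
        loader(service)


async def async_prewarm(loader, services):
    # Step 1: run the blocking pre-warm in a worker thread at setup time.
    # Step 2: later event-loop calls then hit the warm cache and no
    # longer block.
    await asyncio.to_thread(prewarm_cache, loader, services)
```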

@bdraco
Member

bdraco commented Jul 24, 2024

This doesn’t seem like a Home Assistant specific issue. Any asyncio application is going to have its event loop blocked. The only difference is that Home Assistant is aware of its loop being blocked, whereas another application may not be able to detect it.

@jakob-keller
Collaborator

botocore (or us?) could provide some sort of synchronous helper function to pre-populate the cache. Home Assistant, or anyone else, could then choose to run it as part of their setup routine, e.g. via asyncio.to_thread(), and no one else would be affected. Does that make sense?

Btw, there are Python platforms that lack multithreading support. That's why I think @thehesiod has a point in not resorting to helper threads in this project unless absolutely necessary.

@thehesiod
Collaborator

I don't think this is an aiobotocore problem, but a botocore problem. Or do you need to go through an async call to get to the cache warm-up? Secondly, they can already do this themselves without the need for a helper.

@chemelli74
Author

> this isn't an aiobotocore problem I don't think, but a botocore problem.

Why do you think so?
This is an async issue, and botocore is sync.
The main point of aiobotocore is to transform botocore to async, so IMHO it's an issue in this library.

@thehesiod
Collaborator

I suppose if we find a good async file API to use, then it would be an async problem, but then that wouldn't require an init function. In terms of an init function, that would equally apply to sync; however, you'll then potentially run into locking issues.

@chemelli74
Author

@thehesiod, how can we progress?
I really want to find a fix.

@thehesiod
Collaborator

@chemelli74 I suggest doing research on a good async file API for aiobotocore to use. This will be a huge project, though. Furthermore, there are better ways to warm up aiobotocore; if HomeAssistant wants to do that, I already gave some suggestions, like #1120 (comment). They should init and keep the aiobotocore clients around instead of creating them on the fly.

@thehesiod
Collaborator

This is really a solved issue; we do the same thing, e.g. in an aiohttp server, using the on_startup callback to init your resources.

@jakob-keller
Collaborator

There is another approach we could investigate:

  1. botocore.loaders contains relevant code for "loading various model files", including blocking file IO. We have not turned that codepath asynchronous yet for reasons stated above.
  2. We could implement aiobotocore.loaders with async codepaths, while initially retaining the blocking low-level file IO calls. The event loop would still be blocked, albeit more granularly and for shorter periods of time. This would already be a substantial undertaking for us.
  3. We could then provide an alternative thread-pool loader which uses asyncio.to_thread() or similar to prevent blocking the event loop. This behaviour would be configurable (opt-in) and could be useful for many use cases, such as the one raised by @chemelli74.
  4. Advanced users could implement custom loaders and use an async file IO library of their choice. I doubt this will add much value for regular users today, but maybe async file IO becomes more standardised over time.
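A rough shape for the opt-in loader from step 3 might be (the `ThreadPoolLoader` class name and its wrapping of a synchronous loader are hypothetical; no `aiobotocore.loaders` module exists yet):

```python
import asyncio


class ThreadPoolLoader:
    """Hypothetical sketch: wraps a synchronous loader and moves its
    blocking file I/O into a worker thread via asyncio.to_thread()."""

    def __init__(self, sync_loader):
        self._sync_loader = sync_loader

    async def load_service_model(self, service_name, type_name):
        # Delegate to the blocking loader off the event loop.
        return await asyncio.to_thread(
            self._sync_loader.load_service_model, service_name, type_name
        )
```

Advanced users could follow the same interface with an async file IO library of their choice, per step 4.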

@thehesiod: What is your opinion?

@chemelli74: Does that make sense to you?

@bdraco
Member

bdraco commented Aug 25, 2024

> 3. We could then provide an alternative thread-pool loader which uses asyncio.to_thread() or similar to prevent blocking the event loop. This behaviour would be configurable (opt-in) and could be useful for many use cases, such as the one raised by @chemelli74.

Something like this sounds like a good solution. Maybe a flag to enable loading models in the executor. Rough untested proof of concept.

diff --git a/aiobotocore/client.py b/aiobotocore/client.py
index 762ae52..4428e88 100644
--- a/aiobotocore/client.py
+++ b/aiobotocore/client.py
@@ -1,3 +1,4 @@
+import asyncio
 from botocore.awsrequest import prepare_request_dict
 from botocore.client import (
     BaseClient,
@@ -40,26 +41,17 @@ class AioClientCreator(ClientCreator):
         api_version=None,
         client_config=None,
         auth_token=None,
+        load_executor=False,
     ):
         responses = await self._event_emitter.emit(
             'choose-service-name', service_name=service_name
         )
         service_name = first_non_none_response(responses, default=service_name)   
-        service_model = self._load_service_model(service_name, api_version)
-        try:
-            endpoints_ruleset_data = self._load_service_endpoints_ruleset(
-                service_name, api_version
-            )
-            partition_data = self._loader.load_data('partitions')
-        except UnknownServiceError:
-            endpoints_ruleset_data = None
-            partition_data = None
-            logger.info(
-                'No endpoints ruleset found for service %s, falling back to '
-                'legacy endpoint routing.',
-                service_name,
-            )
-
+        if load_executor:
+            model_data = await asyncio.get_running_loop().run_in_executor(None, self._load_models, service_name, api_version)
+        else:
+            model_data = self._load_models(service_name, api_version)
+        service_model, endpoints_ruleset_data, partition_data = model_data
         cls = await self._create_client_class(service_name, service_model)
         region_name, client_config = self._normalize_fips_region(
             region_name, client_config
@@ -104,6 +96,23 @@ class AioClientCreator(ClientCreator):
         )
         return service_client
     
+    def _load_models(self, service_name, api_version):
+        service_model = self._load_service_model(service_name, api_version)
+        try:
+            endpoints_ruleset_data = self._load_service_endpoints_ruleset(
+                service_name, api_version
+            )
+            partition_data = self._loader.load_data('partitions')
+        except UnknownServiceError:
+            endpoints_ruleset_data = None
+            partition_data = None
+            logger.info(
+                'No endpoints ruleset found for service %s, falling back to '
+                'legacy endpoint routing.',
+                service_name,
+            )
+        return service_model, endpoints_ruleset_data, partition_data
+
     async def _create_client_class(self, service_name, service_model):
         class_attributes = self._create_methods(service_model)
         py_name_to_operation_name = self._create_name_mapping(service_model)
