fix: embed the synchronizer process within the plugin runner #333
Conversation
Overall LGTM.
Since we are using a single thread with concurrency, I would add an `await asyncio.sleep(0.1)` at the end of the loop to prevent a busy loop when getting messages from the pubsub channel.
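As a rough illustration of the suggested pattern (the names below are placeholders rather than the PR's actual code, and `pubsub` is assumed to be an already-subscribed sync redis-py PubSub object): without the sleep, a non-blocking `get_message()` returns immediately and the loop would spin at full CPU, while the short sleep hands control back to the event loop between polls.

```python
import asyncio


async def poll_pubsub(pubsub) -> None:
    # pubsub is assumed to be a subscribed redis-py PubSub object (sync client).
    while True:
        # Non-blocking poll: returns None immediately when no message is waiting.
        message = pubsub.get_message(ignore_subscribe_messages=True)
        if message is not None:
            print("received:", message["data"])  # placeholder for the real handler
        # Yield control back to the event loop so other tasks can run,
        # and avoid spinning at 100% CPU between polls.
        await asyncio.sleep(0.1)
```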
@djantzen it won't prevent other operations from occurring, but we don't need to continuously use CPU and network to check for new messages. I would say a 1-second delay is totally fine.
totally forgot that
@djantzen @beaugunderson since we have the capability of specifying a timeout with
then
1 second, or a tenth of a second? I'm not entirely clear on the tradeoffs here. Seems like a longer timeout would make the listener more responsive to pubsub messages, at the cost of less responsiveness to events firing within the system. I'd think we want to prioritize the events rather than the reload commands.
Initially, I thought the same: that calling
The key reason is that, although the function signature and docstrings remain unchanged for
Here's what happens:
I agree that this isn't explicitly clear in the documentation, but after debugging the implementation, I can confirm that this is how it works. So, using
Thanks for digging into the code @jamagalhaes. This makes sense to me. I was beginning to wonder about the whole point of
@djantzen you're welcome. You still need to explicitly specify
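For context, here is a minimal sketch of the timeout-based polling pattern discussed above, assuming redis-py's asyncio client; the excerpt doesn't show which client or channel the plugin runner actually uses, so `listen_for_reloads`, the channel name, and the client choice are all illustrative assumptions. With the asyncio client, `get_message(timeout=...)` awaits internally, so other tasks on the event loop keep running while the listener waits.

```python
import asyncio

import redis.asyncio as redis


async def listen_for_reloads(channel: str = "plugin-synchronizer") -> None:
    # Channel name and handler are placeholders, not the plugin runner's real values.
    client = redis.Redis()
    async with client.pubsub() as pubsub:
        await pubsub.subscribe(channel)
        while True:
            # Waits up to 1 second for a message; because this is the asyncio client,
            # the wait is awaited rather than blocking, so other tasks keep running.
            message = await pubsub.get_message(ignore_subscribe_messages=True, timeout=1.0)
            if message is not None:
                print("reload requested:", message["data"])  # placeholder for the real handler
```

In that model, the timeout mainly bounds how long the coroutine idles when the channel is quiet; messages that arrive during the wait are still picked up promptly.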
Actually, it may be unnecessary since we’re getting host as part of our other metrics already. I’ll remove it.
… On Jan 30, 2025, at 9:01 AM, José Magalhães wrote:
@jamagalhaes commented on this pull request.
In settings.py <#333 (comment)>:
> +try:
+ # Aptible stores a unique identifier in this file
+ with open("/proc/sys/kernel/hostname") as file:
+ HOST_IDENTIFIER = file.read().strip()
+except FileNotFoundError:
+ HOST_IDENTIFIER = "localhost"
since this is not an env var, how will it be communicated to Grafana? I'm just curious about it.
it's not used here, so it can be removed without changing anything, but since these are statsd metrics they include the metrics from the environment as defined in telegraf.conf, namely:
none of which is the hostname, so if you need it you'll still need to add it
n/m, we discussed this more here; Telegraf will set it correctly: https://canvas-medical.slack.com/archives/C1UGTHCKS/p1738271118532679
Would the problem manifest if Redis pubsub became overburdened and was slow to receive and acknowledge requests, thereby blocking other tasks?
TBH it didn’t even register with me that logs of these administrative actions would get routed through Redis. Do developers really need to see this activity?
… On Jan 30, 2025, at 9:15 AM, Christopher Sande wrote:
@csande commented on this pull request.
In plugin_runner/plugin_runner.py <#333 (comment)>:
> @@ -192,14 +198,50 @@ async def ReloadPlugins(
self, request: ReloadPluginsRequest, context: Any
) -> AsyncGenerator[ReloadPluginsResponse, None]:
"""This is invoked when we need to reload plugins."""
+ log.info("Reloading plugins...")
Is this statement a blocking call, in addition to the other calls to the logger?
If you dig into our logging module, it looks like the PubSubBase class uses the non-asyncio version of Redis.
FWIW, we're also calling it a few times in HandleEvent, another asyncio context.
I suppose this is theoretical until we see it happen, but if Redis were overburdened, it could block the main event loop. If the main event loop is blocked, then that would affect the ability of the plugin runner to continue handling incoming events. On the home-app side,

It's worth mentioning that if we were using async Redis, and Redis were overburdened, we would still see problems. They would probably just manifest differently. Tasks would still have trouble running, but the reason would be Redis directly, rather than being blocked by other tasks (i.e. being blocked indirectly by Redis).

When you introduce blocking calls in an asyncio project, it can be a chokepoint for everything. Even if Redis isn't overburdened, the Plugin Runner main event loop can't do anything else for the few milliseconds that it is communicating with the sync Redis client. There are a few things we could do.
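One option along those lines, sketched here purely as an illustration rather than anything this PR implements (`publish_log` and its arguments are hypothetical), is to push the blocking sync-Redis call onto a worker thread so the event loop stays free:

```python
import asyncio


async def publish_log(redis_client, channel: str, payload: str) -> None:
    # redis_client.publish is the blocking call made by the sync redis client;
    # asyncio.to_thread runs it in the default thread pool executor, so the
    # event loop is not held up while the round trip to Redis completes.
    await asyncio.to_thread(redis_client.publish, channel, payload)
```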
@csande While this scenario is theoretically possible, I believe it would be one of the lesser concerns in practice. If our Redis instance were overburdened, many other components in home-app that rely on it would likely be impacted as well, not just the Plugin Runner. That said, you’re absolutely right in pointing out that
Thank you for taking a look at this @jamagalhaes. I'm in agreement that this is theoretical and not a huge concern right now, but it's certainly a core tenet (i.e. this isn't premature optimization) to not make blocking calls in an asyncio framework. I don't know about our plans for scaling in the longer-term, but if we want the plugin runner to handle hundreds or thousands of events at once, those blocking calls to even a non-overburdened Redis will eventually add up, and the performance impact may be noticeable at some point.
https://canvasmedical.atlassian.net/browse/KOALA-2440
At present, the plugin synchronizer runs as a separate process managed by Circus. Problems with doing so:
This PR moves the responsibilities of the synchronizer into an async task performed by the plugin runner.
Related home-app PR: https://github.com/canvas-medical/canvas/pull/17170
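At a high level, the shape of the change looks roughly like the sketch below. The names are hypothetical stand-ins (the actual change is in plugin_runner/plugin_runner.py), and the synchronizer body is reduced to a placeholder: the synchronizer runs as a background asyncio task on the plugin runner's event loop instead of as a separate Circus-managed process.

```python
import asyncio


async def synchronize_plugins() -> None:
    # Hypothetical stand-in for the synchronizer loop that waits for reload
    # messages on the pubsub channel and refreshes installed plugins.
    while True:
        await asyncio.sleep(1.0)


async def main() -> None:
    # Start the synchronizer as a background task on the same event loop as
    # the rest of the plugin runner, rather than as a separate process.
    synchronizer = asyncio.create_task(synchronize_plugins())
    try:
        await asyncio.Event().wait()  # stands in for serving the plugin runner's gRPC API
    finally:
        synchronizer.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```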