azure-servicebus.AutoLockRenewer change ThreadPool usage #19362
Comments
hey @michaelcredera , thanks for your feedback, I'll take a look into it. Your understanding is correct: the current implementation indeed has the implicit limitation of one message per worker. This should be addressed.
thanks for the suggestions, they both sound like good plans. I'll allocate some time to think about them and will get back to you after I figure it out.
hey @michaelcredera , sorry, we have limited resourcing currently. Would you like to try making a PR for the change? I would be glad to review it 😊 (It's completely fine if you don't have the time; just FYI, we're an open-source library and we welcome user contributions.)
Hey @yunhaoling - I can try to get a PR up this week
@yunhaoling Are there branching permissions on the repo? I didn't see anything in the contribution guide, but I'm getting a 403 error pushing my branch.
hey @michaelcredera , creating a branch directly in the azure-sdk-for-python repo is not allowed for external contributors.
hey @michaelcredera , just checking in - do you need any help with the PR?
hey @michaelcredera , we have improved the implementation in the release v7.3.2 so that the number of messages the auto lock renewer can handle is no longer limited by the max_workers value. Sorry for keeping you waiting, and please try it out. However, it is still recommended to pass a large max_workers (or an executor with a large max_workers) when there are many objects to be renewed.
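As a rough sketch of that recommendation (editor's illustration, not code from the thread; the connection string and queue name are placeholders, and it assumes the `max_workers`/`executor` keyword arguments the comment mentions):

```python
from concurrent.futures import ThreadPoolExecutor
from azure.servicebus import ServiceBusClient, AutoLockRenewer

# Size the renewer generously when many messages will be registered at once.
renewer = AutoLockRenewer(max_workers=20)
# Equivalently, pass in your own executor:
# renewer = AutoLockRenewer(executor=ThreadPoolExecutor(max_workers=20))

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    with client.get_queue_receiver("<queue-name>") as receiver:
        for message in receiver.receive_messages(max_message_count=50, max_wait_time=5):
            # Register each message so its lock keeps being renewed
            # (here for up to 5 minutes) while it is processed.
            renewer.register(receiver, message, max_lock_renewal_duration=300)
            # ... process the message ...
            receiver.complete_message(message)

renewer.close()
```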
Hi @michaelcredera. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment.
Hey @yunhaoling, I'd like to note here that this change seems to have been the cause of a significant performance degradation we've observed in a couple of our apps, where processing times increased from ~1.7s to ~9s per message. We're running microservices which process one message at a time in a single process. However, since these services run as containers on AKS with multi-core nodes, Python still "thinks" it has multiple cores available, and thus the renewer's thread pool is sized for more than one worker. Now, I don't want to call this out as a bug, mainly because we can't validate our claims, but also because we're able to mitigate this issue by simply setting `max_workers=1`.
hey @Gerrit-K , I sincerely apologize for the trouble the change has caused you, and I totally agree that computing resources are precious. I suspect that the 0.5s interval is the key factor here, as the lock renewal loop now executes twice as fast as before. We set the time interval to a shorter value because we want the auto lock renewer to be more "proactive" in renewing locks, to keep messages/sessions alive even with a short lock duration. I would be glad to roll it back to 1s if that helps mitigate the performance issue you're suffering from. I'll also look for ways to save CPU cycles.
Hi @yunhaoling , thanks for the hint. I was honestly a bit skeptical about that being the culprit, as the performance decrease seemed too significant for that, and also because the issue doesn't show up with `max_workers=1`. I have another guess, so I'll continue to dig through the changes and try to spot anything that might cause this behavior.
hey @Gerrit-K , I took a closer look and think it might be the dispatcher worker that's adding the performance overhead. Basically, when `max_workers > 1`, there is (1) one dispatcher worker thread which keeps checking the task queue, while (2) the renew worker threads do the actual lock renewal work. The dispatcher thread never sleeps while pumping the queue, which seems to be the bottleneck here. What I'm thinking is to add a sleep at the end of the while loop within `_dispatch_worker`, as follows:

```python
def _dispatch_worker(self):
    # ... existing code; this runs inside the dispatcher's while loop ...
    if time.time() - self._last_activity_timestamp >= self._dispatcher_timeout:
        self._running_dispatcher.clear()
        self._last_activity_timestamp = None
        return
    # [ADD A SLEEP HERE]
    time.sleep(self._sleep_time)  # save CPU cycles if there's currently no task in self._renew_tasks
```

Would you kindly help me verify it?
Hi @yunhaoling , indeed that seems to be the culprit 👍 Now that I look at it again, I'm wondering how I could not spot that ... I guess my attention was solely drawn towards the inner loop instead of the outer one. Anyhow, a simple sleep there fixes it for us.
@yunhaoling no worries! Our automated tests caught this one pretty quickly, so we didn't have a big impact - just some confusion 😉 Glad we could be of help detecting this!
Is your feature request related to a problem? Please describe.
The `azure.servicebus.AutoLockRenewer` is implicitly limited to renewing locks for at most `max_workers` messages. For an application that takes many messages off of the queue for parallel processing, this can present problems. The `azure.servicebus.AutoLockRenewer` uses a single future inside of a `ThreadPoolExecutor` to monitor each message and renew its lock. This means it's limited to monitoring at most `max_workers` messages at a time (since a thread pool only runs one future per thread until the future completes).
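To illustrate that limitation (editor's toy example, not library code): with a pool of two workers, a third long-running renewal future never starts until a worker frees up, so its message lock would expire in the meantime.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def renew_loop(message_id):
    # Stand-in for a per-message renewal loop that occupies its worker thread.
    for _ in range(3):
        print(f"renewing lock for message {message_id}")
        time.sleep(1)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Three messages, two workers: message 2's future sits in the pool's
    # internal queue until message 0 or 1 finishes completely.
    for message_id in range(3):
        pool.submit(renew_loop, message_id)
```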
Describe the solution you'd like
An alternative implementation could instead use `max_workers` futures, each monitoring a set of messages rather than a single message per future.
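A hypothetical sketch of that design (editor's illustration; names like `Renewable` and `renew_worker` are invented here, not the library's internals): the workers service a shared queue of renewable items instead of each owning a single message, so the number of tracked messages is decoupled from the worker count.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class Renewable:
    """Stand-in for a message whose lock needs periodic renewal."""
    def __init__(self, name, renewals_needed=3):
        self.name = name
        self.remaining = renewals_needed

    def renew(self):
        self.remaining -= 1
        print(f"renewed lock for {self.name}, {self.remaining} renewals left")

tasks = queue.Queue()
shutdown = threading.Event()

def renew_worker():
    # Each worker drains the shared queue and re-queues items that still
    # need future renewals, so one worker can track many messages.
    while not shutdown.is_set():
        try:
            item = tasks.get(timeout=0.2)
        except queue.Empty:
            continue
        item.renew()
        if item.remaining > 0:
            tasks.put(item)
        time.sleep(0.1)  # pace renewals and save CPU cycles

for i in range(10):  # ten messages, but only four workers
    tasks.put(Renewable(f"message-{i}"))

with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(renew_worker)
    time.sleep(2)    # let the workers run for a while
    shutdown.set()   # then stop them so the pool can shut down
```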
Describe alternatives you've considered
An alternative solution would be to implement parallelism with each worker taking a message off of the queue for itself to process. This is certainly possible, but there are still cases where taking more than `max_workers` messages off of the queue could be desirable (e.g. prefetching messages).
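A hedged sketch of that alternative (editor's illustration; connection string and queue name are placeholders): each worker owns its own receiver and pulls one message at a time, so a lock is never held by a waiting future, at the cost of forgoing prefetching.

```python
from concurrent.futures import ThreadPoolExecutor
from azure.servicebus import ServiceBusClient

def worker(conn_str, queue_name):
    # One client/receiver per thread; each worker fetches and settles its
    # own message, so every held lock belongs to an actively running thread.
    with ServiceBusClient.from_connection_string(conn_str) as client:
        with client.get_queue_receiver(queue_name) as receiver:
            for message in receiver.receive_messages(max_message_count=1, max_wait_time=5):
                # ... process the message ...
                receiver.complete_message(message)

with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(4):
        pool.submit(worker, "<connection-string>", "<queue-name>")
```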