Fix slowdown and stoppage in the main event loop #499
- Prevent the event loop from exiting immediately if it receives a stale event marked InProgress.
- Periodically log the size of the event queue.
- Periodically "garbage-collect" the event queue to trim already-handled events and slow the rate of its unbounded growth.
I think GC of the event store is a good workaround. Also, in the drainOrCordonIfNecessary func, the err case could probably delete the event from the store and rely on the SQS visibility timeout elapsing, after which another instance of NTH (or the same one) can retry. wdyt?
I think you're right that the correct thing to do is to delete the event completely if |
Turns out the answer to "why didn't I put the logging and GC stuff into the event store itself?" is that I didn't want to risk creating deadlocks, but I believe the deadlock-detecting unit test has saved me from myself. I do think this might be a tidier implementation now.
I have sneaked one more important bugfix into this PR: I got NTH to run out of worker processes, and it started trying to log messages as fast as it possibly could, which is not the desired behavior! Now it logs at most one message per second.
/lgtm thanks for submitting this!
Issue #, if available: #498
Description of changes:
Prevent the event loop from receiving stale events marked InProgress and exiting immediately, before reaching urgent events that need handling.
Periodically log the size of the event queue.
Periodically "garbage-collect" the event queue to trim already-handled events and slow the rate of its unbounded growth.
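The three changes above can be illustrated with a minimal sketch. It assumes a map-backed store guarded by a mutex; the `event` and `eventStore` types, the `Handled` field, and the method names here are hypothetical and are not taken from the NTH source.

```go
package main

import (
	"log"
	"sync"
)

// event is a simplified, hypothetical stand-in for an interruption event.
type event struct {
	ID         string
	InProgress bool // a worker has picked the event up
	Handled    bool // assumed flag: the event has been fully processed
}

// eventStore is a hypothetical map-backed store guarded by a mutex.
type eventStore struct {
	mu     sync.RWMutex
	events map[string]*event
}

func newEventStore() *eventStore {
	return &eventStore{events: make(map[string]*event)}
}

func (s *eventStore) Add(e *event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events[e.ID] = e
}

// Pending returns only events that still need work, so the main loop
// never receives a stale event that is already InProgress.
func (s *eventStore) Pending() []*event {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []*event
	for _, e := range s.events {
		if !e.InProgress && !e.Handled {
			out = append(out, e)
		}
	}
	return out
}

// GC drops already-handled events and returns how many were removed.
func (s *eventStore) GC() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for id, e := range s.events {
		if e.Handled {
			delete(s.events, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

func (s *eventStore) Size() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.events)
}

// Maintain is meant to be called periodically, e.g. from a time.Ticker.
// The two calls take and release the lock one after the other, never nested.
func (s *eventStore) Maintain() {
	removed := s.GC()
	log.Printf("event store: removed %d handled events, size now %d", removed, s.Size())
}
```

Keeping `GC` and `Size` as separate, non-nested lock acquisitions is one way to avoid the deadlock risk mentioned in the discussion, at the cost of the logged size being a snapshot taken just after the sweep.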
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.