-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log fatal exceptions in the scheduled metrics publisher #5
Conversation
The CloudWatchRecorder schedules a task to publish metrics periodically which will die silently if it encounters an exception. This catches and logs that exception.
initialDelay, | ||
publishFrequencySeconds * 1000, | ||
TimeUnit.MILLISECONDS); | ||
} | ||
|
||
/** | ||
* Signal that the recorder should shutdown. | ||
* Single that the recorder should shutdown. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea where this came from.
try { | ||
this.sendAggregatedData(); | ||
} catch (Throwable t) { | ||
log.fatal("CloudWatch recorder encountered an unrecoverable error and will stop", t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be at warning level, or at most error. Problems in metric recording should not generally be assumed to be fatal to the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the result of a discussion of the impact to applications when running without metrics, but I'll lower it to error
.
@@ -99,6 +99,14 @@ | |||
.build(); | |||
} | |||
|
|||
private final Runnable sendAggregatedDataRunnable = () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runnable seems excessive/unnecessary. Probably should catch and log or swallow the exception(s) lower down. In the sendData method perhaps.
For robustness I could see having some error handling logic around the actual sending, perhaps retrying if CloudWatch is being flaky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to ensure that ANY exception or error is caught, no matter how the code changes in the future. If I wrapped this around metricsClient.putMetricData( request )
, for example, someone could write code that is part of the runnable and throws, but we wouldn't catch and log it, leading to the exact problem we have today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about handling Throwables in the Executor?
Given there's a TODO to allow the Executor to be provided, then that would enable the behaviour to be configured there rather always being logged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that allowing an Executor to be injected adds any value for customers, since as a customer, I don't really care how my metrics get to CloudWatch, only that they get there. I don't think that having visibility into what is going wrong and why my metrics aren't showing up in CloudWatch is a special case that should require customers to inject something to handle it themselves.
Builder also allows passing in a ScheduledExecutorService for additional control over metric publishing execution.
Replaced by pull request #25. |
The
CloudWatchRecorder
schedules a task to publish metrics periodicallywhich will die silently if it encounters an exception. This catches and
logs that exception.