Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log fatal exceptions in the scheduled metrics publisher #5

Closed
wants to merge 4 commits into from

Conversation

MasonHM
Copy link

@MasonHM MasonHM commented Jan 23, 2018

The CloudWatchRecorder schedules a task to publish metrics periodically
which will die silently if it encounters an exception. This catches and
logs that exception.

The CloudWatchRecorder schedules a task to publish metrics periodically
which will die silently if it encounters an exception. This catches and
logs that exception.
initialDelay,
publishFrequencySeconds * 1000,
TimeUnit.MILLISECONDS);
}

/**
* Signal that the recorder should shutdown.
* Single that the recorder should shutdown.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea where this came from.

try {
this.sendAggregatedData();
} catch (Throwable t) {
log.fatal("CloudWatch recorder encountered an unrecoverable error and will stop", t);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be at warning level, or at most error. Problems in metric recording should not generally be assumed to be fatal to the application.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the result of a discussion of the impact to applications when running without metrics, but I'll lower it to error.

@@ -99,6 +99,14 @@
.build();
}

private final Runnable sendAggregatedDataRunnable = () -> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runnable seems excessive/unnecessary. Probably should catch and log or swallow the exception(s) lower down. In the sendData method perhaps.

For robustness I could see having some error handling logic around the actual sending, perhaps retrying if CloudWatch is being flaky.

Copy link
Author

@MasonHM MasonHM Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to ensure that ANY exception or error is caught, no matter how the code changes in the future. If I wrapped this around metricsClient.putMetricData( request ), for example, someone could write code that is part of the runnable and throws, but we wouldn't catch and log it, leading to the exact problem we have today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about handling Throwables in the Executor?

Given there's a TODO to allow the Executor to be provided, then that would enable the behaviour to be configured there rather always being logged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that allowing an Executor to be injected adds any value for customers, since as a customer, I don't really care how my metrics get to CloudWatch, only that they get there. I don't think that having visibility into what is going wrong and why my metrics aren't showing up in CloudWatch is a special case that should require customers to inject something to handle it themselves.

Mason Meier added 3 commits January 23, 2018 18:06
Builder also allows passing in a ScheduledExecutorService for additional
control over metric publishing execution.
@MasonHM
Copy link
Author

MasonHM commented Jun 27, 2018

Replaced by pull request #25.

@MasonHM MasonHM closed this Jun 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants