-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Failed Scheduled Queries Report (#3798)
* initial work on e-mail report for failed queries * send failure report only for scheduled queries and not for adhoc queries * add setting to determine if to send failure reports * add setting to determine interval of aggregated e-mail report * html templating of scheduled query failure report * break line * support timeouts for failure reports * aggregate errors within message and warn if approaching threshold * handle errors in QueryExecutor.run instead of on_failure * move failure report to its own module * indicate that failure count is since last report * copy changes * format with <code> * styling, copy and add a link to the query instead of the query text * separate reports with <hr> * switch to UTC * move <h2> to actual e-mail subject * add explicit message for SoftTimeLimitExceeded * fix test to use soft time limits * default query failure threshold to 100 * use base_url from utils * newlines. newlines everywhere. * remove redundant import * apply new design for failure report * use jinja to format the failure report * don't show comment block if no comment is provided * don't send emails if, for some reason, there are no available errors * subtract 1 from failure count, because the first one is represented by 'Last failed' * don't show '+X more failures' if there's only one * extract subject to variable * format as text, while we're at it * allow scrolling for long exception messages * test that e-mails are scheduled only when beneath limit * test for indicating when approaching report limits + refactor * test that failures are aggregated * test that report counts per query and reason * test that the latest failure occurence is reported * force sending reports for testing purposes * Update redash/templates/emails/failures.html Co-Authored-By: Ran Byron <ranbena@gmail.com> * Update redash/templates/emails/failures.html Co-Authored-By: Ran Byron <ranbena@gmail.com> * Update redash/tasks/failure_report.py * add org setting for email reports * remove logo from failure report email * correctly use the organization setting for sending failure reports * use user id as key for failure reports data structure * Update redash/tasks/failure_report.py Co-Authored-By: Arik Fraimovich <arik@arikfr.com> * build comments while creating context for e-mail templates * figure out the base url when creating the e-mail * no need to expire pending failure report keys as they are deleted anyway when sent * a couple of CodeClimate changes * refactor key creationg to a single location * refactor tests to send e-mail from a single function * use beat to schedule a periodic send_aggregated_errors task instead of using countdown per email * remove pending key as it is no longer required when a periodic task picks up the reports to send * a really important blank line. REALLY important. * Revert "a really important blank line. REALLY important." This reverts commit c7d8ed8. * a really important blank line. REALLY important. It is the best blank line. * don't send failure emails to disabled users
- Loading branch information
Omer Lachish
authored
Jul 28, 2019
1 parent
f165168
commit 7fb33e3
Showing
11 changed files
with
258 additions
and
5 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
from .general import record_event, version_check, send_mail, sync_user_details | ||
from .queries import QueryTask, refresh_queries, refresh_schemas, cleanup_query_results, execute_query, empty_schedules | ||
from .alerts import check_alerts_for_query | ||
from .failure_report import notify_of_failure |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
import datetime | ||
import re | ||
from collections import Counter | ||
from flask import render_template | ||
from redash.tasks.general import send_mail | ||
from redash.worker import celery | ||
from redash import redis_connection, settings, models | ||
from redash.utils import json_dumps, json_loads, base_url | ||
|
||
|
||
def key(user_id): | ||
return 'aggregated_failures:{}'.format(user_id) | ||
|
||
|
||
def comment_for(failure): | ||
schedule_failures = failure.get('schedule_failures') | ||
if schedule_failures > settings.MAX_FAILURE_REPORTS_PER_QUERY * 0.75: | ||
return """NOTICE: This query has failed a total of {schedule_failures} times. | ||
Reporting may stop when the query exceeds {max_failure_reports} overall failures.""".format( | ||
schedule_failures=schedule_failures, | ||
max_failure_reports=settings.MAX_FAILURE_REPORTS_PER_QUERY | ||
) | ||
|
||
|
||
@celery.task(name="redash.tasks.send_aggregated_errors") | ||
def send_aggregated_errors(): | ||
for key in redis_connection.scan_iter(key("*")): | ||
user_id = re.search(r'\d+', key).group() | ||
send_failure_report(user_id) | ||
|
||
|
||
def send_failure_report(user_id): | ||
user = models.User.get_by_id(user_id) | ||
errors = [json_loads(e) for e in redis_connection.lrange(key(user_id), 0, -1)] | ||
|
||
if errors: | ||
errors.reverse() | ||
occurrences = Counter((e.get('id'), e.get('message')) for e in errors) | ||
unique_errors = {(e.get('id'), e.get('message')): e for e in errors} | ||
|
||
context = { | ||
'failures': [{ | ||
'id': v.get('id'), | ||
'name': v.get('name'), | ||
'failed_at': v.get('failed_at'), | ||
'failure_reason': v.get('message'), | ||
'failure_count': occurrences[k], | ||
'comment': comment_for(v) | ||
} for k, v in unique_errors.iteritems()], | ||
'base_url': base_url(user.org) | ||
} | ||
|
||
html = render_template('emails/failures.html', **context) | ||
text = render_template('emails/failures.txt', **context) | ||
subject = "Redash failed to execute {} of your scheduled queries".format(len(unique_errors.keys())) | ||
send_mail.delay([user.email], subject, html, text) | ||
|
||
redis_connection.delete(key(user_id)) | ||
|
||
|
||
def notify_of_failure(message, query): | ||
subscribed = query.org.get_setting('send_email_on_failed_scheduled_queries') | ||
exceeded_threshold = query.schedule_failures >= settings.MAX_FAILURE_REPORTS_PER_QUERY | ||
|
||
if subscribed and not query.user.is_disabled and not exceeded_threshold: | ||
redis_connection.lpush(key(query.user.id), json_dumps({ | ||
'id': query.id, | ||
'name': query.name, | ||
'message': message, | ||
'schedule_failures': query.schedule_failures, | ||
'failed_at': datetime.datetime.utcnow().strftime("%B %d, %Y %I:%M%p UTC") | ||
})) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
<html> | ||
|
||
<head> | ||
<style> | ||
body { | ||
margin: 0; | ||
} | ||
</style> | ||
</head> | ||
|
||
<body style="margin: 0;"> | ||
<div | ||
style="background: #f6f8f9; font-family: arial; font-size: 14px; padding: 20px; color: #333; line-height: 20px"> | ||
<h1 style="font-size: 14px; font-weight: normal;">Redash failed to run the following queries:</h1> | ||
|
||
{% for failure in failures %} | ||
<div style="background: white;padding: 5px 20px 20px 20px; border-radius: 3px; box-shadow: 0 4px 9px -3px rgba(102,136,153,.15); margin-top: 20px"> | ||
<h2 style="font-size: 16px; position: relative; padding-right: 90px;">{{failure.name}} <a href="{{base_url}}/queries/{{failure.id}}" style="background-color: #ff7964;font-size: 13px;color: white;text-decoration: none;padding: 0 5px;display: inline-block;border-radius: 3px;font-weight: normal;position: absolute;right: 0;top:-1px">Go | ||
to Query</a></h2> | ||
<div> | ||
<p> | ||
Last failed: {{failure.failed_at}} | ||
{% if failure.failure_count > 1 %} | ||
<br /> | ||
<small style="color: #8c8c8c;font-size: 13px;"> + {{failure.failure_count - 1}} more failures since last report</small> | ||
{% endif %} | ||
</p> | ||
<h3 style="font-size: 14px; font-weight:normal; margin-bottom: 6px">Exception</h3> | ||
<div style="background: #ececec;padding: 8px;border-radius: 3px;font-family: monospace; word-break: break-all;">{{failure.failure_reason}}</div> | ||
</div> | ||
|
||
{% if failure.comment %} | ||
<div style="margin-top: 25px;font-size: 13px;border-top: 1px solid #ececec;padding-top: 19px;">{{failure.comment}}</div> | ||
{% endif %} | ||
</div> | ||
{% endfor %} | ||
|
||
</div> | ||
</body> | ||
|
||
</html> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
Redash failed to run the following queries: | ||
|
||
{% for failure in failures %} | ||
|
||
{{failure.name}} ({{base_url}}/queries/{{failure.id}}) | ||
Last failed: {{failure.failed_at}}{% if failure.failure_count > 1 %} + {{failure.failure_count - 1}} more failures since last report{% endif %} | ||
Exception: | ||
|
||
{{failure.failure_reason}} | ||
|
||
{% if failure.comment %} | ||
{{failure.comment}} | ||
{% endif %} | ||
|
||
{% endfor %} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
from unittest import TestCase | ||
|
||
import mock | ||
from freezegun import freeze_time | ||
import dateutil | ||
|
||
from tests import BaseTestCase | ||
from redash import redis_connection, models, settings | ||
from redash.tasks.failure_report import notify_of_failure, send_failure_report, send_aggregated_errors, key | ||
from redash.utils import json_loads | ||
|
||
class TestSendAggregatedErrorsTask(BaseTestCase): | ||
def setUp(self): | ||
super(TestSendAggregatedErrorsTask, self).setUp() | ||
redis_connection.flushall() | ||
self.factory.org.set_setting('send_email_on_failed_scheduled_queries', True) | ||
|
||
def notify(self, message="Oh no, I failed!", query=None, **kwargs): | ||
if query is None: | ||
query = self.factory.create_query(**kwargs) | ||
|
||
notify_of_failure(message, query) | ||
return key(query.user.id) | ||
|
||
@mock.patch('redash.tasks.failure_report.render_template') | ||
def send_email(self, user, render_template): | ||
send_failure_report(user.id) | ||
|
||
_, context = render_template.call_args | ||
return context['failures'] | ||
|
||
def test_schedules_email_if_failure_count_is_beneath_limit(self): | ||
key = self.notify(schedule_failures=settings.MAX_FAILURE_REPORTS_PER_QUERY - 1) | ||
email_pending = redis_connection.exists(key) | ||
self.assertTrue(email_pending) | ||
|
||
def test_does_not_report_if_failure_count_is_beyond_limit(self): | ||
key = self.notify(schedule_failures=settings.MAX_FAILURE_REPORTS_PER_QUERY) | ||
email_pending = redis_connection.exists(key) | ||
self.assertFalse(email_pending) | ||
|
||
def test_does_not_report_if_organization_is_not_subscribed(self): | ||
self.factory.org.set_setting('send_email_on_failed_scheduled_queries', False) | ||
key = self.notify() | ||
email_pending = redis_connection.exists(key) | ||
self.assertFalse(email_pending) | ||
|
||
def test_does_not_report_if_query_owner_is_disabled(self): | ||
self.factory.user.disable() | ||
key = self.notify() | ||
email_pending = redis_connection.exists(key) | ||
self.assertFalse(email_pending) | ||
|
||
def test_does_not_indicate_when_not_near_limit_for_a_query(self): | ||
self.notify(schedule_failures=settings.MAX_FAILURE_REPORTS_PER_QUERY / 2) | ||
failures = self.send_email(self.factory.user) | ||
|
||
self.assertFalse(failures[0]['comment']) | ||
|
||
def test_indicates_when_near_limit_for_a_query(self): | ||
self.notify(schedule_failures=settings.MAX_FAILURE_REPORTS_PER_QUERY - 1) | ||
failures = self.send_email(self.factory.user) | ||
|
||
self.assertTrue(failures[0]['comment']) | ||
|
||
def test_aggregates_different_queries_in_a_single_report(self): | ||
key1 = self.notify(message="I'm a failure") | ||
key2 = self.notify(message="I'm simply not a success") | ||
|
||
self.assertEqual(key1, key2) | ||
|
||
def test_counts_failures_for_each_reason(self): | ||
query = self.factory.create_query() | ||
|
||
self.notify(message="I'm a failure", query=query) | ||
self.notify(message="I'm a failure", query=query) | ||
self.notify(message="I'm a different type of failure", query=query) | ||
self.notify(message="I'm a totally different query") | ||
|
||
failures = self.send_email(query.user) | ||
|
||
f1 = next(f for f in failures if f["failure_reason"] == "I'm a failure") | ||
self.assertEqual(2, f1['failure_count']) | ||
f2 = next(f for f in failures if f["failure_reason"] == "I'm a different type of failure") | ||
self.assertEqual(1, f2['failure_count']) | ||
f3 = next(f for f in failures if f["failure_reason"] == "I'm a totally different query") | ||
self.assertEqual(1, f3['failure_count']) | ||
|
||
def test_shows_latest_failure_time(self): | ||
query = self.factory.create_query() | ||
|
||
with freeze_time("2000-01-01"): | ||
self.notify(query=query) | ||
|
||
self.notify(query=query) | ||
|
||
failures = self.send_email(query.user) | ||
latest_failure = dateutil.parser.parse(failures[0]['failed_at']) | ||
self.assertNotEqual(2000, latest_failure.year) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters