Release 0.4 - Tasks Error Handling and Logging Improvements #317

johnbaldwin · 2021-01-05T22:23:02Z

0.4 Tasks Error Handling and Logging Improvements

Restructured figures.tasks daily metrics functions

The purpose of the restructuring is to ensure pipeline failures do not
escalate up the call chain:

- A site failure should not impact any other site
- A course failure should not cause site processing to fail
  - With the caveat that the failed course's data would not be reflected
    in the site aggregated data
- An enrollment failure should not cause the course processing to fail
  - With the caveat that the failed enrollment's data would not be
    reflected in the course aggregated data

See the module docstring in tests/tasks/test_daily_metrics.py for more
details
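As a rough illustration of the layering described above, here is a minimal sketch of per-layer failure isolation. It is not the figures.tasks code: the function names, the dict-based site/course/enrollment records, and the 'broken' flag are all hypothetical stand-ins chosen so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


def process_enrollment(enrollment):
    """Hypothetical per-enrollment work; raises on a record flagged broken."""
    if enrollment.get('broken'):
        raise ValueError('bad enrollment')
    return 1


def process_course(course):
    """Aggregate enrollments. A failed enrollment is logged and skipped."""
    total = 0
    for enrollment in course['enrollments']:
        try:
            total += process_enrollment(enrollment)
        except Exception:  # isolate the failure at the enrollment layer
            logger.exception('enrollment failed course_id=%s', course['id'])
    if course.get('broken'):
        raise ValueError('bad course')
    return total


def process_site(site):
    """Aggregate courses. A failed course is logged and skipped."""
    total = 0
    for course in site['courses']:
        try:
            total += process_course(course)
        except Exception:  # isolate the failure at the course layer
            logger.exception('course failed site_id=%s', site['id'])
    if site.get('broken'):
        raise ValueError('bad site')
    return total


def process_all_sites(sites):
    """Process every site. A failed site never blocks the others."""
    results = {}
    for site in sites:
        try:
            results[site['id']] = process_site(site)
        except Exception:  # isolate the failure at the site layer
            logger.exception('site failed site_id=%s', site['id'])
    return results
```

Note the caveat from the list above shows up here too: a skipped enrollment or course simply does not contribute to the aggregated total of its parent.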

Logging has been improved. Log messages now carry prefixes that make
them easier to grep, and they identify the site, the date being
processed (date_for), and the course id for both processing and
failures.
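To make the idea of greppable prefixes concrete, here is a small sketch of a message formatter. The prefix string, the helper name `format_pipeline_log`, and the `key=value` layout are assumptions for illustration only; the actual prefixes used in figures.tasks are not shown in this description.

```python
def format_pipeline_log(prefix, message, site_id, date_for, course_id=None):
    """Build a single-line log message with a greppable prefix.

    All names and the exact layout here are hypothetical.
    """
    parts = [
        '{}:{}'.format(prefix, message),
        'site_id={}'.format(site_id),
        'date_for={}'.format(date_for),
    ]
    # course_id is optional because site-level messages have no course
    if course_id is not None:
        parts.append('course_id={}'.format(course_id))
    return ' '.join(parts)
```

With a layout like this, something along the lines of `grep 'PIPELINE:DAILY:FAIL' logfile` would pull out every daily pipeline failure along with the site, date, and course it belongs to.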

Logging to figures.models.PipelineError has been removed because it
provides no benefit in its current form. Follow-on work will add
post-pipeline reporting, which should give better visibility into
pipeline data health.

Restructured the figures.tasks tests. What was 'tests/test_tasks.py' is
now split into process-specific modules in 'tests/tasks':

- test_daily_metrics.py
- test_monthly_metrics.py
- test_mau_tasks.py: this may go away, as it was never actually used in
  production. MAU is captured monthly in figures.models.SiteMonthlyMetrics,
  and MAU 2G (second generation) should finally be implemented soon,
  we hope

Minor fix - pipeline course daily metrics now casts 'course_id' to a
string when working with figures.models.CourseDailyMetrics. This
prevents errors if a CourseKey is used instead of the string
representation of the course id. (redacted rant on CourseKey)
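The effect of the cast can be sketched without Django or opaque_keys. In the snippet below, `FakeCourseKey` and `cdm_lookup_kwargs` are hypothetical stand-ins, assuming only that the real CourseKey renders as the course id when passed to `str()`.

```python
class FakeCourseKey(object):
    """Hypothetical stand-in for a CourseKey; only __str__ matters here."""

    def __init__(self, org, course, run):
        self.org = org
        self.course = course
        self.run = run

    def __str__(self):
        return 'course-v1:{}+{}+{}'.format(self.org, self.course, self.run)


def cdm_lookup_kwargs(course_id, date_for):
    """Build lookup arguments, always passing course_id as a string.

    str() is a no-op for values that are already strings, so callers can
    pass either form.
    """
    return {'course_id': str(course_id), 'date_for': date_for}
```

Either a key object or its string form then yields the same lookup arguments, which is the property the fix relies on.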

Add comments to backfill.backfill_enrollment_data_for_site

Added comments as notes to improve updating enrollment data

johnbaldwin · 2021-01-05T22:24:41Z

figures/pipeline/course_daily_metrics.py

@@ -308,7 +308,7 @@ def save_metrics(self, date_for, data):
        """

        cdm, created = CourseDailyMetrics.objects.update_or_create(
-            course_id=self.course_id,
+            course_id=str(self.course_id),


This is so it won't fail if course_id is a CourseKey

johnbaldwin · 2021-01-05T22:24:48Z

figures/pipeline/course_daily_metrics.py

@@ -339,7 +339,7 @@ def load(self, date_for=None, force_update=False, **_kwargs):
        """
        date_for = pipeline_date_for_rule(date_for)
        try:
-            cdm = CourseDailyMetrics.objects.get(course_id=self.course_id,
+            cdm = CourseDailyMetrics.objects.get(course_id=str(self.course_id),


This is so it won't fail if course_id is a CourseKey

johnbaldwin · 2021-01-05T22:25:28Z

devsite/devsite/settings.py

@@ -52,8 +53,6 @@
 # Set the default Site (django.contrib.sites.models.Site)
 SITE_ID = 1

-# TODO: Update this to allow environment variable override
-ENABLE_DEVSITE_CELERY = env('ENABLE_DEVSITE_CELERY')


Removed duplicate

johnbaldwin · 2021-01-05T22:26:41Z

figures/tasks.py

+    try:
+        site = Site.objects.get(id=site_id)
+    except Site.DoesNotExist:
+        msg = 'add errro message'


Yep, I'll update to add a real error message

johnbaldwin · 2021-01-05T22:28:41Z

figures/tasks.py

               ' for site_id={}'.format(site_id))
        logger.exception(msg)


+# TODO: Sites iterator with entry and exit logging


Just a note and I want to leave it here as a reminder

johnbaldwin · 2021-01-05T22:29:03Z

figures/tasks.py


-    parallel, then when they are all done, populates the site metrics. See the
-    function ``experimental_populate_daily_metrics`` docstring for details
+    Deveoper note: Errors need to be handled at each layer in the call chain


Typo to fix "Developer"

johnbaldwin · 2021-01-05T22:32:13Z

tests/tasks/test_daily_tasks.py

+        # At least one with and without `message_dict`
+        raise FakeException('Hey!')
+
+    # def fake_pop_single_sdm(**_kwargs):


Dead code. I'll remove it.

johnbaldwin · 2021-01-05T22:33:17Z

tests/tasks/test_daily_tasks.py

+
+
+@pytest.mark.skipif(OPENEDX_RELEASE == GINKGO,
+                    reason='Broken test. Apparent Django 1.8 incompatibility')


TODO: remove 'Broken test' from reason

johnbaldwin · 2021-01-05T22:34:13Z

tests/tasks/test_monthly_tasks.py

+
+
+@pytest.mark.skipif(OPENEDX_RELEASE == GINKGO,
+                    reason='Broken test. Apparent Django 1.8 incompatibility')


TODO: remove 'Broken test' from reason

OmarIthawi

Thanks @johnbaldwin! LGTM. I've added one question.

OmarIthawi · 2021-01-06T05:42:28Z

figures/tasks.py

    logger.debug(
        'done running populate_site_daily_metrics for site_id={}'.format(site_id))


+@shared_task
+def populate_daily_metrics_for_site(site_id, date_for, force_update=False):


If I understand correctly, this is a task that we're now only calling as part of populate_daily_metrics but in theory we can run it independently later on. Correct?

johnbaldwin · 2021-01-06

Correct. I pulled it out of populate_daily_metrics so that we can run it asynchronously in the future (using `.delay`). Before we do, we need to make sure that RabbitMQ can handle it without blocking other async tasks.

* Added docstring comments about multiple orgs per site
* Updated error handling in figures.tasks.populate_single_sdm. Now it logs when a site is not found, then rethrows the exception
* Updated figures/tasks/test_daily_tasks.py - improved testing of error cases
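The log-then-rethrow behavior described for populate_single_sdm could look roughly like the sketch below. It is an assumption-laden stand-in, not the actual task: `SiteDoesNotExist`, `SITES`, `get_site`, and `populate_single_sdm_sketch` are hypothetical names that replace the Django `Site` model lookup so the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


class SiteDoesNotExist(Exception):
    """Hypothetical stand-in for Django's Site.DoesNotExist."""


# Hypothetical in-memory replacement for the Site table
SITES = {1: 'example.com'}


def get_site(site_id):
    """Return the site for site_id or raise SiteDoesNotExist."""
    try:
        return SITES[site_id]
    except KeyError:
        raise SiteDoesNotExist(site_id)


def populate_single_sdm_sketch(site_id):
    """Log a missing site with its id, then re-raise for the caller."""
    try:
        site = get_site(site_id)
    except SiteDoesNotExist:
        logger.exception('site not found for site_id=%s', site_id)
        raise  # bare raise keeps the original exception and traceback
    return 'processed {}'.format(site)
```

Re-raising after logging leaves the decision about whether to continue with the caller, which is where failures for one site can be isolated from the others.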

johnbaldwin added 2 commits December 29, 2020 21:10

Add comments to backfill.backfill_enrollment_data_for_site

83faead

Added comments as notes to improve updating enrollment data

johnbaldwin requested review from OmarIthawi and melvinsoft January 5, 2021 22:23

johnbaldwin changed the title ~~John/0.4 pipeline error handling~~ 0.4 Tasks Error Handling and Logging Improvements Jan 5, 2021

johnbaldwin changed the title ~~0.4 Tasks Error Handling and Logging Improvements~~ Release 0.4 - Tasks Error Handling and Logging Improvements Jan 5, 2021

OmarIthawi approved these changes Jan 6, 2021

johnbaldwin merged commit f0d3a4f into master Jan 14, 2021

johnbaldwin deleted the john/0.4-pipeline-error-handling branch January 14, 2021 21:08
