Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 0.4 - Tasks Error Handling and Logging Improvements #317

Merged
merged 3 commits into from
Jan 14, 2021

Conversation

johnbaldwin
Copy link
Contributor

@johnbaldwin johnbaldwin commented Jan 5, 2021

0.4 Tasks Error Handling and Logging Improvements

  1. Restructured figures.tasks daily metrics functions

The purpose of restucturing is to ensure pipeline failures do not
escalate up the call chain:

  • A site failure should not impact any other site
  • A course failure should not cause site processing failure
    • With the caveat that the failed course's data would not be reflected
      in the site aggregated data
  • An enrollment failure should not cause the course processing to fail
    • with the caveat that the failed enrollment's data would not be
      reflected in the course aggregated data

See the module docstring in tests/tasks/test_daily_metrics.py for more
details

Logging has improvements. Namely, the logs have prefixes to help grep
through them for details. The information provided in the logs has
improved to identify the site, date for, and course id for the
processing and failures.

Logging to figures.models.PipelineError has been removed as it is not
providing benefit in its current form. Follow on work is providing post
pipeline reporting. This should provide better visibility into pipeline
data health

Restructured figures.tasks tests. Was 'tests/test_tasks.py'. Now there
are process specific modules in 'tests/tasks':

  • test_daily_metrics.py
  • test_monthly_metrics.py
  • test_mau_tasks.py - This may go away as it was never actually used in
    production. MAU is captured monthly in figures.models.SiteMonthlyMetrics
    and MAU 2G (second generation) is going to be finally implemented soon
    we hope

Minor fix - pipeline course daily metrics now cases 'course_id' to string when
working with figures.models.CourseDailyMetrics. This is to prevent
errors if a CourseKey is used instead of the string representation of
the course id. (redacted rant on CourseKey)

Add comments to backfill.backfill_enrollment_data_for_site

Added comments as notes to improve updating enrollment data

Added comments as notes to improve updating enrollment data
1. Restructured figures.tasks daily metrics functions

The purpose of restucturing is to ensure pipeline failures do not
escalate up the call chain:

* A site failure should not impact any other site
* A course failure should not cause site processing failure
  * With the caveat that the failed course's data would not be reflected
    in the site aggregated data
* An enrollment failure should not cause the course processing to fail
  * with the caveat that the failed enrollment's data would not be
    reflected in the course aggregated data

See the module docstring in tests/tasks/test_daily_metrics.py for more
details

Logging has improvements. Namely, the logs have prefixes to help grep
through them for details. The information provided in the logs has
improved to identify the site, date for, and course id for the
processing and failures.

Logging to figures.models.PipelineError has been removed as it is not
providing benefit in its current form. Follow on work is providing post
pipeline reporting. This should provide better visibility into pipeline
data health

Restructured figures.tasks tests. Was 'tests/test_tasks.py'. Now there
are process specific modules in 'tests/tasks':

* test_daily_metrics.py
* test_monthly_metrics.py
* test_mau_tasks.py - This may go away as it was never actually used in
production. MAU is captured monthly in figures.models.SiteMonthlyMetrics
and MAU 2G (second generation) is going to be finally implemented soon
we hope

Minor fix - pipeline course daily metrics now cases 'course_id' to string when
working with figures.models.CourseDailyMetrics. This is to prevent
errors if a CourseKey is used instead of the string representation of
the course id. (redacted rant on CourseKey)
@@ -308,7 +308,7 @@ def save_metrics(self, date_for, data):
"""

cdm, created = CourseDailyMetrics.objects.update_or_create(
course_id=self.course_id,
course_id=str(self.course_id),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so it won't fail if course_id is a CourseKey

@@ -339,7 +339,7 @@ def load(self, date_for=None, force_update=False, **_kwargs):
"""
date_for = pipeline_date_for_rule(date_for)
try:
cdm = CourseDailyMetrics.objects.get(course_id=self.course_id,
cdm = CourseDailyMetrics.objects.get(course_id=str(self.course_id),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so it won't fail if course_id is a CourseKey

@@ -52,8 +53,6 @@
# Set the default Site (django.contrib.sites.models.Site)
SITE_ID = 1

# TODO: Update this to allow environment variable override
ENABLE_DEVSITE_CELERY = env('ENABLE_DEVSITE_CELERY')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed duplicate

figures/tasks.py Outdated
try:
site = Site.objects.get(id=site_id)
except Site.DoesNotExist:
msg = 'add errro message'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I'll update to add a real error message

' for site_id={}'.format(site_id))
logger.exception(msg)


# TODO: Sites iterator with entry and exit logging
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note and I want to leave it here as a reminder

figures/tasks.py Outdated

parallel, then when they are all done, populates the site metrics. See the
function ``experimental_populate_daily_metrics`` docstring for details
Deveoper note: Errors need to be handled at each layer in the call chain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo to fix "Developer"

# At least one with and without `message_dict`
raise FakeException('Hey!')

# def fake_pop_single_sdm(**_kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code I'll remove



@pytest.mark.skipif(OPENEDX_RELEASE == GINKGO,
reason='Broken test. Apparent Django 1.8 incompatibility')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove 'Broken test' from reason



@pytest.mark.skipif(OPENEDX_RELEASE == GINKGO,
reason='Broken test. Apparent Django 1.8 incompatibility')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove 'Broken test' from reason

@johnbaldwin johnbaldwin changed the title John/0.4 pipeline error handling 0.4 Tasks Error Handling and Logging Improvements Jan 5, 2021
@johnbaldwin johnbaldwin changed the title 0.4 Tasks Error Handling and Logging Improvements Release 0.4 - Tasks Error Handling and Logging Improvements Jan 5, 2021
Copy link
Contributor

@OmarIthawi OmarIthawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @johnbaldwin! LGTM. I've added one question.

logger.debug(
'done running populate_site_daily_metrics for site_id={}'.format(site_id))


@shared_task
def populate_daily_metrics_for_site(site_id, date_for, force_update=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is a task that we're now only calling as part of populate_daily_metrics but in theory we can run it independently later on. Correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I pulled it out of populate_daily_metrics so that we can in the future run asynchronously (using the .delay) but before we do, we need to make sure that RabbitMQ can handle it without blocking other async tasks

* Added docstring comments about mulitple orgs per site
* Updated error handling in figures.tasks.populate_single_sdm. Now it
logs when a site is not found, then rethrows the exception
* updated figures/tasks/test_daily_tasks.py - improved testing of error
cases
@johnbaldwin johnbaldwin merged commit f0d3a4f into master Jan 14, 2021
@johnbaldwin johnbaldwin deleted the john/0.4-pipeline-error-handling branch January 14, 2021 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants