Log batch job failures to Slack #2561

Merged: iamleeg merged 9 commits into main from 1564_error_logs_to_slack on Mar 2, 2022

Conversation

iamleeg
Contributor

@iamleeg iamleeg commented Feb 28, 2022

  • script doesn't depend on reaching an event from lambda
  • copes with being rate limited by Slack
  • Documentation
  • Dockerfile
  • Deploy to ECR
  • Define job in Batch
  • Schedule from EventBridge
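
The second item above (coping with Slack rate limits) could be handled roughly as follows. This is a minimal sketch, not the code from this PR: the helper names `post_with_retry` and `slack_sender` are hypothetical, and it assumes messages go to a Slack incoming webhook, which answers HTTP 429 with a `Retry-After` header (in seconds) when rate limited.

```python
import json
import time
import urllib.error
import urllib.request


def post_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    send() must return normally on success and raise
    urllib.error.HTTPError on an HTTP failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Only retry when rate limited, and give up on the last attempt.
            if err.code != 429 or attempt == max_attempts:
                raise
            # Slack says how long to wait; fall back to 1 second (assumption).
            sleep(int(err.headers.get("Retry-After", "1")))


def slack_sender(webhook_url, text):
    """Build a zero-argument callable that posts text to a webhook URL."""

    def send():
        request = urllib.request.Request(
            webhook_url,
            data=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return response.status

    return send
```

A caller would then write something like `post_with_retry(slack_sender(webhook_url, "parser failed"))`. Passing `sleep` as a parameter keeps the retry loop testable without real waiting.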

@abhidg
Contributor

abhidg commented Feb 28, 2022

I'm not familiar with the boto3 logs module, but will this only return the latest logs? We wouldn't want earlier logs coming in. It would also be good to hook it up to the EventBridge event on Batch job failure -- this may need some changes to handle the input event, which is usually passed as a JSON payload.

We can add the Batch job as a target for https://console.aws.amazon.com/events/home?region=us-east-1#/eventbus/default/rules/batch-job-failure once it can accept the event payload.
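
For illustration, handling the event payload might look like the sketch below. It is an assumption-laden example rather than the script in this PR: the function name `failed_job_from_event` is hypothetical, and it assumes the payload arrives as a JSON string in the shape of an AWS Batch "Batch Job State Change" event, whose `detail` carries `jobName`, `jobId`, `status` and `container.logStreamName`.

```python
import json


def failed_job_from_event(payload):
    """Extract the failed job's details from an EventBridge event.

    payload is the event as a JSON string. Returns None when the event
    is not a Batch job failure.
    """
    event = json.loads(payload)
    if event.get("detail-type") != "Batch Job State Change":
        return None
    detail = event.get("detail", {})
    if detail.get("status") != "FAILED":
        return None
    container = detail.get("container", {})
    return {
        "job_name": detail.get("jobName"),
        "job_id": detail.get("jobId"),
        # Name of the CloudWatch log stream for this job's container.
        "log_stream": container.get("logStreamName"),
    }
```

How the payload reaches the script (for example through an environment variable or a command-line argument set on the EventBridge target) is left open here.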

@iamleeg
Contributor Author

iamleeg commented Feb 28, 2022

> I'm not familiar with the boto3 logs module, but will this only return the latest logs? We wouldn't want earlier logs coming in.

Do you mean to say that there isn't a separate log stream for each run of a parser? This makes it more difficult, as I need to know what the start and end dates are of the parser run I care about to filter the log stream. Do you know how I can get those?

> It would also be good to hook it up to the EventBridge event on Batch job failure -- this may need some changes to handle the input event, which is usually passed as a JSON payload.

Ah OK, I was hoping I'd be able to define the environment variables in the event but I'll look into doing that.

@abhidg
Contributor

abhidg commented Feb 28, 2022

> Do you mean to say that there isn't a separate log stream for each run of a parser? This makes it more difficult, as I need to know what the start and end dates are of the parser run I care about to filter the log stream. Do you know how I can get those?

Re-reading that bit, it is fine: there is a separate log stream for each parser run, sharing the same prefix (x-x-ingestor-prod/*), and the stream name can be obtained from the event.
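
Given a stream name taken from the event, reading that one stream could be sketched as below. This is not the code under review: `read_log_stream` is a hypothetical helper, and the log group passed to it is an assumption (AWS Batch writes to `/aws/batch/job` by default). It relies on the CloudWatch Logs `get_log_events` call, which signals the end of a stream by returning the same `nextForwardToken` that was passed in.

```python
def read_log_stream(logs_client, log_group, log_stream):
    """Return every message in one CloudWatch log stream, oldest first.

    logs_client is a CloudWatch Logs client, e.g. boto3.client("logs").
    """
    messages = []
    token = None
    while True:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,  # read from the start of the stream
        }
        if token is not None:
            kwargs["nextToken"] = token
        page = logs_client.get_log_events(**kwargs)
        messages.extend(event["message"] for event in page["events"])
        next_token = page.get("nextForwardToken")
        # An unchanged token means there are no further pages.
        if next_token is None or next_token == token:
            return messages
        token = next_token
```

Because each job run has its own stream, no start or end timestamps are needed to isolate one run. Taking the client as a parameter also lets the function be exercised with a stub instead of real AWS credentials.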

@@ -0,0 +1,71 @@
# `python-base` sets up all our shared environment variables
FROM python:3.9-slim as python-base
Contributor

3.10?

Contributor Author

@iamleeg iamleeg Mar 1, 2022

I had problems installing poetry in the image when I based it on python:3.10-slim, so I punted on it. I will check again; it may have been a transient issue.

Contributor

This appears to be the new way of installing poetry: https://github.com/python-poetry/install.python-poetry.org -- I'm wary of the direct call to install.python-poetry.org, but it should be possible to pin to a git tag.

Contributor Author

Good catch. We were calling get-poetry from master too so we probably need to change this throughout the repo.

@iamleeg iamleeg requested review from jim-sheldon and abhidg March 2, 2022 16:11
@iamleeg iamleeg merged commit f4ada3b into main Mar 2, 2022
@iamleeg iamleeg deleted the 1564_error_logs_to_slack branch March 2, 2022 17:39
@abhidg abhidg changed the title 1564 error logs to slack Log batch job failures to Slack Mar 3, 2022