Commit

cron check job status
Rhett-Ying committed Nov 9, 2023
1 parent a813233 commit e929582
Showing 2 changed files with 147 additions and 3 deletions.
19 changes: 16 additions & 3 deletions .github/workflows/continuous_integration.yml
@@ -47,12 +47,25 @@ jobs:
           python3 -m pip install pytest
           pip3 install boto3
       - name: Submit Job
-        if: ${{ github.event_name == 'push' }}
+        id: submit-job
         shell: bash
         run: |
           echo "Start submitting job - Check"
           python3 ./submitJob.py --job-type CI-CPU --name hello_DGL-pytest-check-'${{ github.ref }}' \
                                  --command "${{ env.COMMAND-PYTEST }}" \
                                  --remote https://github.com/'${{ github.repository }}' \
-                                 --source-ref '${{ github.ref }}' \
-                                 --wait
+                                 --source-ref '${{ github.ref }}'
+      - name: Check batch job status
+        id: check-job-status
+        shell: bash
+        run: |
+          echo "Start checking job status - Check"
+          python3 ./checkJobStatus.py --job-id ${{ steps.submit-job.outputs }} \
+                                      --job-name hello_DGL-pytest-check-'${{ github.ref }}'
+        schedule: # execute every 2 minutes
+          - cron: '*/2 * * * *'
+      - name: Exit if job status finished
+        shell: bash
+        run: |
+          echo "Start exiting job - Check"
+          python3 ./exitJob.py --job-id ${{ steps.submit-job.outputs }}
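A note on the `schedule` / `cron` lines in the hunk above: in GitHub Actions, `schedule` is a workflow-level trigger declared under `on:`, not a key that a step accepts, so placing it under the "Check batch job status" step is unlikely to re-run that step every 2 minutes. One possible alternative, sketched below, is to poll from inside the Python script. Everything here is an illustration rather than part of the commit: `poll_until_done`, `exit_code`, and the `fetch_status` callable are hypothetical names, and `fetch_status` is assumed to return the Batch job status string (for example by wrapping `batch.describe_jobs(jobs=[job_id])['jobs'][0]['status']`).

```python
import time

# Terminal states, matching the ones checkJobStatus.py tests for.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def poll_until_done(fetch_status, interval=120, timeout=10800,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal state.

    fetch_status is a hypothetical zero-argument callable returning the
    current job status as a string. sleep and clock are injectable so
    the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "job still {} after {}s".format(status, timeout))
        sleep(interval)


def exit_code(status):
    """Map a terminal status to a process exit code (0 means success)."""
    return 0 if status == "SUCCEEDED" else 1
```

With a helper like this, a single workflow step could call `sys.exit(exit_code(poll_until_done(...)))` and let the step's own result reflect the job outcome. The 120-second default mirrors the "every 2 minutes" comment in the workflow, and the 10800-second default mirrors the script's `--timeout` default.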
131 changes: 131 additions & 0 deletions checkJobStatus.py
@@ -0,0 +1,131 @@
# Script to check the status of a job already submitted to AWS Batch.
# Job queues and definitions are assumed to exist and be set up.
import argparse
import sys
from datetime import datetime

import boto3
from botocore.compat import total_seconds
from botocore.config import Config


job_type_info = {
    'CI-CPU': {
        'job_definition': 'hello_dgl',
        'job_queue': 'hello_dgl',
    },
}

parser = argparse.ArgumentParser(
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)

parser.add_argument('--profile', help='profile name of aws account.', type=str,
                    default=None)
parser.add_argument('--region',
                    help='Default region when creating new connections',
                    type=str, default='us-west-2')
parser.add_argument('--name', help='name of the job', type=str,
                    default='dummy')
parser.add_argument('--job-type', help='type of job to submit.', type=str,
                    choices=job_type_info.keys(), default='CI-CPU')
parser.add_argument('--command', help='command to run', type=str,
                    default='git rev-parse HEAD | tee stdout.log')
parser.add_argument('--wait', help='block wait until the job completes. '
                    'Non-zero exit code if job fails.', action='store_true')
parser.add_argument('--timeout', help='job timeout in seconds', default=10800,
                    type=int)

parser.add_argument('--source-ref',
                    help='ref in hello_DGL main github. '
                         'e.g. master, refs/pull/500/head',
                    type=str, default='main')
parser.add_argument('--remote',
                    help='git repo address. '
                         'https://github.com/dglai/hello_dgl.git',
                    type=str, default="https://github.com/dglai/hello_dgl.git")
parser.add_argument("--job-id", help="job id", type=str, default=None)
parser.add_argument("--job-name", help="job name", type=str, default=None)

args = parser.parse_args()

print(args)

session = boto3.Session(profile_name=args.profile, region_name=args.region)
config = Config(
    retries=dict(
        max_attempts=5
    )
)

batch, cloudwatch = [session.client(service_name=sn, config=config)
                     for sn in ['batch', 'logs']]


def printLogs(logGroupName, logStreamName, startTime):
    kwargs = {'logGroupName': logGroupName,
              'logStreamName': logStreamName,
              'startTime': startTime,
              'startFromHead': True}

    lastTimestamp = startTime - 1
    while True:
        logEvents = cloudwatch.get_log_events(**kwargs)

        for event in logEvents['events']:
            lastTimestamp = event['timestamp']
            timestamp = datetime.utcfromtimestamp(
                lastTimestamp / 1000.0).isoformat()
            print('[{}] {}'.format(
                (timestamp + '.000')[:23] + 'Z', event['message']))

        nextToken = logEvents['nextForwardToken']
        if nextToken and kwargs.get('nextToken') != nextToken:
            kwargs['nextToken'] = nextToken
        else:
            break
    return lastTimestamp


def nowInMillis():
    endTime = int(
        total_seconds(datetime.utcnow() - datetime(1970, 1, 1))) * 1000
    return endTime


def main():
    spin = ['-', '/', '|', '\\', '-', '/', '|', '\\']
    # This is the group where AWS Batch logs are stored in CloudWatch.
    logGroupName = '/aws/batch/job'

    # Printing actions parameters
    print("GitHub SourceRef: ", args.source_ref)
    print("GitHub Remote: ", args.remote)

    jobId = args.job_id
    jobName = args.job_name
    print(f"Job ID: {jobId}. Job Name: {jobName}")

    spinner = 0
    running = False
    status_set = set()
    startTime = 0
    logStreamName = None

    describeJobsResponse = batch.describe_jobs(jobs=[jobId])
    job = describeJobsResponse['jobs'][0]
    status = job['status']
    if status == 'SUCCEEDED' or status == 'FAILED':
        # This is a one-shot check, so the log stream name is looked up
        # here; otherwise it would still be None and no logs would print.
        logStreamName = job.get('container', {}).get('logStreamName')
        if logStreamName:
            startTime = printLogs(logGroupName, logStreamName, startTime) + 1
        print('=' * 80)
        print('Job [{} - {}] {}'.format(jobName, jobId, status))
        # Exit code 1 if the job failed, 0 if it succeeded.
        sys.exit(status == 'FAILED')

    elif status == 'RUNNING':
        logStreamName = job['container']['logStreamName']
        if not running:
            running = True
            print('\rJob [{}, {}] is RUNNING.'.format(jobName, jobId))
            if logStreamName:
                print('Output [{}]:\n {}'.format(logStreamName, '=' * 80))
        if logStreamName:
            startTime = printLogs(logGroupName, logStreamName, startTime) + 1
    elif status not in status_set:
        status_set.add(status)
        print('\rJob [%s - %s] is %-9s... %s'
              % (jobName, jobId, status, spin[spinner % len(spin)]))
        sys.stdout.flush()
        spinner += 1
    print(f"Job status: {status}")


if __name__ == '__main__':
    main()
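
The `printLogs` and `nowInMillis` helpers in this file rely on `datetime.utcfromtimestamp` and `datetime.utcnow`, both deprecated since Python 3.12. A minimal sketch of equivalents built on timezone-aware datetimes follows. The names `format_log_timestamp` and `now_in_millis` are illustrative, not part of the commit, and `now_in_millis` keeps millisecond precision, whereas the original truncates to whole seconds before multiplying by 1000.

```python
from datetime import datetime, timezone


def format_log_timestamp(millis):
    """Render epoch milliseconds as an ISO 8601 UTC string.

    Integer arithmetic keeps the millisecond part exact, so no string
    slicing or padding is needed.
    """
    seconds, ms = divmod(int(millis), 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime('%Y-%m-%dT%H:%M:%S') + '.{:03d}Z'.format(ms)


def now_in_millis():
    """Current time as integer milliseconds since the Unix epoch."""
    return int(datetime.now(timezone.utc).timestamp() * 1000)


print(format_log_timestamp(1699488000123))  # 2023-11-09T00:00:00.123Z
```

The output format matches what `printLogs` prints inside the square brackets, so a helper like this could replace the `isoformat()` plus slicing expression there.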
