Fix mobile job rate limit failures (#5770)
It looks like AWS imposes a rate limit somewhere on the number of
requests we can submit to them. So, jobs are failing intermittently,
e.g.
https://github.com/pytorch/executorch/actions/runs/11352715938/attempts/1.

Also, the iOS job seems to suffer more, so maybe AWS has a different
rate limit for different devices?
https://github.com/pytorch/executorch/actions/runs/11357190872/job/31590285863

Let's just slow down a bit here by sleeping for a random time (under
120 seconds) before submitting, and also retry the submission up to 3
times.
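
For illustration only, here is a minimal bash sketch of the two ideas in this change: a random wait before submitting, then a bounded retry. The helper names `random_wait` and `retry` are hypothetical and do not appear in the workflow, which uses an inline `RANDOM %` expression and the nick-fields/retry action instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of mobile_job.yml.

# Print a random delay in [0, max) seconds, mirroring
# WAIT_TIME_IN_SECONDS=$((RANDOM % MAX_WAIT_TIME_IN_SECONDS)).
random_wait() {
  local max="$1"
  echo $((RANDOM % max))
}

# Run a command up to max_attempts times, sleeping wait_seconds
# after each failed attempt. Returns 1 if every attempt fails.
retry() {
  local max_attempts="$1"
  local wait_seconds="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "${attempt}" -ge "${max_attempts}" ]; then
      echo "Giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${wait_seconds}s" >&2
    sleep "${wait_seconds}"
    attempt=$((attempt + 1))
  done
}

# Example usage (commented out so the sketch does not sleep when sourced):
#   WAIT="$(random_wait 120)"
#   sleep "${WAIT}"
#   retry 3 "${WAIT}" python run_on_aws_devicefarm.py ...
```

Spreading the start times across a window means concurrent jobs are less likely to hit the service in the same instant, and the retry covers the cases where a request is still rejected.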
huydhn authored Oct 16, 2024
1 parent a43a148 commit 2b66760
Showing 1 changed file with 62 additions and 32 deletions.
94 changes: 62 additions & 32 deletions .github/workflows/mobile_job.yml
@@ -270,11 +270,25 @@ jobs:
           working-directory: test-infra
           github-token: ${{ secrets.GITHUB_TOKEN }}

+      - name: Slow down the incoming requests to mitigate AWS rate limit
+        id: randomize-retry
+        shell: bash
+        continue-on-error: true
+        env:
+          MAX_WAIT_TIME_IN_SECONDS: 120
+        run: |
+          set -ex
+          # NB: AWS imposes a rate limit somewhere on the number of requests
+          # we can submit to them. Let's just slow down a bit here
+          WAIT_TIME_IN_SECONDS=$((RANDOM % MAX_WAIT_TIME_IN_SECONDS))
+          echo "WAIT_TIME_IN_SECONDS=${WAIT_TIME_IN_SECONDS}" >> "${GITHUB_ENV}"
+          sleep "${WAIT_TIME_IN_SECONDS}"
       - name: Run iOS tests on devices
         id: ios-test
         if: ${{ inputs.device-type == 'ios' }}
-        shell: bash
-        working-directory: test-infra/tools/device-farm-runner
         env:
           PROJECT_ARN: ${{ inputs.project-arn }}
           DEVICE_POOL_ARN: ${{ inputs.device-pool-arn }}
@@ -288,20 +302,29 @@ jobs:
           RUN_ID: ${{ github.run_id }}
           RUN_ATTEMPT: ${{ github.run_attempt }}
           JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
-        run: |
-          set -ex
-          ${CONDA_RUN} python run_on_aws_devicefarm.py \
-            --project-arn "${PROJECT_ARN}" \
-            --device-pool-arn "${DEVICE_POOL_ARN}" \
-            --app "${IPA_ARCHIVE}" \
-            --ios-xctestrun "${XCTESTRUN_ZIP}" \
-            --extra-data "${EXTRA_DATA}" \
-            --test-spec "${TEST_SPEC}" \
-            --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
-            --workflow-id "${RUN_ID}" \
-            --workflow-attempt "${RUN_ATTEMPT}" \
-            --output "ios-artifacts-${JOB_ID}.json"
+          WORKING_DIRECTORY: test-infra/tools/device-farm-runner
+        uses: nick-fields/retry@v3.0.0
+        with:
+          shell: bash
+          timeout_minutes: ${{ inputs.timeout }}
+          max_attempts: 3
+          retry_wait_seconds: ${{ env.WAIT_TIME_IN_SECONDS || 120 }}
+          command: |
+            set -ex
+            pushd "${WORKING_DIRECTORY}"
+            ${CONDA_RUN} python run_on_aws_devicefarm.py \
+              --project-arn "${PROJECT_ARN}" \
+              --device-pool-arn "${DEVICE_POOL_ARN}" \
+              --app "${IPA_ARCHIVE}" \
+              --ios-xctestrun "${XCTESTRUN_ZIP}" \
+              --extra-data "${EXTRA_DATA}" \
+              --test-spec "${TEST_SPEC}" \
+              --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
+              --workflow-id "${RUN_ID}" \
+              --workflow-attempt "${RUN_ATTEMPT}" \
+              --output "ios-artifacts-${JOB_ID}.json"
+            popd
       - name: Upload iOS artifacts to S3
         uses: seemethere/upload-artifact-s3@v5
@@ -317,8 +340,6 @@ jobs:
       - name: Run Android tests on devices
         id: android-test
         if: ${{ inputs.device-type == 'android' }}
-        shell: bash
-        working-directory: test-infra/tools/device-farm-runner
         env:
           PROJECT_ARN: ${{ inputs.project-arn }}
           DEVICE_POOL_ARN: ${{ inputs.device-pool-arn }}
@@ -332,20 +353,29 @@ jobs:
           RUN_ID: ${{ github.run_id }}
           RUN_ATTEMPT: ${{ github.run_attempt }}
           JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
-        run: |
-          set -ex
-          ${CONDA_RUN} python run_on_aws_devicefarm.py \
-            --project-arn "${PROJECT_ARN}" \
-            --device-pool-arn "${DEVICE_POOL_ARN}" \
-            --app "${APP_ARCHIVE}" \
-            --android-instrumentation-test "${TEST_ARCHIVE}" \
-            --extra-data "${EXTRA_DATA}" \
-            --test-spec "${TEST_SPEC}" \
-            --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
-            --workflow-id "${RUN_ID}" \
-            --workflow-attempt "${RUN_ATTEMPT}" \
-            --output "android-artifacts-${JOB_ID}.json"
+          WORKING_DIRECTORY: test-infra/tools/device-farm-runner
+        uses: nick-fields/retry@v3.0.0
+        with:
+          shell: bash
+          timeout_minutes: ${{ inputs.timeout }}
+          max_attempts: 3
+          retry_wait_seconds: ${{ env.WAIT_TIME_IN_SECONDS || 120 }}
+          command: |
+            set -ex
+            pushd "${WORKING_DIRECTORY}"
+            ${CONDA_RUN} python run_on_aws_devicefarm.py \
+              --project-arn "${PROJECT_ARN}" \
+              --device-pool-arn "${DEVICE_POOL_ARN}" \
+              --app "${APP_ARCHIVE}" \
+              --android-instrumentation-test "${TEST_ARCHIVE}" \
+              --extra-data "${EXTRA_DATA}" \
+              --test-spec "${TEST_SPEC}" \
+              --name-prefix "${JOB_NAME}-${DEVICE_TYPE}" \
+              --workflow-id "${RUN_ID}" \
+              --workflow-attempt "${RUN_ATTEMPT}" \
+              --output "android-artifacts-${JOB_ID}.json"
+            popd
       - name: Upload Android artifacts to S3
         uses: seemethere/upload-artifact-s3@v5