Skip to content

PMEM RAS

PMEM RAS #637

Workflow file for this run

# Run RAS test: Unsafe Shutdown Local.
#
# This workflow is run on 'self-hosted' runners.
#
# RAS tests require a different approach compared to the standard tests - they need to
# reboot the runner during the test. Normally, rebooting and continuing the job on GHA
# is not possible, due to losing connection with the runner. To work around this issue,
# an additional runner (not connected to the GH) runs the tests instead.
#
# The general idea of the solution is:
# - First platform [self-hosted runner] functions as the controller [ras_controller],
# - Second platform functions as the test runner [ras_runner],
# - The workflow launches its steps on the controller,
# - The controller will then run an ansible playbook on the second platform [ras_runner],
# with options provided by the workflow,
# - The test runner follows the steps given by the controller,
# running the tests in the process and providing results as output,
# - The controller gathers this output and prints it in GHA job.
#
# The only drawback of this idea is that workflow would always finish successfully.
# The solution was added as an additional step, at the end of the workflow, parsing the output.
#
# More detailed information about the ansible playbook and tests themselves can be found in:
# utils/gha-runners/run-ras-linux.yml
name: PMEM RAS
on:
workflow_dispatch:
schedule:
# run this job every 8 hours
- cron: '0 */8 * * *'
jobs:
linux:
name: PMEM_RAS
if: github.repository == 'pmem/pmdk'
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [[self-hosted, ras_controller]]
env:
WORKDIR: utils/gha-runners
steps:
- name: Clone the git repo
uses: actions/checkout@v3
# Variables, such as $ras_runner are hidden on the controller platform as environmental variables.
# 'sed' command is used to filter out IP addresses from the ansible output, it will show up as the 'ras_runner' instead.
# 'tee' command is used to save the overall output to the file. This file is needed for the next step.
- name: Prepare and run RAS Linux tests via ansible-playbook
run: |
cd $WORKDIR
ansible-playbook -i $ras_runner, run-ras-linux.yml -e "host=all ansible_user=$ras_user" | sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/ras_runner/' | tee playbook_output.txt
# This simple step will look through the output in search of specific fail strings.
# If any phrase is found in the file, the workflow will fail.
- name: Fail the workflow if the playbook finished with a failure
run: |
cd $WORKDIR
if grep -E 'fatal: \[ras_runner\]: FAILED!|failed: \[ras_runner\]' "playbook_output.txt"; then
exit 1
else
exit 0
fi