PMEM RAS #652
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Run RAS test: Unsafe Shutdown Local. | |
# | |
# This workflow is run on 'self-hosted' runners. | |
# | |
# RAS tests require a different approach compared to the standard tests - they need to | |
# reboot the runner during the test. Normally, rebooting and continuing the job on GHA | |
# is not possible, due to losing connection with the runner. To work around this issue, | |
# an additional runner (not connected to the GH) runs the tests instead. | |
# | |
# The general idea of the solution is: | |
# - First platform [self-hosted runner] functions as the controller [ras_controller], | |
# - Second platform functions as the test runner [ras_runner], | |
# - The workflow launches its steps on the controller, | |
# - The controller will then run an ansible playbook on the second platform [ras_runner], | |
# with options provided by the workflow, | |
# - The test runner follows the steps given by the controller, | |
# running the tests in the process and providing results as output, | |
# - The controller gathers this output and prints it in GHA job. | |
# | |
# The only drawback of this idea is that workflow would always finish successfully. | |
# The solution was added as an additional step, at the end of the workflow, parsing the output. | |
# | |
# More detailed information about the ansible playbook and tests themselves can be found in: | |
# utils/gha-runners/run-ras-linux.yml | |
name: PMEM RAS | |
on: | |
workflow_dispatch: | |
schedule: | |
# run this job every 8 hours | |
- cron: '0 */8 * * *' | |
jobs: | |
linux: | |
name: PMEM_RAS | |
if: github.repository == 'pmem/pmdk' | |
runs-on: ${{ matrix.os }} | |
strategy: | |
fail-fast: false | |
matrix: | |
os: [[self-hosted, ras_controller]] | |
env: | |
WORKDIR: utils/gha-runners | |
steps: | |
- name: Clone the git repo | |
uses: actions/checkout@v3 | |
# Variables, such as $ras_runner are hidden on the controller platform as environmental variables. | |
# 'sed' command is used to filter out IP addresses from the ansible output, it will show up as the 'ras_runner' instead. | |
# 'tee' command is used to save the overall output to the file. This file is needed for the next step. | |
- name: Prepare and run RAS Linux tests via ansible-playbook | |
run: | | |
cd $WORKDIR | |
ansible-playbook -i $ras_runner, run-ras-linux.yml -e "host=all ansible_user=$ras_user" | sed 's/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/ras_runner/' | tee playbook_output.txt | |
# This simple step will look through the output in search of specific fail strings. | |
# If any phrase is found in the file, the workflow will fail. | |
- name: Fail the workflow if the playbook finished with a failure | |
run: | | |
cd $WORKDIR | |
if grep -E 'fatal: \[ras_runner\]: FAILED!|failed: \[ras_runner\]' "playbook_output.txt"; then | |
exit 1 | |
else | |
exit 0 | |
fi |