DAOS-16217 test: Update run_local(). #14748
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Ticket title is 'ftest: Update run_local() to return same object type as run_remote()'
src/tests/ftest/util/run_utils.py
Outdated
source_keys = NodeSet.fromlist(hosts)
data_keys = NodeSet()
for _, keys in data:
    data_keys.add(NodeSet.fromlist(keys))
Just FYI - I recently discovered this to flatten lists. (might need some type adjustments here, but not asking you to change anyway. just FYI)
>>> list_of_lists = [['a','b'], ['c','d']]
>>> default_sum = []
>>> sum(list_of_lists, default_sum)
['a', 'b', 'c', 'd']
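For comparison, here is a small sketch of the `sum()` trick next to `itertools.chain.from_iterable`, which is the more common way to flatten one level of nesting. The sample data is the same as in the comment above; the variable names are only illustrative. One thing worth keeping in mind is that `sum()` builds a new list on every addition, so it tends to get slow for many sublists, while `chain.from_iterable` walks each sublist once.

```python
from itertools import chain

list_of_lists = [["a", "b"], ["c", "d"]]

# sum() with an empty-list start value concatenates the sublists one by one.
flat_sum = sum(list_of_lists, [])

# chain.from_iterable() yields the items of each sublist in order.
flat_chain = list(chain.from_iterable(list_of_lists))

print(flat_sum)    # ['a', 'b', 'c', 'd']
print(flat_chain)  # ['a', 'b', 'c', 'd']
```

Both forms give the same result here; which one fits the NodeSet code above would still depend on the type adjustments the comment mentions.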
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/2/execution/node/798/log
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/3/execution/node/798/log
- Add collection of command out for timed out run_remote() commands Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/799/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/1020/log
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/1159/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/7/execution/node/799/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/8/execution/node/823/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/9/execution/node/799/log
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14748/10/testReport/
linting is failing
if not result.passed:
    raise RunException(f"Error running {command}")
If this fails, name, version, release, epoch = result.joined_stdout.split() would probably be wrong, and this function would still return package_info with weird data.
Thanks, code updated.
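To make the concern in this thread concrete, here is a minimal sketch of guarding the unpacking step. It is not the code that was merged: `parse_package_info`, `FakeResult`, and the local `RunException` class are hypothetical stand-ins so the example runs on its own, and only the `passed` and `joined_stdout` attribute names come from the review above.

```python
from dataclasses import dataclass


class RunException(Exception):
    """Stand-in for the RunException named in the review (assumption)."""


@dataclass
class FakeResult:
    """Hypothetical result object exposing only the two attributes used here."""
    passed: bool
    joined_stdout: str


def parse_package_info(command, result):
    """Return package fields, raising instead of returning partial data."""
    if not result.passed:
        raise RunException(f"Error running {command}")
    fields = result.joined_stdout.split()
    if len(fields) != 4:
        # Refuse to unpack output that does not have exactly four fields.
        raise RunException(
            f"Unexpected output from {command}: {result.joined_stdout!r}")
    name, version, release, epoch = fields
    return {"name": name, "version": version, "release": release, "epoch": epoch}


# Usage with made-up output; the command string is a placeholder.
info = parse_package_info("query-package", FakeResult(True, "pkg 1.0 2.el8 0"))
print(info["name"], info["version"])
```

Checking both the pass status and the field count means a failed or truncated command surfaces as an exception rather than as a package_info populated with unrelated text.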
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/13/execution/node/795/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/14/execution/node/1158/log
    return run_local(logger, path, check=True, verbose=False).returncode == 0
except RunException:
    return False
return run_local(logger, path, verbose=False).passed
I do wonder if the new definition of run_local is slower than the old one, which might explain why the mu_perms test times out now.
In this daily testing run, the same command took 1 min 28 s, with a 2 min timeout:
https://build.hpdd.intel.com/job/daos-stack/job/daos/job/daily-testing/214/artifact/Functional%20on%20EL%208/dfuse/mu_perms.py/job.log
That said, maybe this should just use:
try:
    return subprocess.run(path, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True).returncode == 0
except subprocess.CalledProcessError:
    return False
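As a self-contained version of the suggestion above, the following sketch wraps the same `subprocess.run` call in a function. The function name `command_succeeds` and the optional `timeout` parameter are assumptions added for illustration; they are not taken from the PR.

```python
import subprocess


def command_succeeds(path, timeout=None):
    """Return True if the shell command exits with 0, False otherwise.

    The timeout handling is an illustrative assumption, not part of the
    suggestion in the review comment.
    """
    try:
        subprocess.run(
            path, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            check=True, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # check=True raises on a non-zero exit; timeout raises TimeoutExpired.
        return False
    return True


print(command_succeeds("exit 0"))
print(command_succeeds("exit 1"))
```

Because `check=True` raises on any non-zero exit status, the explicit `returncode == 0` comparison in the original snippet is always true when it is reached, so the sketch returns `True` directly.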
It could be VM slowness like https://daosio.atlassian.net/browse/DAOS-15402. From the job log:
2024-07-18 20:04:36,287 test L0530 INFO | START 1-./dfuse/mu_perms.py:DfuseMUPerms.test_dfuse_mu_perms;run-container-dfuse-dfuse_with_caching-hosts-pool-server_config-engines-0-storage-0-test_dfuse_mu_perms_cache-verify_perms-5a8a
2024-07-18 20:04:37,835 test L0473 INFO | ==> Step 1: setUp(): Starting servers [elapsed since last step: 1.55s]
2024-07-18 20:05:40,815 test L0473 INFO | ==> Step 2: setUp(): Starting agents [elapsed since last step: 62.98s]
2024-07-18 20:06:01,165 test L0473 INFO | ==> Step 3: setUp(): Setup complete [elapsed since last step: 20.35s]
2024-07-18 20:06:12,590 test L0473 INFO | ==> Step 4: Verifying simple file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_file [elapsed since last step: 11.43s]
2024-07-18 20:06:39,888 test L0473 INFO | ==> Step 5: Verifying simple dir permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_dir [elapsed since last step: 27.30s]
2024-07-18 20:07:07,161 test L0473 INFO | ==> Step 6: Verifying real file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_file [elapsed since last step: 27.27s]
2024-07-18 20:09:01,790 test L0473 INFO | ==> Step 7: Verifying real dir permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_dir [elapsed since last step: 114.63s]
2024-07-18 20:09:40,306 test L0473 INFO | ==> Step 8: Creating directory: %s [elapsed since last step: 38.52s]
2024-07-18 20:09:40,704 test L0473 INFO | ==> Step 9: Giving ownership to daos_test_user_x2 [elapsed since last step: 0.40s]
2024-07-18 20:09:41,456 test L0473 INFO | ==> Step 10: Verifying real file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/dir1/test_file [elapsed since last step: 0.75s]
2024-07-18 20:11:41,625 test L0473 INFO | ==> Step 11: tearDown(): Called after test completion (test timeout: 570s, elapsed: 425.34s, remaining: 144.66s) [elapsed since last step: 120.17s]
2024-07-18 20:11:55,161 test L0933 ERROR| ERROR 1-./dfuse/mu_perms.py:DfuseMUPerms.test_dfuse_mu_perms;run-container-dfuse-dfuse_with_caching-hosts-pool-server_config-engines-0-storage-0-test_dfuse_mu_perms_cache-verify_perms-5a8a -> CommandFailure: verify_perms.py failed on: wolf-102vm3
We see that starting the servers took 62.98s, longer than the norm. Maybe we should run the command with a larger timeout, or with no timeout?
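When reading the log above, note that the "elapsed since last step" value printed on a step line measures the previous step, which is why the 62.98s on the "Starting agents" line is attributed to starting the servers. A rough sketch of that attribution follows; the regular expression and the function name `step_durations` are assumptions written for this log format, not part of the test harness.

```python
import re

# Matches the step description and the elapsed value at the end of the line.
STEP_RE = re.compile(
    r"==> Step \d+: (?P<desc>.*?) "
    r"\[elapsed since last step: (?P<elapsed>[\d.]+)s\]"
)


def step_durations(log_lines):
    """Pair each step description with the elapsed time reported by the next step."""
    durations = []
    previous_desc = None
    for line in log_lines:
        match = STEP_RE.search(line)
        if match is None:
            continue
        if previous_desc is not None:
            durations.append((previous_desc, float(match.group("elapsed"))))
        previous_desc = match.group("desc")
    return durations


# Message portions of the first three step lines from the log above.
sample = [
    "==> Step 1: setUp(): Starting servers [elapsed since last step: 1.55s]",
    "==> Step 2: setUp(): Starting agents [elapsed since last step: 62.98s]",
    "==> Step 3: setUp(): Setup complete [elapsed since last step: 20.35s]",
]
for desc, seconds in step_durations(sample):
    print(f"{seconds:8.2f}s  {desc}")
```

Applied to the full log, the same attribution would assign the 120.17s reported on the Step 11 line to Step 10, the last permission check before tearDown().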
The HW Large failures in https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14748/14/testReport/ can be attributed to a HW issue with wolf-222:
and also https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14748/14/artifact/Functional%20Hardware%20Large/daos_test/dfs.py which failed to start the engines on wolf-222. Issues with the cluster containing wolf-222 were also seen in https://daosio.atlassian.net/browse/DAOS-16280.
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/15/execution/node/810/log
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14748/17/testReport/
The only failure is
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14848 Use subprocess.run instead of run_local so test output is printed while running. Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14848 Use subprocess.run instead of run_local so test output is printed while running. Also include #14870 Handle any avocado run raised exception Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14882 Use subprocess.run() for run_local() Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14882 Use subprocess.run() for run_local() Signed-off-by: Dalton Bohning <dalton.bohning@intel.com> Co-authored-by: Phil Henderson <phillip.henderson@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably.
Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms
Required-githooks: true
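To illustrate the idea in the description, here is a minimal sketch of a local runner that returns a result object rather than a raw process handle, so callers can check `.passed` the same way regardless of where the command ran. `LocalResult` and `run_local_sketch` are hypothetical names, and this is not the implementation in src/tests/ftest/util/run_utils.py; only the `passed`, `joined_stdout`, and `returncode` names appear in the discussion.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class LocalResult:
    """Hypothetical result object; not the class used in run_utils.py."""
    command: str
    returncode: int
    stdout: str
    timed_out: bool = False

    @property
    def passed(self):
        """True when the command exited with 0 and did not time out."""
        return self.returncode == 0 and not self.timed_out

    @property
    def joined_stdout(self):
        """Captured output with surrounding whitespace removed."""
        return self.stdout.strip()


def run_local_sketch(command, timeout=None):
    """Run a shell command locally and always return a LocalResult."""
    try:
        proc = subprocess.run(
            command, shell=True, stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT, text=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired as error:
        output = error.output or ""
        if isinstance(output, bytes):
            output = output.decode(errors="replace")
        # 124 is an arbitrary choice here, borrowed from coreutils `timeout`.
        return LocalResult(command, 124, output, timed_out=True)
    return LocalResult(command, proc.returncode, proc.stdout)


result = run_local_sketch("echo hello")
print(result.passed, result.joined_stdout)
```

Returning an object on failure and on timeout, instead of raising, is what lets a caller write a single `if not result.passed:` check, as in the review snippets in this conversation.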
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: