DAOS-16217 test: Update run_local(). #14748

phender · 2024-07-11T19:41:02Z

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably.

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms

Required-githooks: true

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follow the guidelines outlined here.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

github-actions · 2024-07-11T19:41:23Z

Ticket title is 'ftest: Update run_local() to return same object type as run_remote()'
Status is 'In Progress'
Labels: 'daos_framework'
https://daosio.atlassian.net/browse/DAOS-16217

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

daltonbohning · 2024-07-11T20:25:28Z

src/tests/ftest/util/run_utils.py

+    source_keys = NodeSet.fromlist(hosts)
+    data_keys = NodeSet()
+    for _, keys in data:
+        data_keys.add(NodeSet.fromlist(keys))


Just FYI - I recently discovered this way to flatten lists. (It might need some type adjustments here, but I'm not asking you to change anything.)

>>> list_of_lists = [['a','b'], ['c','d']]
>>> default_sum = []
>>> sum(list_of_lists, default_sum)
['a', 'b', 'c', 'd']
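As a side note to the flattening tip above, here is a minimal sketch comparing the `sum()` approach with `itertools.chain.from_iterable()`. The helper names `flatten_with_sum` and `flatten_with_chain` are made up for this illustration and are not part of `run_utils.py`; plain lists are used instead of `NodeSet` objects to keep the example self-contained.

```python
from itertools import chain


def flatten_with_sum(list_of_lists):
    """Flatten one level by concatenating each inner list onto an empty list."""
    # Each "+" builds a new list, so this is simple but copies repeatedly.
    return sum(list_of_lists, [])


def flatten_with_chain(list_of_lists):
    """Flatten one level by iterating over the inner lists in order."""
    # chain.from_iterable() walks the inner lists without intermediate copies.
    return list(chain.from_iterable(list_of_lists))


list_of_lists = [["a", "b"], ["c", "d"]]
print(flatten_with_sum(list_of_lists))    # ['a', 'b', 'c', 'd']
print(flatten_with_chain(list_of_lists))  # ['a', 'b', 'c', 'd']
```

Both give the same result for small inputs; the `chain` variant is probably the safer choice if the outer list can grow large.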

daosbuild1 · 2024-07-11T22:48:35Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/2/execution/node/798/log

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

daosbuild1 · 2024-07-12T01:50:14Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/3/execution/node/798/log

- Add collection of command out for timed out run_remote() commands Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

daosbuild1 · 2024-07-13T01:31:29Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/799/log

daosbuild1 · 2024-07-15T18:16:59Z

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/1020/log

daosbuild1 · 2024-07-15T19:59:59Z

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/6/execution/node/1159/log

daosbuild1 · 2024-07-15T22:30:42Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/7/execution/node/799/log

daosbuild1 · 2024-07-16T19:55:10Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/8/execution/node/823/log

daosbuild1 · 2024-07-17T00:56:34Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/9/execution/node/799/log

daosbuild1 · 2024-07-17T05:15:16Z

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14748/10/testReport/

daltonbohning

linting is failing

daltonbohning · 2024-07-17T16:06:26Z

src/tests/ftest/process_core_files.py

-            if not result.passed:
-                raise RunException(f"Error running {command}")


If this command fails, the unpacking in name, version, release, epoch = result.joined_stdout.split() would probably be wrong, and this function would still return package_info with weird data.

Thanks, code updated.
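To illustrate the concern in this thread, a minimal sketch of checking the result before unpacking the output follows. It is not the code that landed in the PR: `FakeResult` and `parse_package_info` are hypothetical names, and `FakeResult` only mimics the `passed` and `joined_stdout` attributes mentioned above. `RuntimeError` stands in for the harness's `RunException`.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for a command result object."""
    passed: bool
    joined_stdout: str


def parse_package_info(result, command):
    """Return package fields, refusing to parse output from a failed command."""
    if not result.passed:
        raise RuntimeError(f"Error running {command}")
    fields = result.joined_stdout.split()
    # Guard the four-way unpacking so unexpected output is reported clearly.
    if len(fields) != 4:
        raise RuntimeError(f"Unexpected output from {command}: {result.joined_stdout!r}")
    name, version, release, epoch = fields
    return {"name": name, "version": version, "release": release, "epoch": epoch}


good = FakeResult(passed=True, joined_stdout="daos 2.6.0 1.el8 0")
print(parse_package_info(good, "rpm query"))
```

With this shape a failed command raises instead of returning a partially filled `package_info`.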

daosbuild1 · 2024-07-17T21:19:52Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/13/execution/node/795/log

daosbuild1 · 2024-07-20T02:12:32Z

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/14/execution/node/1158/log

daltonbohning · 2024-07-22T17:04:56Z

src/tests/ftest/verify_perms.py

-            return run_local(logger, path, check=True, verbose=False).returncode == 0
-        except RunException:
-            return False
+        return run_local(logger, path, verbose=False).passed


I do wonder if the new definition of run_local is slower than the old one, which might explain why the mu_perms test times out now.
In this daily testing run, the same command took 1 min 28 s, with a 2 min timeout:
https://build.hpdd.intel.com/job/daos-stack/job/daos/job/daily-testing/214/artifact/Functional%20on%20EL%208/dfuse/mu_perms.py/job.log

That said, maybe this should just use

try:
    return subprocess.run(path, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True).returncode == 0
except subprocess.CalledProcessError:
    return False

It could be VM slowness like https://daosio.atlassian.net/browse/DAOS-15402 where in:

2024-07-18 20:04:36,287 test L0530 INFO | START 1-./dfuse/mu_perms.py:DfuseMUPerms.test_dfuse_mu_perms;run-container-dfuse-dfuse_with_caching-hosts-pool-server_config-engines-0-storage-0-test_dfuse_mu_perms_cache-verify_perms-5a8a
2024-07-18 20:04:37,835 test L0473 INFO | ==> Step 1: setUp(): Starting servers [elapsed since last step: 1.55s]
2024-07-18 20:05:40,815 test L0473 INFO | ==> Step 2: setUp(): Starting agents [elapsed since last step: 62.98s]
2024-07-18 20:06:01,165 test L0473 INFO | ==> Step 3: setUp(): Setup complete [elapsed since last step: 20.35s]
2024-07-18 20:06:12,590 test L0473 INFO | ==> Step 4: Verifying simple file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_file [elapsed since last step: 11.43s]
2024-07-18 20:06:39,888 test L0473 INFO | ==> Step 5: Verifying simple dir permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_dir [elapsed since last step: 27.30s]
2024-07-18 20:07:07,161 test L0473 INFO | ==> Step 6: Verifying real file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_file [elapsed since last step: 27.27s]
2024-07-18 20:09:01,790 test L0473 INFO | ==> Step 7: Verifying real dir permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/test_dir [elapsed since last step: 114.63s]
2024-07-18 20:09:40,306 test L0473 INFO | ==> Step 8: Creating directory: %s [elapsed since last step: 38.52s]
2024-07-18 20:09:40,704 test L0473 INFO | ==> Step 9: Giving ownership to daos_test_user_x2 [elapsed since last step: 0.40s]
2024-07-18 20:09:41,456 test L0473 INFO | ==> Step 10: Verifying real file permissions on /tmp/daos_dfuse_test_dfuse_mu_perms_1/dir1/test_file [elapsed since last step: 0.75s]
2024-07-18 20:11:41,625 test L0473 INFO | ==> Step 11: tearDown(): Called after test completion (test timeout: 570s, elapsed: 425.34s, remaining: 144.66s) [elapsed since last step: 120.17s]
2024-07-18 20:11:55,161 test L0933 ERROR| ERROR 1-./dfuse/mu_perms.py:DfuseMUPerms.test_dfuse_mu_perms;run-container-dfuse-dfuse_with_caching-hosts-pool-server_config-engines-0-storage-0-test_dfuse_mu_perms_cache-verify_perms-5a8a -> CommandFailure: verify_perms.py failed on: wolf-102vm3

We see that starting the servers took 62.98s - longer than the norm. Maybe we should run the command with a larger timeout or no timeout?
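For reference, a minimal sketch of a pass/fail check built on `subprocess.run()` with an adjustable timeout, along the lines suggested in this comment. `command_passed` is a hypothetical helper, not the function in `verify_perms.py`, and it takes an argument list rather than a shell string, which is a choice made for this sketch only.

```python
import subprocess
import sys


def command_passed(command, timeout=None):
    """Return True only if the command exits with 0 before the timeout.

    A timeout of None means wait indefinitely.
    """
    try:
        subprocess.run(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            check=True, timeout=timeout)
    except subprocess.CalledProcessError:
        # Non-zero exit status
        return False
    except subprocess.TimeoutExpired:
        # The command was killed after exceeding the timeout
        return False
    return True


# Example: a trivial command that exits successfully
print(command_passed([sys.executable, "-c", "pass"], timeout=60))
```

Treating a timeout the same as a failure keeps the boolean contract, at the cost of hiding whether the command was slow or actually failed; logging the exception type would make that distinction visible.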

phender · 2024-07-24T21:00:36Z

The HW Large failures in https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14748/14/testReport/ can be attributed to a HW issue with wolf-222:

2024-07-19 06:43:59,000 process          L0416 DEBUG| [stderr] ERROR: Errors:
2024-07-19 06:43:59,000 process          L0416 DEBUG| [stderr]   Hosts    Error                                                                 
2024-07-19 06:43:59,001 process          L0416 DEBUG| [stderr]   -----    -----                                                                 
2024-07-19 06:43:59,001 process          L0416 DEBUG| [stderr]   wolf-222 storage: code = 304 description = "NVMe SSD [0000:65:00.0] not found" 
2024-07-19 06:43:59,001 process          L0416 DEBUG| [stderr]   wolf-222 storage: code = 304 description = "NVMe SSD [0000:a5:00.0] not found"

and also https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14748/14/artifact/Functional%20Hardware%20Large/daos_test/dfs.py which failed to start the engines on wolf-222.

Issues with the cluster containing wolf-222 were also seen in https://daosio.atlassian.net/browse/DAOS-16280.

daosbuild1 · 2024-07-25T05:55:49Z

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14748/15/execution/node/810/log

daosbuild1 · 2024-07-25T21:18:01Z

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14748/17/testReport/

daltonbohning · 2024-07-30T15:16:44Z

The only failure is test_always_fails, which is expected to fail to verify failure paths
https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/PR-14748/17/tests

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14848 Use subprocess.run instead of run_local so test output is printed while running. Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14848 Use subprocess.run instead of run_local so test output is printed while running. Also include #14870 Handle any avocado run raised exception Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14882 Use subprocess.run() for run_local() Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>

Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Also include #14882 Use subprocess.run() for run_local() Signed-off-by: Dalton Bohning <dalton.bohning@intel.com> Co-authored-by: Phil Henderson <phillip.henderson@intel.com>

phender requested review from a team as code owners July 11, 2024 19:41

Fix searching run_local output.

7746e61

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

daltonbohning reviewed Jul 11, 2024

Applying feedback and fixes.

b58d973

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

phender added 3 commits July 12, 2024 16:45

Fix docstring.

e4ce9fe

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

phender added 2 commits July 16, 2024 12:05

Merge branch 'master' into pahender/DAOS-16217

4c7045c

Fix joined_* compare.

6ec9255

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

phender requested a review from daltonbohning July 17, 2024 02:41

daltonbohning reviewed Jul 17, 2024

phender added 2 commits July 17, 2024 13:15

Apply feedback.

7159532

Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr HarnessUnitTest HarnessCoreFilesTest SoakSmoke DfuseMUPerms test_always_fails Allow-unstable-test: true Required-githooks: true Signed-off-by: Phil Henderson <phillip.henderson@intel.com>

phender requested a review from daltonbohning July 17, 2024 17:59

daltonbohning reviewed Jul 22, 2024

phender requested a review from mjean308 July 24, 2024 21:24

mjean308 previously approved these changes Jul 24, 2024

phender added 2 commits July 25, 2024 14:45

phender dismissed stale reviews from mjean308 and daltonbohning via 544ff18 July 25, 2024 18:53

daltonbohning approved these changes Jul 29, 2024

daltonbohning requested a review from mjean308 July 30, 2024 15:16

mjean308 approved these changes Jul 30, 2024

daltonbohning requested a review from a team July 30, 2024 16:13

daltonbohning added the forced-landing label (The PR has known failures or has intentionally reduced testing, but should still be landed.) Jul 30, 2024

daltonbohning merged commit ed96c9d into master Jul 30, 2024
45 of 47 checks passed

daltonbohning deleted the pahender/DAOS-16217 branch July 30, 2024 16:15

cdavis28 mentioned this pull request Aug 12, 2024

Merge upstream/release/2.6 into upstream/google/2.6 #14916

Merged

mjmac mentioned this pull request Nov 13, 2024

mjmac/DAOS 16787 google 2.6 #15498

Closed
