DAOS-16040 test: Agent failure Aurora support - Use EC object class #14590

Merged: daltonbohning merged 4 commits into master from makito/DAOS-16040 on Jul 8, 2024

@shimizukko shimizukko commented Jun 17, 2024

Use EC_16P2GX instead of SX.
In the test yaml, use base block and child block to support different IOR parameters.
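
A minimal sketch of the base-block and child-block layout, assuming a standard YAML merge key. The anchor name `ior_base` and the `dfs_oclass` key come from the diff under review. The `ior_base:` key, the other key names, and their values are assumptions chosen to mirror the IOR command quoted later in the conversation, not the contents of the actual test yaml.

```yaml
# Hypothetical sketch: everything except *ior_base and dfs_oclass is assumed.
ior_base: &ior_base        # base block: parameters shared by every IOR run
  api: DFS                 # assumed key name
  flags: "-k -v -w -W"     # assumed key name
  block_size: 100G         # assumed key name
  transfer_size: 10G
ior_wo_rf:                 # child block: no redundancy
  <<: *ior_base            # pull in every key from the base block
  dfs_oclass: SX
ior_with_ec:               # child block: erasure coded
  <<: *ior_base
  dfs_oclass: EC_16P2GX
```

In a parser that implements the merge key, each child block inherits the base block's keys and sets only its own `dfs_oclass`, so the shared IOR parameters are written once.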

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium-md-on-ssd: false
Test-tag: test_agent_failure test_agent_failure_isolation

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Use EC_16P2G1 instead of SX.
In the test yaml, use base block and child block to support different
IOR parameters.

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium-md-on-ssd: false
Test-tag: test_agent_failure test_agent_failure_isolation
Signed-off-by: Makito Kano <makito.kano@intel.com>

Ticket title is 'Agent failure test - Support Aurora execution'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-16040

@daosbuild1

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14590/1/testReport/

@daosbuild1

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14590/1/testReport/

@shimizukko shimizukko requested a review from rpadma2 June 19, 2024 23:56
rpadma2 previously approved these changes Jun 21, 2024
@shimizukko shimizukko marked this pull request as ready for review June 27, 2024 23:47
@shimizukko shimizukko requested review from a team as code owners June 27, 2024 23:47
Comment on lines +53 to +59
ior_wo_rf:
  <<: *ior_base
  dfs_oclass: SX
ior_with_ec:
  <<: *ior_base
  dfs_oclass: EC_2P2GX # CI
  # dfs_oclass: EC_16P2GX # Aurora

Contributor

FYI, due to how avocado parses this syntax, your extra yaml would have to look like this:

ior_wo_rf:
  <<:
    dfs_oclass: OVERRIDE
    transfer_size: OVERRIDE
ior_with_ec:
  <<:
    dfs_oclass: OVERRIDE
    transfer_size: OVERRIDE

Contributor Author

I'm not sure what you mean. I'm using the default extra yaml and getting the expected IOR command.

/scratchbox/daos/install-latest-ecb/ior/bin/ior -a DFS -b 100G -k -v -w -W -D 60 -o /test_file_2 -t 10G --dfs.chunk_size 1048576 --dfs.cont TestContainer_1 --dfs.dir_oclass SX --dfs.oclass EC_16P2GX --dfs.pool TestPool_1

Contributor

You mean cxi.yaml? How do you tell the test to use EC_16P2GX instead of EC_2P2GX?

Contributor Author

Just uncomment line 59 (EC_16P2GX) and comment out line 58 (EC_2P2GX). I don't have to touch cxi.yaml.

Contributor

I think it's better to use an extra yaml like these:
https://github.com/daos-stack/aurora-tools/tree/master/pbs_scripts/extra_yaml
Otherwise we have to document what to change and how, and we have to edit the file every time we run.
And if we're running out of a shared source build, we can't just update the installed config without messing things up for someone else.

Contributor Author

Okay. I'll create an extra yaml for this test. Thanks.
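
A sketch of what such an extra yaml might contain for an Aurora run, following the shape the reviewer suggested earlier in this thread. The file name is hypothetical and the nesting under `<<` is an assumption that has not been checked against avocado. Only the `ior_with_ec` block name, the `dfs_oclass` key, and the value `EC_16P2GX` come from this PR.

```yaml
# agent_failure_aurora.yaml (hypothetical file name)
# Overrides the CI default EC_2P2GX with the Aurora object class.
# The nesting under "<<" copies the reviewer's suggested shape; treat it as unverified.
ior_with_ec:
  <<:
    dfs_oclass: EC_16P2GX
```

Keeping the Aurora value in a separate file leaves the installed test yaml untouched, which addresses the shared-build concern raised above.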

src/tests/ftest/deployment/agent_failure.yaml (outdated, resolved)
@shimizukko shimizukko requested a review from dinghwah June 29, 2024 01:33
@shimizukko shimizukko requested a review from a team July 4, 2024 07:18
@daltonbohning daltonbohning added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Jul 8, 2024
@daltonbohning daltonbohning merged commit 4a5ae3f into master Jul 8, 2024
45 of 46 checks passed
@daltonbohning daltonbohning deleted the makito/DAOS-16040 branch July 8, 2024 15:21
grom72 pushed a commit to grom72/daos that referenced this pull request Jul 25, 2024
…aos-stack#14590)

Use EC_16P2G1 instead of SX.
In the test yaml, use base block and child block to support different
IOR parameters.

Use EC_2P2GX for CI

Signed-off-by: Makito Kano <makito.kano@intel.com>
Labels
forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed.

5 participants