
DAOS-14669 test: switch tcp;ofi_rxm testing to tcp #13365

Merged: 2 commits merged into master from soumagne/tcp_test on Mar 21, 2024.


soumagne (Collaborator)

Test-provider: ofi+tcp
Test-provider-hw-medium: ofi+tcp
Test-provider-hw-large: ofi+tcp
Test-tag: full_regression
Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

daosbuild1 (Collaborator) left a comment

LGTM. No errors found by checkpatch.


github-actions bot commented Nov 20, 2023

Bug-tracker data:
Ticket title is 'Enable regular testing of tcp provider (without rxm)'
Status is 'Open'
Labels: 'triaged'
https://daosio.atlassian.net/browse/DAOS-14669

daosbuild1 (Collaborator)

Test stage NLT on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13365/1/testReport/

daosbuild1 (Collaborator)

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13365/1/execution/node/1162/log

daosbuild1 (Collaborator) left a comment

LGTM. No errors found by checkpatch.

daosbuild1 (Collaborator)

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13365/2/execution/node/1156/log

daosbuild1 (Collaborator) left a comment

LGTM. No errors found by checkpatch.

daosbuild1 (Collaborator)

Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13365/3/execution/node/1156/log

soumagne force-pushed the soumagne/tcp_test branch from 206dd43 to a252d15 on March 8, 2024 23:17

github-actions bot commented Mar 8, 2024

Ticket title is 'Enable regular testing of tcp provider (without rxm)'
Status is 'Open'
Labels: 'triaged'
https://daosio.atlassian.net/browse/DAOS-14669

Test-provider: ofi+tcp
Test-provider-hw-medium: ofi+tcp
Test-provider-hw-large: ofi+tcp
Required-githooks: true
Allow-unstable-test: true

Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
soumagne force-pushed the soumagne/tcp_test branch from a252d15 to 2e93024 on March 8, 2024 23:23
soumagne marked this pull request as ready for review March 14, 2024 17:54
soumagne requested review from a team as code owners March 14, 2024 17:54
daltonbohning (Contributor) left a comment

We also need to run these two modified tests:

Features: ConfigGenerateOutput ConfigGenerateRun

FYI I'm close to having that automated

src/tests/ftest/util/network_utils.py (outdated, resolved)

frostedcmos (Contributor) left a comment

minor stuff inline

src/tests/ftest/cart/no_pmix_group_test.c (resolved)
src/tests/ftest/cart/utest/utest_portnumber.c (resolved)
src/tests/ftest/cart/utest/utest_portnumber.c (resolved)
src/tests/ftest/util/dmg_utils.py (resolved)
@@ -107,7 +107,7 @@ def __init__(self, filename, common_yaml):
 # is set for the running process. If group look up fails or user
 # is not member, use uid return from user lookup.
 #
-default_provider = os.environ.get("CRT_PHY_ADDR_STR", "ofi+tcp;ofi_rxm")
+default_provider = os.environ.get("CRT_PHY_ADDR_STR", "ofi+tcp")
Contributor

Not part of your change, but this should also be reading D_PROVIDER... ugh.

Collaborator Author

Yeah, agreed, but nothing has been converted yet (even if you make it sound like it should have been; a grep for CRT_PHY_ADDR_STR shows many files still use it), so I won't be addressing this in this PR. It should be done properly in a separate PR.
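The thread above notes that the test default is read only from CRT_PHY_ADDR_STR. A minimal Python sketch of also honoring D_PROVIDER follows, assuming D_PROVIDER should take precedence when both variables are set; `get_default_provider` and `DEFAULT_PROVIDER` are hypothetical names, not code from this PR.

```python
import os

# Fallback used by the test harness after this PR (plain tcp, no ofi_rxm).
DEFAULT_PROVIDER = "ofi+tcp"


def get_default_provider(environ=None):
    """Return the provider from D_PROVIDER, then CRT_PHY_ADDR_STR, then the default.

    Assumption: D_PROVIDER wins over the older CRT_PHY_ADDR_STR variable.
    Empty values are treated as unset.
    """
    if environ is None:
        environ = os.environ
    for name in ("D_PROVIDER", "CRT_PHY_ADDR_STR"):
        value = environ.get(name)
        if value:
            return value
    return DEFAULT_PROVIDER
```

Passing the environment as a mapping keeps the helper easy to exercise without touching the real process environment.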

Test-provider: ofi+tcp
Test-provider-hw-medium: ofi+tcp
Test-provider-hw-large: ofi+tcp
Features: ConfigGenerateOutput ConfigGenerateRun
Required-githooks: true

Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
soumagne requested a review from a team on March 21, 2024 16:23

mchaarawi (Contributor)

Some of the hw medium tests did not run with the tcp provider; they ran with verbs:
https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13365/8/artifact/Functional%20Hardware%20Medium%20Verbs%20Provider/daos_test/suite.py/job.log
@daltonbohning @phender is there a tcp version of the daos_test suite that runs weekly?

daltonbohning (Contributor)

> some of the hw medium tests did not run with tcp provider and ran with verbs: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13365/8/artifact/Functional%20Hardware%20Medium%20Verbs%20Provider/daos_test/suite.py/job.log @daltonbohning @phender is there a tcp version of daos_test suite that runs in weekly?

Ah, right. That's because the "Verbs Provider" stage hardcodes verbs.
There is a branch that runs pr and daily with tcp:
https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/provider-testing-tcp/103/pipeline/52

I think for this PR we could push a commit that temporarily updates this:

daos/Jenkinsfile, line 1179 in 1349dbf:

    provider: 'ofi+verbs;ofi_rxm',

Or maybe just push a separate PR since this one already has a clean run. Thoughts @phender?

mchaarawi (Contributor)

> some of the hw medium tests did not run with tcp provider and ran with verbs: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13365/8/artifact/Functional%20Hardware%20Medium%20Verbs%20Provider/daos_test/suite.py/job.log @daltonbohning @phender is there a tcp version of daos_test suite that runs in weekly?
>
> Ah, right. That's because the "Verbs Provider" stage hardcodes verbs. There is a branch that runs pr and daily with tcp: https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/provider-testing-tcp/103/pipeline/52
>
> I think for this PR we could push a commit that temporarily updates this:
>
> daos/Jenkinsfile, line 1179 in 1349dbf: provider: 'ofi+verbs;ofi_rxm',
>
> Or maybe just push a separate PR since this one already has a clean run. Thoughts @phender?

Yeah, we don't want to run the Functional Hardware Medium Verbs Provider test stage.
Running a test stage with a different provider sounds like too complicated a process. I thought we had a test stage that we could add manually with a commit pragma that would run
FunctionalHardwareMediumTCProvider?

phender (Contributor) commented Mar 21, 2024

> some of the hw medium tests did not run with tcp provider and ran with verbs: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13365/8/artifact/Functional%20Hardware%20Medium%20Verbs%20Provider/daos_test/suite.py/job.log @daltonbohning @phender is there a tcp version of daos_test suite that runs in weekly?

The Functional Hardware Medium Verbs Provider stage always runs with verbs via https://github.com/daos-stack/daos/blob/master/Jenkinsfile#L1179. It is not overridable, nor would we want it to be, because then the stage name wouldn't make sense. This stage only runs the daos_test/suite.py test, and that test is currently run with tcp in the provider-testing-tcp branch. That branch uses --provider=ofi+tcp, which the https://github.com/daos-stack/daos/blob/master/src/tests/ftest/util/network_utils.py#L25 alias currently maps to ofi+tcp;ofi_rxm; this PR changes that.
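The alias behavior described above can be modeled as a simple lookup table. This is a hypothetical Python sketch: the dictionary names `PROVIDER_ALIAS_BEFORE`/`PROVIDER_ALIAS_AFTER` and the function `resolve_provider` are illustrative and are not the actual contents of network_utils.py. It only shows how a `--provider=ofi+tcp` request resolves before and after a change like this PR.

```python
# Hypothetical alias tables: before this PR, "ofi+tcp" expanded to tcp with the
# ofi_rxm utility layer; after it, "ofi+tcp" means plain tcp.
PROVIDER_ALIAS_BEFORE = {"ofi+tcp": "ofi+tcp;ofi_rxm"}
PROVIDER_ALIAS_AFTER = {"ofi+tcp": "ofi+tcp"}


def resolve_provider(requested, aliases):
    """Map a --provider value to the full provider string.

    Values without an alias entry (e.g. "ofi+verbs;ofi_rxm") pass through unchanged.
    """
    return aliases.get(requested, requested)
```

For example, `resolve_provider("ofi+tcp", PROVIDER_ALIAS_BEFORE)` yields `"ofi+tcp;ofi_rxm"`, while the same request against `PROVIDER_ALIAS_AFTER` yields `"ofi+tcp"`, so a branch that passes `--provider=ofi+tcp` switches to plain tcp without changing its own configuration.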

phender (Contributor) commented Mar 21, 2024

> some of the hw medium tests did not run with tcp provider and ran with verbs: https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13365/8/artifact/Functional%20Hardware%20Medium%20Verbs%20Provider/daos_test/suite.py/job.log @daltonbohning @phender is there a tcp version of daos_test suite that runs in weekly?
>
> Ah, right. That's because the "Verbs Provider" stage hardcodes verbs. There is a branch that runs pr and daily with tcp: https://build.hpdd.intel.com/blue/organizations/jenkins/daos-stack%2Fdaos/detail/provider-testing-tcp/103/pipeline/52
> I think for this PR we could push a commit that temporarily updates this:
>
> daos/Jenkinsfile, line 1179 in 1349dbf: provider: 'ofi+verbs;ofi_rxm',
>
> Or maybe just push a separate PR since this one already has a clean run. Thoughts @phender?
>
> I thought we have a test stage that we can manually add with a commit pragma that would run FunctionalHardwareMediumTCProvider ?

That stage is only defined in the https://build.hpdd.intel.com/job/daos-stack/job/daos/job/provider-testing-tcp branch. In order to run that stage with these code changes we need the RPMs built from this PR available in artifactory so we can specify them in the CI_RPM_TEST_VERSION build parameter.

Alternatively, for testing purposes we could add a Functional Hardware Medium TCP Provider stage to the master Jenkinsfile, e.g.

                        'Functional Hardware Medium TCP Provider': getFunctionalTestStage(
                            name: 'Functional Hardware Medium TCP Provider',
                            pragma_suffix: '-hw-medium-tcp-provider',
                            label: params.FUNCTIONAL_HARDWARE_MEDIUM_TCP_PROVIDER_LABEL,
                            next_version: next_version,
                            stage_tags: 'hw,medium,provider',
                            default_tags: startedByTimer() ? 'pr daily_regression' : 'pr',
                            default_nvme: 'auto',
                            provider: cachedCommitPragma('Test-provider-tcp', 'ofi+tcp'),
                            run_if_pr: false,
                            run_if_landing: false,
                            job_status: job_status_internal
                        ),

With this definition it would only run if the commit message contained Skip-func-hw-test-medium-tcp-provider: false
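The run condition described above can be modeled in a few lines of Python. This is a hypothetical sketch, not the pipeline library's actual logic: it assumes commit pragmas are `Key: value` lines matched case-insensitively, that a stage defined with `run_if_pr: false` and `run_if_landing: false` is skipped by default, and that an explicit `Skip-...: false` pragma forces it to run. `parse_commit_pragmas` and `should_run_stage` are illustrative names.

```python
import re

# A pragma line looks like "Key: value"; the key has no spaces.
PRAGMA_RE = re.compile(r"^\s*([A-Za-z0-9-]+):\s*(.+?)\s*$")


def parse_commit_pragmas(message):
    """Collect 'Key: value' lines from a commit message (keys lower-cased)."""
    pragmas = {}
    for line in message.splitlines():
        match = PRAGMA_RE.match(line)
        if match:
            pragmas[match.group(1).lower()] = match.group(2)
    return pragmas


def should_run_stage(message, skip_pragma, run_by_default):
    """Decide whether a stage runs.

    Without the skip pragma, fall back to run_by_default (False for a stage
    with run_if_pr and run_if_landing both false). An explicit value other
    than "true" (e.g. "false") runs the stage; "true" skips it.
    """
    value = parse_commit_pragmas(message).get(skip_pragma.lower())
    if value is None:
        return run_by_default
    return value.strip().lower() != "true"
```

With `run_by_default=False`, a commit message containing `Skip-func-hw-test-medium-tcp-provider: false` makes `should_run_stage` return `True`, and a message without that line leaves the stage skipped.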

mchaarawi (Contributor)

> That stage is only defined in the https://build.hpdd.intel.com/job/daos-stack/job/daos/job/provider-testing-tcp branch. In order to run that stage with these code changes we need the RPMs built from this PR available in artifactory so we can specify them in the CI_RPM_TEST_VERSION build parameter.

That sounds pretty complicated, TBH. We should have a simple option to just add a pragma to run any test with any provider (which it sounds like we have for one medium stage, but not for the other one that runs daos_test). Anyway, let's land this PR for now, even though it's missing daos_test results with the tcp provider.

mchaarawi merged commit e2083fb into master on Mar 21, 2024
48 of 50 checks passed
mchaarawi deleted the soumagne/tcp_test branch March 21, 2024 19:04
jolivier23 pushed a commit that referenced this pull request Apr 16, 2024
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
jolivier23 pushed a commit that referenced this pull request Apr 16, 2024
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
jolivier23 added a commit that referenced this pull request Apr 17, 2024
disable CODEOWNERS for google branch
disable upstream hardware tests on branch by default
remove bad merge block
fix ordering of imports
Rename google-changeId.py
set option for dynamic fuse

Backports included here for test fixes
DAOS-15429 test: Fix Go unit tests (#13981)
DAOS-13490 test: Update valgrind suppressions. (#13142)
DAOS-15159 test: add a supression for new valgrind warning in NLT (#13782)
DAOS-14669 test: switch tcp;ofi_rxm testing to tcp (#13365)
DAOS-15548 test: add new valgrind suppression for daos tool (#14081)

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Michael MacDonald <mjmac@google.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
jolivier23 added a commit that referenced this pull request May 21, 2024
disable CODEOWNERS for google branch
disable upstream hardware tests on branch by default
remove bad merge block
fix ordering of imports
Rename google-changeId.py
set option for dynamic fuse

Backports included here for test fixes
DAOS-15429 test: Fix Go unit tests (#13981)
DAOS-13490 test: Update valgrind suppressions. (#13142)
DAOS-15159 test: add a supression for new valgrind warning in NLT (#13782)
DAOS-14669 test: switch tcp;ofi_rxm testing to tcp (#13365)
DAOS-15548 test: add new valgrind suppression for daos tool (#14081)

Required-githooks: true

Change-Id: Ifc50889fd7aada1ae1666ed928b7edc9293da5b7
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Michael MacDonald <mjmac@google.com>
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>