DAOS-13085: update libfabric to 1.18.0 #12115

Merged: 3 commits, Jun 7, 2023

@soumagne soumagne commented May 9, 2023

Apply also patch for prov/verbs QP error state recovery (DAOS-12991)

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@soumagne soumagne requested a review from frostedcmos May 9, 2023 20:04
github-actions bot commented May 9, 2023

Bug-tracker data:
Errors are component not formatted correctly, Ticket number suffix is not a number. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments, Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-13085:

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@soumagne soumagne force-pushed the soumagne/ofi_118 branch from e1c1d02 to 034c733 Compare May 9, 2023 21:11
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

frostedcmos previously approved these changes May 9, 2023
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/5/execution/node/992/log

@daosbuild1
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12115/5/testReport/(root)/

@soumagne soumagne marked this pull request as draft May 17, 2023 20:54
@soumagne soumagne force-pushed the soumagne/ofi_118 branch from 3e070de to 14f9fed Compare May 17, 2023 21:07
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/6/execution/node/306/log

@daosbuild1
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/6/execution/node/358/log

@daosbuild1
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/6/execution/node/328/log

@soumagne soumagne force-pushed the soumagne/ofi_118 branch from 14f9fed to f73c28a Compare May 17, 2023 21:19
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/7/execution/node/1034/log

@daosbuild1
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/7/execution/node/1201/log

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Scan Leap 15.4 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/8/execution/node/946/log

@daosbuild1
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/8/execution/node/1004/log

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Scan Leap 15.4 RPMs completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/9/execution/node/867/log

@daosbuild1
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/9/execution/node/1012/log

Apply also patch for prov/verbs QP error state recovery (DAOS-12991)

Apply patch for tcp busy spin issue with 1.18.0

Update valgrind suppression files with sendmsg trace

Remove libfabric pinning and allow for 1.18 builds

Required-githooks: true
Allow-unstable-test: true
PR-repos: libfabric@PR-69

Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
@soumagne soumagne force-pushed the soumagne/ofi_118 branch from 8784195 to aea5a25 Compare June 5, 2023 17:41
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12115/10/execution/node/1089/log

Allow-unstable-test: true
PR-repos: libfabric@PR-69
Required-githooks: true

Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
@daosbuild1 daosbuild1 left a comment

branch="${branch%:*}"
fi
fi
local repo_url="${JENKINS_URL}"job/daos-stack/job/"${repo}"/job/"${branch//\//%252F}"/"${build_number}"/artifact/artifacts/$DISTRO_NAME/
(style) line over 100 characters
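
One way to address this style warning is to build the URL from shorter pieces. The following is only a sketch, assuming bash: the function name `artifacts_url` and the intermediate variables `job_path` and `artifact_path` are hypothetical and do not come from the script under review, while `JENKINS_URL`, `DISTRO_NAME`, `repo`, `branch`, and `build_number` mirror the names in the quoted line. As in that line, `JENKINS_URL` is assumed to end with a slash.

```shell
# Hypothetical helper: assemble the artifacts URL in pieces so that no
# source line exceeds 100 characters. Requires bash (pattern substitution).
artifacts_url() {
    local repo="$1" branch="$2" build_number="$3"
    # Same substitution as the quoted line: every "/" in the branch name
    # becomes the double-encoded form %252F.
    local job_path="job/daos-stack/job/${repo}/job/${branch//\//%252F}"
    local artifact_path="${build_number}/artifact/artifacts/${DISTRO_NAME}/"
    # JENKINS_URL is assumed to already end with "/".
    printf '%s%s/%s\n' "${JENKINS_URL}" "${job_path}" "${artifact_path}"
}
```

A caller could then write `local repo_url; repo_url="$(artifacts_url "${repo}" "${branch}" "${build_number}")"`, which keeps each line well under the limit.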

soumagne commented Jun 6, 2023

RPM update in daos-stack/libfabric#69 must be merged first.

jolivier23 previously approved these changes Jun 6, 2023
@@ -171,7 +171,7 @@ Section: net
 Architecture: any
 Multi-Arch: same
 Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin,
- ipmctl (>=03.00.00.0468), libfabric (>= 1.15.1-1), libfabric (<< 1.18), spdk-tools (>= 22.01.2)
+ ipmctl (>=03.00.00.0468), libfabric (>= 1.15.1-1), spdk-tools (>= 22.01.2)
Contributor

Ditto.

@@ -152,7 +152,7 @@ Package: daos-client
 Section: net
 Architecture: any
 Multi-Arch: same
-Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, libfabric (>= 1.15.1-1), libfabric (<< 1.18)
+Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, libfabric (>= 1.15.1-1)
Contributor

I think it's actually the library you want to pin here:

Suggested change
-Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, libfabric (>= 1.15.1-1)
+Depends: ${shlibs:Depends}, ${misc:Depends}, openmpi-bin, libfabric1 (>= 1.15.1-1)

or are there userspace tools that daos-server needs?

Collaborator Author

Yeah, CI relies on the fi_info tool for now. In the near future I'm planning to remove the entire dependency on libfabric from the DAOS server, so I would not read too much into it at this point.

Contributor

Is it just CI that relies on those tools or does the DAOS product rely on them also?

@soumagne soumagne Jun 6, 2023


It is DAOS testing (ftest) that requires fi_info. Aside from that, it's probably better to make sure users have it as well.

Contributor

So this really should be a Depends: on a daostests package, yes?

@soumagne soumagne Jun 7, 2023


Right, if we wanted to do that, yes. I will make a PR relatively soon that removes the direct dependency on libfabric (and UCX) from the DAOS packages (by making the control plane no longer query them directly), so I would expect a lot more simplification at that time.
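
Since the thread hinges on whether `fi_info` is present on test nodes, a pre-flight check along the following lines could make that dependency explicit. This is a minimal sketch and not part of the PR: the function names `version_ge` and `check_fi_info` are hypothetical, the comparison relies on GNU `sort -V`, and the parsing assumes that `fi_info --version` prints a line of the form `libfabric: X.Y.Z`.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU coreutils `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical pre-flight check: fi_info must be on PATH and report a
# libfabric version of at least the requested minimum.
check_fi_info() {
    local min_version="$1" found
    if ! command -v fi_info >/dev/null 2>&1; then
        echo "fi_info not found on PATH" >&2
        return 1
    fi
    # Assumption: the output contains a line such as "libfabric: 1.18.0".
    found="$(fi_info --version | awk '/^libfabric:/ {print $2}')"
    version_ge "${found}" "${min_version}"
}
```

For example, `check_fi_info 1.15.1 || exit 1` at the start of a test run would fail early on a node where the tool is missing or too old. An empty version string (unexpected output format) also fails the check.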

ci/provisioning/post_provision_config_common_functions.sh (outdated, resolved)
This reverts commit 92c2402.

Required-githooks: true

Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@soumagne soumagne requested a review from brianjmurrell June 6, 2023 18:10
@jolivier23 jolivier23 merged commit f5fa8db into master Jun 7, 2023
@jolivier23 jolivier23 deleted the soumagne/ofi_118 branch June 7, 2023 02:30