DAOS-14408 pmdk: build with NDCTL 63.1 to enable full RAS support in PMDK. #32

Draft: wants to merge 16 commits into base: master

@grom72 (Contributor) commented Sep 18, 2023

PMDK must be built with NDCTL enabled to support PMem hardware error detection.

@grom72 requested a review from a team as a code owner September 18, 2023 19:37
@grom72 changed the title from "DAOS-14408 pmsk: build with NDCTL 63.1 to enable full RAS support in PMDK." to "DAOS-14408 pmdk: build with NDCTL 63.1 to enable full RAS support in PMDK." Sep 18, 2023
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@brianjmurrell (Contributor) commented:

Just a heads-up on the release/2.2 testing on CentOS 7: one of the tests produced a core file, but the stack trace from it is going to be missing due to:

2023/09/19 09:06:41 DEBUG                      run_local: Running on wolf-102vm1: sudo dnf -y --enablerepo=*debug* install gdb python3-debuginfo systemd-debuginfo-219-78.el7_9.7 ndctl-debuginfo-65-6.el7_9 mercury-debuginfo-2.3.1~rc1-1.el7 hdf5-debuginfo-1.13.1-1.el7 argobots-debuginfo-1.1-3.el7 libfabric-debuginfo-1.17.1-1.el7 hdf5-vol-daos-mpich-debuginfo-1.1.0~rc4-1.el7 hdf5-vol-daos-mpich-tests-debuginfo-1.1.0~rc4-1.el7 ior-debuginfo-3.3.0-20.gd3574d5.el7
2023/09/19 09:06:45 DEBUG                      run_local:   wolf-102vm1 (rc=1):
2023/09/19 09:06:45 DEBUG                      run_local:     Last metadata expiration check: 0:00:21 ago on Tue 19 Sep 2023 09:06:23 AM UTC.
2023/09/19 09:06:45 DEBUG                      run_local:     Package gdb-7.6.1-120.el7.x86_64 is already installed.
2023/09/19 09:06:45 DEBUG                      run_local:     Package python3-debuginfo-3.6.8-19.el7_9.x86_64 is already installed.
2023/09/19 09:06:45 DEBUG                      run_local:     No match for argument: hdf5-vol-daos-mpich-debuginfo-1.1.0~rc4-1.el7
2023/09/19 09:06:45 DEBUG                      run_local:     No match for argument: hdf5-vol-daos-mpich-tests-debuginfo-1.1.0~rc4-1.el7
2023/09/19 09:06:45 DEBUG                      run_local:     Error: Unable to find a match: hdf5-vol-daos-mpich-debuginfo-1.1.0~rc4-1.el7 hdf5-vol-daos-mpich-tests-debuginfo-1.1.0~rc4-1.el7

which, as you can see, was a failure to install a number of debuginfo packages, including the one for ndctl. When the above step fails, no stack trace is generated. This is due to a bug in the handling of debuginfo package installation on CentOS 7, which has never been deemed important enough to fix.

I'm honestly not sure what your path forward there is other than to either fix that bug yourself or talk to your manager about the importance of having it fixed for your investigation of why the core file was generated.
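
For anyone investigating this, a minimal sketch of how the unmatched packages could be pulled out of a log like the one above. The function name `missing_debuginfo` is hypothetical and is not part of the DAOS test harness; the sketch assumes only that dnf reports each unavailable package on a line containing `No match for argument:`, as it does in the log quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DAOS test harness):
# read dnf output, or a job log that contains it, on stdin and
# print one unmatched package name per line.
missing_debuginfo() {
    # Keep only the "No match for argument:" lines and strip
    # everything up to and including that marker.
    sed -n 's/^.*No match for argument: *//p'
}
```

Running `missing_debuginfo < job.log` (with `job.log` standing in for whatever file holds the log) lists the packages that would need to be dropped from, or added to the repositories for, a retried install.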

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@brianjmurrell (Contributor) left a comment:

It seems this is still failing tests even on master (and many more tests on release/2.4)?

Upgrade to version 2.0.1.
This is the version that allows enabling NDCTL without
the risk of excessive stack usage in Argobots ULTs.

The obsolete libraries libpmemblk and libpmemlog have been removed.

Exclude the examples and benchmarks from the build via environment variables
(BUILD_EXAMPLES=n BUILD_BENCHMARKS=n) instead of a patch.
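
As a rough illustration of the build settings this commit message describes, the sketch below exports the two variables it names before invoking the PMDK build. The function name `pmdk_build_env` is hypothetical, and `NDCTL_ENABLE=y` is an assumption about how NDCTL support is switched on; the actual `pmdk.spec` in this PR may set these differently.

```shell
#!/bin/sh
# Hypothetical wrapper: set the build-time switches, then build.
pmdk_build_env() {
    BUILD_EXAMPLES=n      # from the commit message: skip examples
    BUILD_BENCHMARKS=n    # from the commit message: skip benchmarks
    NDCTL_ENABLE=y        # assumption: PMDK's switch for NDCTL support
    export BUILD_EXAMPLES BUILD_BENCHMARKS NDCTL_ENABLE
}

pmdk_build_env
# Run from the top of a PMDK source tree:
#   make
```

If a Makefile assigns one of these variables unconditionally, the environment value is ignored by make; passing the same `NAME=value` pairs on the `make` command line overrides such assignments.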

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 requested a review from osalyk December 6, 2023 21:57
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
This reverts commit 55433ad.
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
@grom72 marked this pull request as draft December 14, 2023 13:24