Merge release/2.6 into google/2.6 #15663

Closed
wants to merge 33 commits into from

juszhan1
Collaborator

phender and others added 30 commits November 6, 2024 20:06
Tag second test build for 2.6.2.

faults-enabled: false

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
2.6.2 release notes document

Signed-off-by: Phil Henderson <phillip.henderson@hpe.com>
- Enable write access to the Security section of Github project

- Use the GHA cache to avoid Trivy scan failures: overuse of the CVE database results in database download failures.
Upgrade `trivy-action` to version 0.28.0, where the caching mechanism is enabled by default.
Enable the debug option in Trivy to prepare for detailed analysis of scan failures.

Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
Test deployment/ior_per_rank fails with 'No space' on some CI
clusters. Reduce the requested pool size to accommodate nodes
with smaller storage capacity.

Signed-off-by: James A. Nunez <james.nunez@intel.com>
Split erasurecode/multiple_failure.py into two separate tests to
reduce the chance that a large number of ERR messages in the server
log file causes other test variants to fail due to out-of-space
errors.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
#15411)

Loop retrying the check for the pool free space after destroying half of the containers. If the check doesn't pass within 60 seconds, then fail the test.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
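
The retry described in the commit above (repeat the free-space check and fail if it does not pass within 60 seconds) can be sketched as a generic polling helper. This is a minimal illustration, not the test's actual code: `wait_for_condition`, its parameters, and the 5-second interval are hypothetical names and values chosen for the example.

```python
import time


def wait_for_condition(check, timeout=60.0, interval=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns True if the check passed in time, False otherwise.
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting (an assumption of this sketch).
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would pass a callable that queries the pool free space and fail the test when the helper returns False.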
… (#15540)

Support calling register cleanup methods for tests based upon the Test
and TestWithoutServers classes. Also remove stopping agents as part of
calling TestWithServers.stop_servers() since DAOS-6873 is no longer an
issue.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
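
One way to picture cleanup registration as described in the commit above is a small registry that stores callables and runs them at teardown. This is a sketch under assumptions: `CleanupRegistry`, `register`, and `run_all` are hypothetical names, and running the entries in reverse registration order is a choice made for this example, not something the commit message states.

```python
class CleanupRegistry:
    """Collect cleanup callables and run them later (hypothetical helper)."""

    def __init__(self):
        self._entries = []

    def register(self, method, **kwargs):
        """Remember a cleanup method and the keyword arguments to call it with."""
        self._entries.append((method, kwargs))

    def run_all(self):
        """Run cleanups last-registered-first; return a list of error strings.

        A failing cleanup does not stop the remaining ones from running.
        """
        errors = []
        while self._entries:
            method, kwargs = self._entries.pop()
            try:
                method(**kwargs)
            except Exception as error:  # keep going so later cleanups still run
                name = getattr(method, "__name__", repr(method))
                errors.append(f"{name}: {error}")
        return errors
```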
Do not raise an exception if parsing empty json output.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
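
The behavior described in the commit above (no exception when the JSON output is empty) might look like the following. This is a minimal sketch: `parse_json_output` is a hypothetical name, and returning `None` for empty input is an assumption of the example.

```python
import json


def parse_json_output(text):
    """Return the decoded JSON value, or None when the output is empty.

    Malformed, non-empty input still raises json.JSONDecodeError.
    """
    if text is None or not text.strip():
        return None
    return json.loads(text)
```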
)

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
#15420) (#15457)

The object placement algorithm was changed by DAOS-16445. As a result,
data are written to targets more uniformly, while the amount of
leftover data after container destroy/garbage collection in each target
remains the same. That is, data are written to more targets while the
cleanup method in each target hasn't been improved, which results in
more aggregate leftover data.

To handle the larger amount of leftover data in SCM, increase the
threshold to 1.5MB.

Signed-off-by: Makito Kano <makito.kano@intel.com>
In a special massive-failure case:
1. Some engines go down and trigger a rebuild.
2. One engine that participates in the rebuild goes down again before the
   rebuild finishes; the number of failures then exceeds the pool RF, so the
   pool map is not changed.
3. That engine is restarted by the administrator.

In that case the rebuild task should be recovered on the engine; to keep it
simple for now, just abort and retry the global rebuild task.
The typical recovery approach, which restarts the whole system including the
PS leader, does not have this issue.

Another backport commit:
947c76d DAOS-16175 container: fix a case for cont_iv_hdl_fetch (#15395)

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Fix stopping timed out processes run by a JobManager class by only
searching for and killing the command executable being run by clush,
orterun, mpirun, etc. Add a new harness/cmocka.py test to verify the
stopping of the processes with a test timeout.

Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
…) (#15595)

Update soak to support using an internal job scheduler.

Signed-off-by: Maureen Jean <maureen.jean@intel.com>
Co-authored-by: mjean308 <48688872+mjean308@users.noreply.github.com>
Update flake8 to 7.1.1.
Adjust the githook to work with newer flake8.
Also tested to be backwards compatible with flake8<6.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Add a section on handling unavailable engines.

Signed-off-by: Li Wei <wei.g.li@intel.com>
Clear the sc_ec_agg_active flag more proactively.

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
- If the reply fails, skip the early release of the RPC buffer

Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
Use -r so if no scons or non-scons files are grep'ed, flake8 does not
run.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Add the use of reusable workflows and actions to reduce the amount of
duplicated code in this repository as well as dependency repositories.

Run Bullseye workflow on schedule (#15574)
Saturdays at midnight, UTC.

Accept and propagate a run-gha variable (#15576)
For the case where daos is being used as a downstream test.

Test inputs context before trying to use it.

Fixes: SRE-2570 DAOS-16262

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
- Set Go minimum version to 1.21 in rpm and debian packaging spec files.
- Update scons Go version check to use version in go.mod.
- Add a reminder in go.mod file so we remember the packaging files when
  bumping the minimum Go version in the future.
- Update Ubuntu 22.04 Dockerfile to get an appropriate version of Go.

Signed-off-by: Kris Jacque <kristin.jacque@hpe.com>
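
Because SCons build scripts are Python, a version check driven by go.mod (as in the commit above) could be sketched as below. This is an illustration only, not the repository's code: `min_go_version` and `go_version_ok` are hypothetical names, and the sketch assumes the `go` directive sits on its own line in the form `go 1.21` or `go 1.21.0`.

```python
import re

# Matches a go.mod directive such as "go 1.21" or "go 1.21.0" on its own line.
GO_DIRECTIVE = re.compile(r"^go[ \t]+(\d+(?:\.\d+){1,2})[ \t]*(?://.*)?$",
                          re.MULTILINE)


def _normalize(parts):
    """Pad a version tuple to three components, e.g. (1, 21) -> (1, 21, 0)."""
    parts = tuple(parts)
    return parts + (0,) * (3 - len(parts))


def min_go_version(go_mod_text):
    """Return the minimum Go version declared in go.mod as a 3-tuple of ints."""
    match = GO_DIRECTIVE.search(go_mod_text)
    if match is None:
        raise ValueError("no 'go' directive found in go.mod text")
    return _normalize(int(part) for part in match.group(1).split("."))


def go_version_ok(installed, go_mod_text):
    """True if the installed version tuple satisfies the go.mod minimum."""
    return _normalize(installed) >= min_go_version(go_mod_text)
```

Reading the minimum from go.mod keeps the build check from drifting away from the module's declared version; the packaging spec files mentioned in the commit still need a manual update.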
…b26 (#15477)

For a collective RPC, when handling failure cases during crt_req_send(),
its reference may have been released via crt_rpc_complete_and_unlock(),
which is triggered by crt_corpc_complete(). In such a case, we should
check whether the RPC is completed before calling RPC_DECREF()
to avoid releasing the RPC reference repeatedly.

The patch also initializes some local variables for the CHK RPC to avoid
accessing invalid DRAM when handling a failed collective CHK RPC.

Some enhancements to the CR test logic.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Update netty-buffer to 4.1.115

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Co-authored-by: Jeff Olivier <jeffolivier@google.com>
* Fix compiling issues in gcc 14

Signed-off-by: Jinshan Xiong <jinshanx@google.com>
Co-authored-by: Dalton Bohning <dalton.bohning@intel.com>
Co-authored-by: Jeff Olivier <jeffolivier@google.com>
Update mantic (EOL) to oracular.
Update 22.04 LTS to 24.04 LTS.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
…#15598)

Update some tests to use unique dfuse mount directory by letting the
framework generate one.

Remove mount_dir from run_ior_multiple_variants since it is no longer
needed and this level of fine control should be handled per test
ideally.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Remove workflows/version-checks.yml now that dependabot checks this.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
)

* Add a suppression for Go runtime function racefuncenter.
* Add suppression for rt0_go CGo malloc

Signed-off-by: Kris Jacque <kris.jacque@intel.com>
At build time any more, as of e01970d.

Signed-off-by: Brian J. Murrell <brian.murrell@intel.com>
Verify daos_server_helper on the server instead of the runner/client.
Misc cleanup.

Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
With the tcp provider, using many sockets can cause significant
file descriptor usage. Bump the soft limit, if possible,
and warn if it appears insufficient.
Valgrind sets the hard limit to the soft limit, so work around that in NLT.

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
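
The idea in the commit above (raise the soft file descriptor limit when possible, and warn when it still looks too low) can be illustrated with Python's standard `resource` module. This is a sketch of the general technique on a Unix system, not the code from the commit; `raise_nofile_soft_limit` and the `wanted` parameter are hypothetical.

```python
import resource
import warnings


def raise_nofile_soft_limit(wanted):
    """Try to raise the RLIMIT_NOFILE soft limit to `wanted`.

    The soft limit can never exceed the hard limit, so the target is
    capped at the hard limit. Returns the soft limit in effect afterwards
    and warns if it is still below `wanted`.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY

    target = wanted if hard == unlimited else min(wanted, hard)
    if soft != unlimited and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
            soft = target
        except (ValueError, OSError):
            # The platform refused the new limit; keep the current one.
            pass

    if soft != unlimited and soft < wanted:
        warnings.warn(
            f"open file limit {soft} may be insufficient (wanted {wanted})"
        )
    return soft
```

When the hard limit itself has been lowered to the soft limit, as the commit says Valgrind does, a helper like this cannot raise anything and only the warning path applies.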
daltonbohning and others added 3 commits December 19, 2024 07:29
Add a requirement to protobufc for building daos control binaries.

Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
…15642)

Merge yamllint and clang-format into the linting workflow so all lint checks
are grouped together.

Make yaml-lint required but clang-format optional until stable.

Signed-off-by: Dalton Bohning <dalton.bohning@hpe.com>
…/2.6

Required-githooks: true
Change-Id: I10ffc1413bec864f7e72578a298e71d460258b55

Errors are: component not formatted correctly, ticket number prefix incorrect, PR title is malformatted. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments. Unable to load ticket data.
https://daosio.atlassian.net/browse/Merge

@github-advanced-security github-advanced-security bot left a comment

Scorecard found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@juszhan1 juszhan1 requested a review from jolivier23 December 23, 2024 18:56
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15663/2/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15663/2/execution/node/1216/log

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15663/3/execution/node/490/log

@juszhan1 juszhan1 closed this Dec 27, 2024
@juszhan1 juszhan1 deleted the juszhan/google/2.6 branch December 27, 2024 21:17