mjmac/DAOS 16787 google 2.6 #15498
Closed
mjmac (Contributor) commented on Nov 13, 2024
- DAOS-15859 build: Move master to 2.7 test builds. (#14407)
- DAOS-15856 test: speedup rebuild/basic (#14398)
- DAOS-15860 cq: update pylint to 3.2.2 (#14406)
- DAOS-15834 client: disable interception before exec() (#14405)
- DAOS-15794 tools: Add --health-only flag to (dmg|daos) pool query (#14297)
- DAOS-15829 object: fix potential DRAM leak when retry after DTX refresh (#14394)
- DAOS-15873 test: Fix wal checkpoint metric test (#14416)
- DAOS-15858 test: Increase test timeout for test_lost_majority_ps_replicas (DAOS-15858 test: Increase timeout for test_lost_majority_ps_replicas #14411)
- DAOS-15915 cq: fix codespell errors (#14439)
- DAOS-15916 dfs: Seg fault with DFS scanner (#14449)
- DAOS-15808 test: Fix test_dmg_storage_query_device_state (#14440)
- DAOS-15862 client: declare num_fd_dup2ed atomic and lock_fd_dup2ed as rwlock (#14408)
- SRE-2229 utils: Replace /bin/sh with /bin/bash in githooks (#14457)
- DAOS-15884 cart: Cleanup (#14417)
- DAOS-12648 spdk: workaround for SPDK issue 2683, AMD iommu support (Provide workaround for SPDK issue 2683, AMD iommu support #14166)
- DAOS-15076 engine: Fixes and Performance improvements (#14404)
- DAOS-15498 rebuild: reprobe in migrate_obj_iter_cb (#14458)
- DAOS-15935 client: search section "[stack]" instead of "[heap]" (#14472)
- DAOS-14539 client: fix pil4dfs strncmp compile in some envs (#14041)
- DAOS-15801 test: Add aio ioengine to pil4dfs_fio.py functional test (#14375)
- DAOS-15868 test: Remove the mpi_type parameter. (#14453)
- DAOS-15287 test: Add register tearDown steps for job managers (#14245)
- DAOS-13887 cart: Improve context destroy sequence (#12594)
- DAOS-14679 pool: Report on stopping sp_stopping (#14374)
- DAOS-15687 test: Fix daos_perf tests. (#14489)
- DAOS-15948 test: Include the varaint number in log files (#14488)
- DAOS-15920 object: check modification for CPD RPC (#14455)
- SRE-2180 ci: Update jira_query.py with 2.6 release values (#14438)
- DAOS-623 cq: correct TARGET for merge commits (#14384)
- DAOS-15959 cart: Add valgrind suppressions for Go runtime (#14507)
- DAOS-15850 control: Pass up errors when failing to read cert dir (#14475)
- DAOS-15842 control: Reject re-join with different control addr (#14470)
- DAOS-15805 control: Adjust SPDK config entry order to avoid hotplug race (#14480)
- DAOS-15885 vos: check DTX visibility when probe for EV iteration (#14418)
- DAOS-623 build: Update packaging to… (#14513)
- DAOS-15589 test: Pass the CR tests if contents of mount point are removed on MD-on-SSD cluster (#14350)
- DAOS-15931 rebuild: fix data corruption caused by partial parity rebuild epoch (#14512)
- DAOS-15963 test: option to disable ULT stack dump on failure (#14506)
- DAOS-15950 cart: Add utility to dump daos errocodes (#14491)
- DAOS-15919 test: wait more time after system restart (DAOS-15919 tests: wait more time after system restart #14471)
- DAOS-15991 cq: update pylint to 3.2.3 (#14530)
- DAOS-15972 pool: Address Coverity 2555665,2555666 (#14515)
- DAOS-15052 test: Update expected wal checkpoint metrics (#14520)
- DAOS-15391 pool: destroy container during reintegration discard (#13958)
- DAOS-15968 container: Fix CAPA fetch on NS master (#14511)
- DAOS-15849 control: Add client uid map to agent config (#14381)
- DAOS-10250 control: Get enabled and disabled ranks with dmg pool query (#14436)
- DAOS-15617 test: Find mpi4py test in /usr/bin (#14163)
- DAOS-15966 control: Remove dmg storage query device-health (#14508)
- DAOS-15961 cart: Reorganize how envs are handled (#14504)
- DAOS-15851 ci: Use CI_PR_REPOS parameter if provided (#14387)
- DAOS-15852 test: more timing samples for co_op_dup_timing() (#14497)
- DAOS-16000 test: fix get_host_log_files ignoring error (#14540)
- DAOS-15738 control: Update required Go build version (#14413)
- DAOS-16011 : Valgrind suppressions due to update Go to 1.22.3 (#14564)
- DAOS-15966 docs: Remove references to device-health cmd (#14572)
- DAOS-15955 test: increase clush fanout for run_remote (#14509)
- DAOS-15913 doc: remove depecated --label flag (#14546)
- DAOS-14630 docs: Document MD-on-SSD configuration (#14559)
- DAOS-14517 test: Fixing test_daos_vol_bigio (#14523)
- DAOS-15956 control: Limit race between taking leadership and join (#14541)
- DAOS-16015 mgmt: Fix Coverity 2555784 (#14576)
- DAOS-16026 pydaos: fix offset issue in oit_mark (#14570)
- DAOS-16052 test: Skip wal tests if fault injection feature is disabled (#14600)
- DAOS-14968 test: Update MOFED version in CI. (#13864)
- DAOS-16020 cart: Fix bad macro (#14569)
- DAOS-15992 client: set st_blksize in ostatx_cb() and add unit test (#14534)
- DAOS-16001 placement: fix cases for delay_rebuild (#14557)
- DAOS-15751 bio: change default bio_max_async_sz to 32k (#14568)
- DAOS-16091 umem: reset rc properly (#14621)
- DAOS-16087 test: disable dfs pipeline test in release build (#14627)
- DAOS-16021 test: Increase timeout for test_daos_container (#14583)
- DAOS-15919 chk: rollback generation if failed to start checker (#14577)
- DAOS-15999 test: Update led check method for soak_utils.py (#14607)
- DAOS-13944 test: variants for performance pil4dfs (DAOS-13994 test: variants for performance pil4dfs #14533)
- DAOS-16016 test: Coverity 2555785 fix and run test_rebuild_35 (#14619)
- DAOS-16075 ci: Valgrind issues related to go 1.22.3 (#14615)
- DAOS-16081 test: Skipping fault injection tests in release builds (#14635)
- DAOS-16084 test: make FI required for pool/cont retry tests (#14634)
- DAOS-16111 rebuild: uniform identifier in logs part 1 (#14383)
- DAOS-16070 tools: Include BuildInfo field in version output (#14609)
- DAOS-16122 test: update pylint to 3.2.4 (#14651)
- DAOS-16039 object: fix EC aggregation wrong peer address (#14593)
- DAOS-16103 test: Increase test_ms_failover test timeout (#14656)
- DAOS-16101 mgmt: enhanced interoperability for MGMT module (#14638)
- DAOS-16009 rebuild: fix O_TRUNC file size related handling (#14649)
- DAOS-15930 test: extend_simple add FI required macro call (#14667)
- DAOS-623 test: update suppressions for NLT (#14694)
- DAOS-16107 cart: submit vs timeout list (#14674)
- DAOS-15874 control: Add optional credential cache to agent (#14412)
- DAOS-15962 bio: error message cleanup (#14663)
- DAOS-16134 cq: update pylint to 3.2.5 (#14684)
- DAOS-16040 test: Agent failure Aurora support - Use EC object class (#14590)
- DAOS-16161 object: should bulk GET for ds_obj_ec_rep_handler (#14703)
- DAOS-16168 build: Ignore scons version deprecation (#14715)
- DAOS-15863 container: fix container destroy error (#13852)
- DAOS-16076 test: Automate dmg scale test to be run on Aurora (#14616)
- DAOS-9247 control,bio: Add PCIe link speed and width to NVMe health stats (#14395)
- DAOS-16141 client: strlen() to replace sizeof() for a string variable (#14687)
- DAOS-16078 common: fix a Coverity issue (#14704)
- DAOS-16153 nlt: set server nlt tmpfs size to 8g (#14733)
- Revert "DAOS-16153 nlt: set server nlt tmpfs size to 8g (#14733)" (#14749)
- DAOS-14311 pool: add pool warmup RPC + bulk on connect (#14669)
- DAOS-16204 test: remove sudo requirements from io_sys_admin.py (#14732)
- DAOS-16214 container: Ensure Correct Protocol Version is Passed to cont_create_in_get_data() (#14754)
- DAOS-16228 test: update suppressions for NLT again (#14762)
- DAOS-16215 test: add pciutils-devel requirement for client-tests (#14746)
- DAOS-16037 pool: Fix upgrade for svc_ops (#14753)
- DAOS-16159 control: Accept UCX full transport names (#14768)
- DAOS-16210 vos: Interoperability fix for flat dkey (#14769)
- DAOS-15998 pool: Let all pool_svc rest on ds_pool (#14681)
- DAOS-15825 control: Add NUMA/fabric map to GetAttachInfo payload (#14628)
- DAOS-16226 cart: URI_LOOKUP rpc to inherit the original RPC timeout (#14777)
- DAOS-16038 pool: Enable pool list for clients (#14672)
- DAOS-16036 bio: bump max WAL tx size to 16MB (#14589)
- DAOS-15967 control: Raise RAS event if link speed|width is downgraded (#14665)
- DAOS-16234 test: add a 10 ms sleep in the dfs time test
- DAOS-16170 dtx: handle ds_pool_lookup failure when DTX resync (#14734)
- DAOS-16005 object: check resent coll_punch on leader and relay engine (#14659)
- DAOS-16151 cart: add no-sync option to self_test (#14697)
- DAOS-16146 cart: Backport metrics from google/2.4 (#14692)
- DAOS-16008 cart: cart_ctl fixes (#14686)
- DAOS-16254 pool: Fix rfcheck_ult infinite looping (#14789)
- DAOS-15914 dtx: control DTX RPC to reduce network load (#14476)
- DAOS-15064 mgmt: Show all ranks in daos system query (#14764)
- DAOS-16162 doc: add information on MPI-IO daos: prefix (#14794)
- DAOS-16237 control: Add OID to filesystem get-attr (#14774)
- DAOS-16253 control: Add chmod to daos utility (#14787)
- DAOS-16103 control: Avoid deadlock during leadership change (#14781)
- DAOS-16250 control: Allow control client version override (#14785)
- DAOS-16261 cq: update pylint to 3.2.6 (#14801)
- DAOS-16263 cq: merge yamllint and clang-format into linting (#14803)
- DAOS-13078 test: update ior_per_rank.py to work in CI (#14800)
- DAOS-15953 test: update online_drain to use config values (#14498)
- DAOS-15613 test: build current SHA in daos_build.py (#14796)
- DAOS-15778 test: remove DataMoverTestBase.posix_local_test_paths (#14802)
- DAOS-16127 tools: Add daos health check command (#14730)
- DAOS-16223 control: Config generate to not output auto values (#14778)
- DAOS-15946 control: Checker enablement misc fixes (#14700)
- DAOS-623 vos: Add visible iterator option to vos command line (#14653)
- DAOS-16238 utils: Add valgrind suppressions for Go runtime (#14823)
- DAOS-16082 test: Repeatedly query device after SysXS device is set to… (#14736)
- DAOS-16035 rebuild: create VOS cont when no record need to be rebuilt (DAOS-16035 rebuild: create VOS cont for reintegrate cases #14819)
- DAOS-16217 test: Update run_local(). (#14748)
- DAOS-16013 tools: Query all pool targets by default (#14783)
- DAOS-16264 common: Fix incorrect assertion (#14809)
- DAOS-16286 client: intercept fdatasync() (#14835)
- DAOS-14422 control: Make pool proto variable names consistent (DAOS-14422 proto: Make pool proto variable names consistent #14838)
- DAOS-15825 control: Improve GetAttachInfo NUMA fabric map (#14791)
- DAOS-16211 vos: Ensure we delete exact entry (#14763)
- DAOS-16295 test: fix streaming stdout when running launch.py locally (#14848)
- DAOS-623 ci: Add a workflow for Trivy scan (#14623)
- DAOS-16089 object: more check when transfer bitmap for coll punch (#14743)
- DAOS-16300 cart: Parse domain properly for multi-interface case (#14864)
- DAOS-16218 test: Support running agent as user (#14751)
- DAOS-16010 control: add support to override rync binary and add args (#14772)
- DAOS-15997 build: bump mercury version to 2.4.0rc4 (#14873)
- DAOS-11866 build: Add --build-deps=fetch option (#14125)
- DAOS-16282 control: Fix some test assumptions about JSON (#14832)
- DAOS-16305 build: Simplify hermetic builds (#14874)
- DAOS-16288 cart: coverity fixes for 16287,16288,16289 (#14843)
- DAOS-16313 build: Add explicit linker flags to lib/daos/api (#14889)
- DAOS-16316 test: Disable runtime dir creation for test (#14892)
- DAOS-16279 test: bump expected WAL replay time (#14833)
- DAOS-16217 test: Use subprocess.run() for run_local() (#14882)
- DAOS-15960 tests: Improvements for io_sys_admin test (#14503)
- DAOS-16317 cart: Simplify crt_get_info_string (#14896)
- DAOS-16229 test: support dynamic stonewall file (#14771)
- DAOS-16310 test: consolidate il and ioil tags (#14881)
- DAOS-16308 container: create ULT for ds_cont_tgt_snapshots_update (#14888)
- DAOS-16323 vos: Fix conditional update perf bug (#14901)
- DAOS-15914 cart: add D_MRECV_BUF env var to control number of multi-r… (#14662)
- DAOS-16330 cart: Reduce error logs during timeouts (#14905)
- DAOS-16219 test: Remove sudo usage from fault injection files (#14898)
- DAOS-16089 object: IO handler should check obj_bulk_args::result (#14894)
- DAOS-16301 test: Increase verify_perms.py timeout (#14910)
- DAOS-16354 cart: Don't retry URI_LOOKUPs for PROTO_QUERY rpc (#14922)
- DAOS-16306 common: Fix Use-After-Free issue in LRU Cache (#14906)
- DAOS-16291 bio: auto detect faulty for an unplugged device (#14850)
- DAOS-16307 client: Defer creating network context in child processes (#14875)
- DAOS-16368 doc: Fix typo in dfuse documentation (#14951)
- DAOS-16299 gurt: Unused includes in errno.c (#14857)
- DAOS-16097 vos: assign persistent DTX entry in vos_dtx_prepared (#14708)
- DAOS-16333 test: Disable the debug mask during rebuild in soak testing (#14913)
- DAOS-16304 tools: Create libdaos_self_test (#14950)
- DAOS-16340 cart: Fix order of finalize in cart_ctl (#14969)
- DAOS-16352 control: Handle cases with static ifaces (#14953)
- DAOS-16381 test: Run IOR with HDF5-VOL with multiple object classes (#14964)
- DAOS-16131 client: intercept mmap() with trampoline method (#14742)
- DAOS-16098 doc: Add a guide for setting up DAOS using QEMU (#14648)
- DAOS-14317 vos: vos_obj_hold() rework (#14701)
- DAOS-15439 cart: Support free-form provider specification for UCX. (#14911)
- DAOS-16265 test: Add debug for common test dir between variants (#14818)
- DAOS-15996 test: enhance output of test_daos_rebuild_ec (#14949)
- DAOS-15137 test: enhance output of test_daos_extend_simple (#14948)
- DAOS-16391 test: ignore rc from run_local diff (#14978)
- DAOS-14544 rsvc: crt tree type usage fix (#14834)
- DAOS-16407 cart: coverity 2555825 fix (#14994)
- DAOS-16371 il: do not return unsupported for fputs if ioil is loaded (#14989)
- DAOS-16251 engine: Misc fixes and cleanups (#14983)
- DAOS-16251 tests: Fix various buffer overflows (#15003)
- DAOS-16406 test: file_count_test_base.py - Don't obtain dfuse mount_dir from test yaml (#14993)
- DAOS-16347 object: partial coll-punch because of CPU yield (#15000)
- DAOS-15575 dfs: replace DAOS_TX_NONE with dfs th (#14094)
- DAOS-16334 test: Fix permission denied removing temporary test files (#15022)
- DAOS-16366 test: Use agent/server config files from test directory (#14944)
- DAOS-16251 mgmt: Fix use-after-free in pool_list (#15014)
- DAOS-14348 client: GC for pil4dfs dentry cache (#14995)
- DAOS-16451 telemetry: Adjust type of (_sum|_sumsquares) (#15018)
- DAOS-16245 control: Fix dmg cont set-owner (#14945)
- DAOS-16365 client: intercept MPI_Init() to avoid nested call (#14992)
- DAOS-16462 test: remove server manager srv_timeout (#15029)
- DAOS-16463 test: remove get_host_log_files (#15030)
- DAOS-16457 test: remove display_memory_info (#15031)
- DAOS-16315 mercury: update to 2.4.0rc5 (#15015)
- DAOS-16465 vos: fix misused ip_hdl (#15034)
- DAOS-16385 dtx: fix DRAM leak during handle DTX collective RPC (#15010)
- DAOS-16484 test: Exclude local host in default interface selection (#15049)
- DAOS-15800 client: create cart context on specific interface (#14804)
- DAOS-16445 client: Add function to cycle OIDs non-sequentially (#14999)
- DAOS-16251 dtx: Fix dtx_req_send user-after-free (#15035)
- DAOS-16304 tools: Add daos health net-test command (#14980)
- DAOS-16272 dfs: fix get_info returning incorrect oclass (#15048)
- DAOS-15863 container: fix a race for container cache (#15038)
- DAOS-16471 test: Reduce targets for ioctl_pool_handles.py (#15063)
- DAOS-16483 vos: handle empty DTX when vos_tx_end (#15053)
- DAOS-16271 mercury: Add patch to avoid seg fault in key resolve. (#15067)
- DAOS-16484 test: Support mixed speeds when selecting a default interface (#15050)
- DAOS-16446 test: HDF5-VOL test - Set object class and container prope… (#15004)
- DAOS-16447 test: set D_IL_REPORT per test (#15012)
- DAOS-16450 test: auto run dfs tests when dfs is modified (#15017)
- DAOS-16510 cq: update pylint to 3.2.7 (#15072)
- DAOS-16509 test: replace IorTestBase.execute_cmd with run_remote (#15070)
- DAOS-16458 object: fix invalid DRAM access in obj_bulk_transfer (#15026)
- DAOS-16486 object: return proper error on stale pool map (#15064)
- DAOS-16514 vos: fix coverity issue (#15083)
- DAOS-16467 rebuild: add DAOS_POOL_RF ENV for massive failure case (#15037)
- DAOS-16508 csum: retry a few times on checksum mismatch on update (#15069)
- DAOS-10877 vos: gang allocation for huge SV (#14790)
- DAOS-16304 tools: Adjust default RPC size for net-test (#15091)
- SRE-2408 ci: Increase timeout (to 15 minutes) for system restore (#14926)
- DAOS-16251 object: Fix obj_ec_singv_split overflow (DAOS-16466 object: Fix obj_ec_singv_split overflow #15045)
- DAOS-16460 test: Improve get_service_file() (DAOS-16560 test: Improve get_service_file() #15116)
- DAOS-16540 test: include extra yaml for soak md on ssd (DAOS-16540 test: include extra yaml for md on ssd #15104)
- DAOS-16468 test: ior/small.py - Decrease crt_timeout to 10 (#15100)
- DAOS-16480 test: Increase expected range for dirty_pages metric (#15097)
- DAOS-16567 test: remove unused IorCommand.log_metrics (#15128)
- DAOS-15776 test: remove DataMoverTestBase.create_pool (#15079)
- DAOS-12859 test: use pool and container labels (pass 3) (#13210)
- DAOS-16027 test: Adding daos_test REBUILD31-34 subtests (#14584)
- DAOS-15644 test: remove control_method dmg (#15081)
- DAOS-16550 test: use correct stonewall file with mdtest (#15109)
- DAOS-16482 control: Increase min hugepages for single tgt count (#15115)
- DAOS-623 test: Support running independent io sys admin steps (DAOS-16584 test: Support running independent io sys admin steps #15134)
- DAOS-16566 test: Update server/multiengine_persocket.py (#15127)
- DAOS-16570 control: Break up hwprov package (#15137)
- DAOS-16487 control: Require hostname for nvme set-faulty & replace (#15074)
- DAOS-16495 test: Use the test env control config file w/ dmg (#15094)
- DAOS-16292 control: Allow optional pool UUID for Create API (#15142)
- DAOS-14248 engine: strengthen signals handling (#13031)
- DAOS-16336 test: bump io test timeout (#15073)
- DAOS-16548 test: add ftest lint check for invalid test_ tag (#15106)
- DAOS-16559 container: return EBUSY for container being destroyed (#15154)
- DAOS-16607 control: Update vendored version of grpc-go (#15161)
- DAOS-16487 test: fix dmg c helper for set-faulty changes (#15151)
- DAOS-16346 tests: fix pmemcheck vea errors (#14954)
- DAOS-16589 test: Add Hardware Medium VMD test stage. (#15152)
- DAOS-16611 java: Bump com.google.protobuf:protobuf-java (Build(deps): Bump com.google.protobuf:protobuf-java from 3.16.3 to 3.25.5 in /src/client/java/daos-java #15160)
- DAOS-16169 test: Skip recovery tests requiring fault injection (DAOS-16169 test: Skip recovery tests requiring fault injection #15159)
- DAOS-16613 cq: update pylint to 3.3.0 (DAOS-16613 cq: update pylint to 3.3.0 #15165)
- DAOS-16329 chk: maintenance mode after checking pool with dryrun (DAOS-16329 chk: maintenance mode after checking pool with dryrun #14984)
- DAOS-16563 client: mark pool/cont handle as g2l after fork (DAOS-16563 client: mark pool/cont handle as g2l after fork #15125)
- DAOS-16251 tests: Fix various memory issues (DAOS-16251 tests: Fix various memory issues #15147)
- DAOS-16153 test: Do not run NLT fi tests for release builds. (DAOS-16153 test: Do not run NLT fi tests for release builds. #15171)
- DAOS-15682 dfuse: Perform reads in larger chunks. (DAOS-15682 dfuse: Perform reads in larger chunks. #14212)
- DAOS-16629 build: Allow separate test builds (DAOS-16629 build: Allow separate test builds #15188)
- DAOS-16589 test: Support Functional Hardware Medium VMD stage (DAOS-16589 test: Support Functional Hardware Medium VMD stage #15166)
- DAOS-15583 client: introduce whitelist mode into libpil4dfs (DAOS-15583 client: introduce whitelist mode into libpil4dfs #14812)
- DAOS-16298 test: improve get_clush_command timeout (DAOS-16298 test: improve get_clush_command timeout #15113)
- DAOS-16621 build: Fix Go versions in rpm/deb packaging (DAOS-16621 build: Fix Go versions in rpm/deb packaging #15174)
- DAOS-16628 client: reset eq counter to zero after fork() in IL (DAOS-16628 client: reset eq counter to zero after fork() in IL #15187)
- DAOS-16634 cart: Test DAOS with Mercury UCX updates. (DAOS-16634 cart: Test DAOS with Mercury UCX updates. #15205)
- DAOS-16626 cq: update pylint to 3.3.1 (DAOS-16626 cq: update pylint to 3.3.1 #15182)
- DAOS-14419 control: Display disabled ranks by default (DAOS-14419 control: Display disabled ranks by default #15112)
- DAOS-16585 tests: Improve NLT checking of ioil metrics. (DAOS-16585 tests: Improve NLT checking of ioil metrics. #15179)
- DAOS-16268 test: daos_test/rebuild.py tests not reporting failed pool creation (DAOS-16268 test: daos_test/rebuild.py tests not reporting failed pool creation #15110)
- DAOS-16577 test: remove hw tag from deployment/disk_failure.py (DAOS-16577 test: remove hw tag from some manual tests #15138)
- DAOS-16585 test: disable NLT fstat test for non-redhat systems. (DAOS-16585 test: disable NLT fstat test for non-redhat systems. #15233)
- DAOS-16590 test: misc ftest/performance updates (DAOS-16590 test: misc ftest/performance updates #15144)
- DAOS-16667 client: bump hadoop-common from 3.3.6 to 3.4.0 (Build(deps-dev): Bump org.apache.hadoop:hadoop-common from 3.3.6 to 3.4.0 in /src/client/java/hadoop-daos #15194)
- DAOS-16350 test: decrease pool size for ior_per_rank (DAOS-16350 test: decrease pool size for ior_per_rank #15183)
- DAOS-16488 chk: take sd_lock before accessing VOS sys_db (DAOS-16488 chk: take sd_lock before accessing VOS sys_db #15207)
- DAOS-16572 rebuild: properly assign global_dtx_resync_version in IV (DAOS-16572 rebuild: properly assign global_dtx_resync_version in IV #15185)
- SRE-2509 common: do not skip tests on empty commits (SRE-2509 common: do not skip tests on empty commits #15199)
- SRE-2454 common: fix license to follow SPDX rules (SRE-2454 common: fix license to follow SPDX rules #15133)
- DAOS-14408 common: enable NDCTL for DCPM (DAOS-14408 common: enable NDCTL for DCPM #14371)
- DAOS-16672 control: Move storage_query pretty printers into own file (DAOS-16672 control: Move storage_query pretty printers into own file #15281)
- DAOS-16556 client: call fstat() before mmap() to update file status in kernel (DAOS-16556 client: call fstat() before mmap() to update file status in kernel #15244)
- DAOS-16653 pool: Batch crt events (DAOS-16653 pool: Batch crt events #15230)
- DAOS-16637 pydaos: disable atfork handler (DAOS-16637 pydaos: disable atfork handler #15292)
- DAOS-16673 common: ignore Hadoop 3.4.0 related CVE (DAOS-16673 common: ignore Hadoop 3.4.0 related CVE #15284)
- DAOS-16674 vos: create sysdb lock with recursive attribute (DAOS-16674 vos: create sysdb lock with recursive attribute #15290)
- DAOS-13938 dfuse: adjust offset in readdir cache entry list (DAOS-13938 dfuse: adjust offset in readdir cache entry list #15190)
- DAOS-15596 pkg: Update argobots to 1.2 (DAOS-15596 pkg: Update argobots to 1.2 #15181)
- DAOS-16574 vos: shrink DTX table blob size (DAOS-16574 vos: shrink DTX table blob size #15220)
- DAOS-16693 telemetry: Avoid race between init/read (DAOS-16693 telemetry: Avoid race between init/read #15306)
- DAOS-16650 control: dmg system exclude, update group version (DAOS-16650 control: dmg system exclude, update group version #15288)
- DAOS-16469 dtx: optimize DTX CoS cache (DAOS-16469 dtx: optimize DTX CoS cache #15089)
- DAOS-10146 build: Support for running ansible ftest externally (DAOS-10146 build: Support of GCP VMs with ansible ftest #15311)
- DAOS-16265 test: Fix erasurecode/rebuild_fio.py out of space (DAOS-16265 test: Fix erasurecode/rebuild_fio.py out of space #15020)
- DAOS-16694 dtx: avoid DTX leak during dtx_status_handle (DAOS-16694 dtx: avoid DTX leak during dtx_status_handle #15310)
- DAOS-16720 cq: pin isort to v1.1.0 (DAOS-16720 cq: pin isort to v1.1.0 #15338)
- DAOS-16696 cart: Fix rc in error path (DAOS-16696 cart: Fix rc in error path #15313)
- DAOS-16645 cart: Bump file descriptor limit (DAOS-16645 cart: Bump file descriptor limit #15224)
- DAOS-16568 tests: Better cmdline control of NLT tests (DAOS-16568 tests: Better cmdline control of NLT tests #15141)
- DAOS-16698 control: Bump system_ram_reserved for MD-on-SSD pool create (DAOS-16698 control: Bump system_ram_reserved for MD-on-SSD pool create #15334)
- DAOS-16653 docs: Fix CRT_EVENT_DELAY description (DAOS-16653 docs: Fix CRT_EVENT_DELAY description #15351)
- DAOS-14262 cart: add ability to select traffic class for SWIM context (DAOS-14262 cart: add ability to select traffic class for SWIM context #14893)
- DAOS-16265 test: Split erasurecode/multiple_failure.py (DAOS-16265 test: Split erasurecode/multiple_failure.py #15355)
- DAOS-16704 test: Increase timeout for pool_list_consolidation.py (DAOS-16704 test: Increase timeout for pool_list_consolidation.py #15358)
- DAOS-16687 control: Handle missing PCIe caps in storage query usage (DAOS-16687 control: Handle missing PCIe caps in storage query usage #15296)
- DAOS-13292 build: Don't need UCX libraries … (DAOS-13292 build: Don't need UCX libraries … #15016)
- DAOS-13997 control: Allow labels for fault domain levels (DAOS-13997 control: Allow labels for fault domain levels #15173) (DAOS-13997 control: Allow labels for fault domain levels (#15173) #15315)
- DAOS-15943 test: Remove server logging from pre-teardown (DAOS-15943 test: Remove server logging from pre-teardown #15282)
- DAOS-11516 cart: relax progress and HG thread safety using thread mode (DAOS-11516 cart: relax progress and HG thread safety using thread mode #15368)
- DAOS-16469 container: Lower log level for cont_aggregate_interval (DAOS-16469 container: Lower log level for cont_aggregate_interval #15388)
- DAOS-14572 pydaos: Add a pydaos.DaosErrorCode class to mapping errors. (DAOS-14572 pydaos: Add a pydaos.DaosErrorCode class to mapping errors. #15222)
- DAOS-16721 object: fix coll RPC for obj with sparse layout (DAOS-16721 object: fix coll RPC for obj with sparse layout #15375)
- DAOS-16669 test: fix pool list ftest (DAOS-16669 test: fix pool list ftest #15373)
- DAOS-16697 cart: crt_reply_send_input_free() (DAOS-16697 cart: crt_reply_send_input_free() #15314)
- DAOS-15429 test: Fix racy unit test (DAOS-15429 test: Fix racy unit test #15384)
- DAOS-16096 test: Add retry loop for comparing free pool space (DAOS-16096 test: Add retry loop for comparing free pool space #15289)
- DAOS-16175 container: fix a case for cont_iv_hdl_fetch (DAOS-16175 container: fix a case for cont_iv_hdl_fetch #15395)
- DAOS-16722 client: to intercept PMPI_Init() in libpil4dfs (DAOS-16722 client: to intercept PMPI_Init() in libpil4dfs #15336)
- DAOS-16710 test: Check for total space instead of free space. (DAOS-16710 test: Check for total space instead of free space. #15385)
- DAOS-16709 test: Handle decoding empty json output (DAOS-16709 test: Handle decoding empty json output #15397)
- DAOS-16741 ci: Enable change of origin (DAOS-16744 ci: Enable change of origin #15380)
- DAOS-16211 vos: Avoid race condition with discard (DAOS-16741 vos: Avoid race condition with discard #15370)
- DAOS-16702 rebuild: restart rebuild for a massive failure case (DAOS-16702 rebuild: restart rebuild for a massive failure case #15343)
- SRE-2329 build: Fix checkout 'null' from version control (SRE-2329 build: Fix checkout 'null' from version control #15367)
- DAOS-16685 dfuse: Change event queue poll to use NOWAIT. (DAOS-16685 dfuse: Change event queue poll to use NOWAIT. #15377)
- DAOS-16477 mgmt: return suspect engines for pool healthy query (DAOS-16477 mgmt: return suspect engines for pool healthy query #15196)
- Revert "DAOS-16477 mgmt: return suspect engines for pool healthy query (DAOS-16477 mgmt: return suspect engines for pool healthy query #15196)" (Revert "DAOS-16477 mgmt: return suspect engines for pool healthy query" #15436)
- DAOS-13559 vos: MD-on-SSD phase2 landing (DAOS-13559 vos: MD-on-SSD phase2 landing #15429)
- DAOS-623 common: Remove extraneous quotes from error string. (DAOS-623 common: Remove quotes from error string. #15400)
- DAOS-16639 object: fix assertion (DAOS-16639 object: fix assertion #15329)
- DAOS-16729 dfuse: Remove deprecated single-threaded option. (DAOS-16729 dfuse: Remove deprecated single-threaded option. #15345)
- DAOS-15162 build: update to libfabric 1.22.0 (DAOS-15162 build: update to libfabric 1.22.0 #15401)
- SRE-2505 ci: Fix Trivy scan upload to the Security tab (SRE-2505 ci: Fix Trivy scan upload to the Security tab #15201)
- DAOS-16365 client: intercept dlsym() and zeInit() to avoid nested call (DAOS-16365 client: intercept dlsym() and zeInit() to avoid nested call #14932)
- DAOS-16721 dtx: handle potential DTX ID reusing trouble (DAOS-16721 dtx: handle potential DTX ID reusing trouble #15408)
- DAOS-16752 build: update mercury to 2.4.0 (DAOS-16752 build: update mercury to 2.4.0 #15413)
- DAOS-16670 test: container/multiple_delete.py - Increase SCM leftover… (DAOS-16670 test: container/multiple_delete.py - Increase SCM leftover… #15420)
- DAOS-16781 client: Allow daos_metrics read via pid (DAOS-16781 client: Allow daos_metrics read via pid #15448)
- DAOS-623 build: Add arm path when looking for fuse (DAOS-623 build: Add arm path when looking for fuse #15467)
- DAOS-16386 utils: Add DDB Feature and RM_POOL Support (DAOS-16386 utils: Add DDB Feature and RM_POOL Support #15062)
- DAOS-16736 dfuse: Add a common struct for active IE data. (DAOS-16736 dfuse: Add a common struct for active IE data. #15362)
- DAOS-16765 test: pool/verify_space.py - Increase timeout (DAOS-16765 test: pool/verify_space.py - Increase timeout #15453)
- DAOS-16797 build: Create 2.8 TB1 (DAOS-16797 build: Create 2.8 TB1 #15479)
- DAOS-7203 control: Add histogram support to Prometheus exporter (DAOS-7203 control: Add histogram support to Prometheus exporter #5382)
- DAOS-16374 vos: integer overflow on evt recx trace (DAOS-16374 vos: integer overflow on evt recx trace #15439)
- DAOS-16327 control: Update dmg storage query usage for MD-on-SSD P2 (DAOS-16327 control: Update dmg storage query usage for MD-on-SSD P2 #15418)
- DAOS-16749 vos: OI iterator for phase2 pool (DAOS-16749 vos: OI iterator for phase2 pool #15465)
- DAOS-16713 vos: initialize checkpoint stats (DAOS-16713 vos: initialize checkpoint stats #15454)
- DAOS-16791 control: Add include_fabric_ifaces to agent config (DAOS-16791 control: Add include_fabric_ifaces to agent config #15470)
- DAOS-16365 client: use D_ASPRINTF and fix format issues (DAOS-16365 client: use D_ASPRINTF and fix format issues #15463)
- DAOS-16276 doc: Address engine unavailability (DAOS-16276 doc: Address engine unavailability #15456)
- DAOS-16635 control: Pass meta_sz to pool extend+reintegrate API (DAOS-16635 control: Pass meta_sz to pool extend+reintegrate API #15459)
- DAOS-16662 test: update some tests to use unique dfuse mount (DAOS-16662 test: update some tests to use unique dfuse mount #15242)
- DAOS-16787 utils: Suppress NLT valgrind false positives (DAOS-16787 utils: Suppress NLT valgrind false positives #15478)
We added two metrics around the quota code but they never made it into master branch. Signed-off-by: Jeff Olivier <jeffolivier@google.com>
- Change cart_ctl to no longer fail if pre-ping fails in daos env - Set individual cmd pings to be 3 seconds - Change cart_ctl invocation to use 'no sync' option, avoiding pinging all ranks, including possibly dead ones Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
wenbaoxu@163.com found that pool_svc_rfcheck_ult may enter an infinite loop due to -DER_NOTLEADERs when the PS leader is stepping down: src/rdb/rdb_raft.c:3146 rdb_raft_wait_applied() 564e1a52[1]: waiting for entry 0 to be applied src/container/srv_container.c:5050 ds_cont_rdb_iterate() 564e1a52 container iter: rc -2008 src/pool/srv_pool.c:6405 pool_svc_rfcheck_ult() 564e1a52 check rf with -2008 and retry [...] This patch adds a check for whether the "sched" is canceled in pool_svc_rfcheck_ult before sleeping and retrying, and fixes a nonsense dss_sleep(0) call. Signed-off-by: Li Wei <wei.g.li@intel.com>
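The retry pattern described above can be sketched roughly as follows. This is an illustrative Python model, not the C code in srv_pool.c: `rfcheck_with_retry`, the `canceled` event, and the `delay` parameter are hypothetical names, and the error value is simply the one that appears in the log excerpt.

```python
import threading

DER_NOTLEADER = -2008  # value as it appears in the log excerpt above


def rfcheck_with_retry(check, canceled, delay=0.01):
    """Retry check() until it succeeds or cancellation has been requested."""
    attempts = 0
    while True:
        rc = check()
        attempts += 1
        # Stop on success, or when the caller has been canceled; without the
        # second condition a persistent "not leader" error would loop forever.
        if rc == 0 or canceled.is_set():
            return rc, attempts
        # Sleep for a real, non-zero interval before retrying.
        canceled.wait(delay)
```

The point of the sketch is only the ordering: the cancellation check happens before the sleep, so a stepping-down leader leaves the loop instead of retrying indefinitely.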
1. Send DTX batched commit RPCs step by step
Currently, each DTX batched commit operation handles at most 512 DTX entries, which may generate DTX commit RPCs to thousands of DAOS targets. Instead of sending all the batched RPCs at once, we send them step by step: after each step, the logic yields and waits for the replies before sending the next batch of RPCs. This avoids holding too many system resources for a relatively long time, and it also helps reduce the peak network load of the whole system and the pressure on the related targets.
2. Cleanup stale DTX based on global RPC timeout
Originally, the DTX cleanup logic was triggered when the lifetime of a stale DTX exceeded the fixed threshold DTX_CLEANUP_THD_AGE_UP (90 sec), which may be smaller than the global default RPC timeout. As a result, the DTX refresh RPC for the cleanup logic could be sent too early, before the related modification RPC(s) timed out, which increased network load unnecessarily. We now adjust the DTX cleanup threshold based on the global default RPC timeout value, and give the related DTX leader some time after the default RPC timeout to commit or abort the DTX. If the DTX is still prepared after that, DTX cleanup is triggered to handle potential stale DTX entries.
3. Reorg DTX CoS logic
Reduce the RPCs caused by potential repeated DTX commit. Clearer names for the DTX CoS API.
Signed-off-by: Fan Yong <fan.yong@intel.com>
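As a rough illustration of the step-by-step sending in item 1, the Python sketch below splits one batched commit into consecutive steps and waits for each step before starting the next. The names `send_in_steps` and `send_and_wait` and the step size of 128 are assumptions for illustration; only the 512-entry upper bound comes from the commit message.

```python
def send_in_steps(entries, step_size, send_and_wait):
    """Send entries in consecutive steps, waiting for each step to complete."""
    steps = 0
    for start in range(0, len(entries), step_size):
        # send_and_wait is expected to block (yield) until replies arrive,
        # so at most step_size entries are in flight at any time.
        send_and_wait(entries[start:start + step_size])
        steps += 1
    return steps


sent = []
steps = send_in_steps(list(range(512)), 128, sent.append)
```

Bounding the in-flight work per step is what limits the peak resource usage, at the cost of some added latency for the whole batch.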
A bug was limiting the number of ranks shown to the number of MS replicas instead. - Show all rank fabric URIs. - Added MS ranks list to sys info. - Add MS ranks to daos system query output. Signed-off-by: Kris Jacque <kris.jacque@intel.com> Co-authored-by: Cedric Koch-Hofer <94527853+knard-intel@users.noreply.github.com>
add MPICH and IMPI infos on how to set daos: prefix Signed-off-by: Michael Hennecke <michael.hennecke@intel.com>
Makes it a little easier to query the layout Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Add support for a new daos fs chmod command to adjust the mode of a file. Signed-off-by: Colin Howes <chowes@google.com> Signed-off-by: Jeff Olivier <jeffolivier@google.com> Co-authored-by: Colin Howes <chowes@google.com>
This patch addresses a corner case where a leadership change may occur while a pool is in the Destroying state. During checkPools, if the pool is in a Destroying state, the MS will attempt to destroy the pool. The top level PoolDestroy method attempted to wait for leadership step-up to finish, but was needed in order for step-up to complete. It is okay to skip the leadership check for PoolDestroy in this case. Leadership was already checked at the beginning of checkPools. Signed-off-by: Kris Jacque <kris.jacque@intel.com>
Allow the control client to report a version other than the static value embedded at build time. Enables some nonstandard use cases where Control API users will take responsibility for version interoperability without code changes. Change-Id: I0ce6ddbf2c742ce9c8dab9e21e37cd5ea8c5f5b3 Signed-off-by: Michael MacDonald <mjmac@google.com>
update pylint to 3.2.6 Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
merge yamllint and clang-format into linting workflow so all lint checks are grouped together. Make yaml-lint required but clang-format optional until stable. Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
use math.inf to disable threshold checks in CI so the test can run as a smoke test Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
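A minimal sketch of why `math.inf` works as a "disabled" threshold, assuming the checks are upper-bound comparisons. The function name `within_threshold` and the variable `CI_THRESHOLD` are hypothetical and not taken from the test code.

```python
import math


def within_threshold(measured, limit):
    """Return True when the measured value does not exceed the limit."""
    return measured <= limit


# Any finite measurement passes when the limit is infinity, so the test
# still runs end to end but can no longer fail on the threshold itself.
CI_THRESHOLD = math.inf
```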
Get the targets and ranks from the config instead of hardcoding. Also use self.random instead of direct random. Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Try to build with the SHA under test if possible in the dfuse/daos_build.py test. Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Perform basic system health checks from the client perspective. Checks the following:
* Client/Server versions
* Key library versions and paths
* Connected system information
* Pool status for all pools to which the user has access
* Container status for all containers in the checked pools
Change-Id: I9154ee7f3632996e0e67ad6f320874e1df2e0d23
Signed-off-by: Michael MacDonald <mjmac@google.com>
Config generate should not output auto-calculated values. Update ftest to not expect scm_size in the config generate yaml output. Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Ignore NotReplica errors when checking if enabled. Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
For VOS command test, add a visible iterator option so one can manually verify that VOS iterates correctly for that case. Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Added some more Go runtime suppressions. I've added counterparts for all functions already on the list, as well as the false positives that have been recently spotted. Signed-off-by: Kris Jacque <kris.jacque@intel.com>
#14736) After SysXS device is set to faulty, the device state transition may take a while until its state becomes EVICTED. Wait for 10 sec before querying the device state. If the state isn't EVICTED, query again. Also set only one SysXS device faulty because there's no point of setting second SysXS device to faulty. (based on the developer feedback). Check whether any of the engines is down at the end of the test Signed-off-by: Makito Kano <makito.kano@intel.com>
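The wait-then-requery behaviour described above could look roughly like this. It is a sketch under assumptions: `wait_for_device_state`, `query_state`, and the choice to wait again before the second query are illustrative and not taken from the test itself.

```python
import time


def wait_for_device_state(query_state, expected="EVICTED", delay_sec=10,
                          max_queries=2, sleep=time.sleep):
    """Wait, then query the device state; query again if it has not changed yet."""
    for _ in range(max_queries):
        # Give the state transition time to complete before asking.
        sleep(delay_sec)
        if query_state() == expected:
            return True
    return False
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting.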
…#14819) For EC object rebuild, some extents may not exist on some shards. In this case, create the VOS container even when no record needs to be rebuilt, so that subsequent I/O does not fail to find the container in obj_ioc_init() -> cont_child_lookup(). Similarly, cont_snap_update_one() now creates the VOS container if it does not exist. Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Update the current run_local() command to return an object similar to run_remote() to allow them to be used interchangeably. increase verify_perms.py timeout. Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Running `(daos|dmg) pool query-targets` with just a rank argument should query all targets on that rank. Signed-off-by: Michael MacDonald <mjmac@google.com>
When the LRU cache is performing eviction, new lookups should fail. Currently, this logic is implemented on the caller’s side. Let's move this logic to the DAOS LRU side to return DER_SHUTDOWN if LRU is evicting and remove incorrect assertion. Signed-off-by: Wang Shilong <shilong.wang@intel.com>
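To illustrate moving the check from the callers into the cache itself, here is a small Python model. It is not the DAOS LRU implementation: the `Cache` class and the `CacheShutdown` exception (standing in for the DER_SHUTDOWN return code) are hypothetical.

```python
class CacheShutdown(Exception):
    """Stands in for the DER_SHUTDOWN return code in this sketch."""


class Cache:
    def __init__(self):
        self._items = {}
        self._evicting = False

    def insert(self, key, value):
        self._items[key] = value

    def lookup(self, key):
        # The cache, not each caller, rejects lookups while eviction runs.
        if self._evicting:
            raise CacheShutdown(key)
        return self._items.get(key)

    def start_eviction(self):
        self._evicting = True
        self._items.clear()
```

With the check in one place, callers only need to handle the error rather than each re-implementing the eviction test.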
Signed-off-by: Lei Huang <lei.huang@intel.com>
Correct protobuf field names to be consistent within the pool.proto file and remove meta-blob size references in MD-on-SSD phase-1 code. Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
- Instead of using protobuf map, represent the map as an array where the index == NUMA node ID. - Always include domain in NUMA fabric map. Previously domain was missing when it was the same as the interface name. Signed-off-by: Kris Jacque <kris.jacque@intel.com>
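The map-to-array representation in the first bullet can be sketched as below. The function name and the sample interface names are made up for illustration; the only property taken from the commit message is that the list index equals the NUMA node ID.

```python
def numa_map_to_list(numa_map):
    """Convert {numa_node_id: [interfaces]} to a list indexed by NUMA node ID."""
    size = max(numa_map) + 1 if numa_map else 0
    # Nodes without any interface get an empty entry so indices stay aligned.
    result = [[] for _ in range(size)]
    for node, ifaces in numa_map.items():
        result[node] = list(ifaces)
    return result


example = numa_map_to_list({0: ["ib0"], 2: ["ib1", "eth0"]})
```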
When the change was made to allow partial overwrite for rebuild, it broke delete such that delete would remove the newest extent rather than the exact one we requested. Also, don't ignore errors when processing removals Signed-off-by: Jeff Olivier <jeffolivier@google.com>
* DAOS-13701: Memory bucket allocator API definition (#13152) - New umem macros are exported to do the allocation within memory bucket. umem internally now calls the modified backend allocator routines with memory bucket id passed as argument. - umem_get_mb_evictable() and dav_get_zone_evictable() are added to support allocator returning preferred zone to be used as evictable memory bucket for current allocations. Right now these routines always return zero. - The dav heap runtime is cleaned up to make provision for memory bucket implementation. * DAOS-13703 umem: umem cache APIs for phase II (#13138) Four sets of umem cache APIs will be exported for md-on-ssd phase II: 1. Cache initialization & finalization - umem_cache_alloc() - umem_cache_free() 2. Cache map, load and pin - umem_cache_map(); - umem_cache_load(); - umem_cache_pin(); - umem_cache_unpin(); 3. Offset and memory address converting - umem_cache_off2ptr(); - umem_cache_ptr2off(); 4. Misc - umem_cache_commit(); - umem_cache_reserve(); * DAOS-14491: Retain support for phase-1 DAV heap (#13158) The phase-2 DAV allocator is placed under the subdirectory src/common/dav_v2. This allocator is built as a standalone shared library and linked to the libdaos_common_pmem library. The umem will now support one more mode DAOS_MD_BMEM_V2. Setting this mode in umem instance will result in using phase-2 DAV allocator interfaces. * DAOS-15681 bio: store scm_sz in SMD (#14330) In md-on-ssd phase 2, the scm_sz (VOS file size) could be smaller than the meta_sz (meta blob size), then we need to store an extra scm_sz in SMD, so that on engine start, this scm_sz could be retrieved from SMD for VOS file re-creation. To make the SMD compatible with pmem & md-on-ssd phase 1, a new table named "meta_pool_ex" is introduced for storing scm_sz. 
* DAOS-14422 control: Update pool create UX for MD-on-SSD phase2 (#14740) Show MD-on-SSD specific output on pool create and add new syntax to specify ratio between SSD capacity reserved for MD in new DAOS pool and the (static) size of memory reserved for MD in the form of VOS index files (previously held on SCM but now in tmpfs on ramdisk). Memory-file size is now printed when creating a pool in MD-on-SSD mode. The new --{meta,data}-size params can be specified in decimal or binary units e.g. GB or GiB and refer to per-rank allocations. These manual size parameters are only for advanced use cases and in most situations the --size (X%|XTB|XTiB) syntax is recommended when creating a pool. --meta-size param is bytes to use for metadata on SSD and --data-size is for data on SSD (similar to --nvme-size). The new --mem-ratio param is specified as a percentage with up to two decimal places precision. This defines the proportion of the metadata capacity reserved on SSD (i.e. --meta-size) that will be used when allocating the VOS-index (one blob and one memory file per target). Enabling MD-on-SSD phase2 pool creation requires envar DAOS_MD_ON_SSD_MODE=3 to be set in server config file. * DAOS-14317 vos: initial changes for the phase2 object pre-load (#15001) - Introduced new durable format 'vos_obj_p2_df' for the md-on-ssd phase2 object, at most 4 evict-able bucket IDs could be stored. - Changed vos_obj_hold() & vos_obj_release() to pin or unpin object respectively. - Changed the private data of VOS dkey/akey/value trees from 'vos_pool' to 'vos_object', the private data will be used for allocating/reserving from the evict-able bucket. - Move the vos_obj_hold() call from vos_update_end() to vos_update_begin() for the phase2 pool, reserve value from the object evict-able bucket. 
* DAOS-14316 vos: object preload for GC (#15059) - Use the reserved vos_gc_item.it_args to store 2 bucket IDs for GC_OBJ, GC_DKEY and GC_AKEY, so that GC drain will be able to tell which buckets need to be pinned by looking up bucket numbers stored in vos_obj_df. - Once GC drain needs to pin a different bucket, it will have to commit current tx; unpin current bucket; pin required bucket; start new tx; - Forge a dummy object as the private data for the btree opened by GC, so that the 'ti_destroy' hack could be removed. - Store evict-able bucket ID persistently for newly created object, this was missed in prior PR. * DAOS-14315 vos: Pin objects for DTX commit & CPD RPC (#15118) Introduced two new VOS APIs vos_pin_objects() & vos_unpin_objects() for pin or unpin objects. Changed DTX commit/abort & CPD RPC handler code to ensure objects pinned before starting local transaction. - Bug fix in vos_pmemobj_create(), the actual scm_size should be passed to bio_mc_create(). - Use vos_obj_acquire() instead of vos_obj_hold() in vos_update_begin() to avoid the complication of object ilog adding in ts_set. We could simplify it in future cleanup PRs. - Handle concurrent object bucket allotting & loading. * DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 (#14957) Update calculation of usable pool META and DATA component sizes for MD-on-SSD phase-2 mode; when meta-blob-size > vos-file-size. - Use mem-ratio when making NVMe size adjustments to calculate usable pool capacity from raw stats. - Use mem-ratio when auto-sizing to determine META component from percentage of usable rank-RAM-disk capacity. - Apportion cluster count reductions to SSDs based on number of assigned targets to take account of target striping across a tier. - Fix pool query ftest. - Improve test coverage for meta and rdb size calculations. 
* DAOS-16763 common: Tunable to control max NEMB (#15422) A new tunable, DAOS_MD_ON_SSD_NEMB_PCT, is introduced to define the percentage of memory cache that non-evictable memory buckets can expand to. This tunable will be read during pool creation and persisted, ensuring that each time the pool is reopened, it retains the value set during its creation. Signed-off-by: Niu Yawei <yawei.niu@intel.com> Signed-off-by: Tom Nabarro <tom.nabarro@intel.com> Signed-off-by: Sherin T George <sherin-t.george@hpe.com> Co-authored-by: Tom Nabarro <tom.nabarro@intel.com> Co-authored-by: sherintg <sherin-t.george@hpe.com>
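For the --mem-ratio description in the pool create UX change above, one plausible reading of the arithmetic is sketched here. This is an assumption based only on the wording "proportion of the metadata capacity reserved on SSD"; the function name is hypothetical and the real sizing code may apply further per-target rounding or adjustments.

```python
def mem_file_bytes(meta_size_bytes, mem_ratio_pct):
    """Memory reserved for MD as a percentage of the per-rank --meta-size."""
    if not 0 < mem_ratio_pct <= 100:
        raise ValueError("mem-ratio must be a percentage in (0, 100]")
    # The ratio is given with up to two decimal places of precision.
    return int(meta_size_bytes * round(mem_ratio_pct, 2) / 100)


GIB = 2 ** 30
half = mem_file_bytes(100 * GIB, 50)
full = mem_file_bytes(100 * GIB, 100)
```

Under this reading, a ratio of 100 corresponds to memory and metadata capacity being the same size, and smaller ratios correspond to the phase-2 case where the meta blob is larger than the memory file.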
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Invalid hole extent might be left by process_hole_ult(), so let's skip it. Signed-off-by: Di Wang <ddiwang@google.com>
This only serves to add confusion at this point. Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
Enable write access to the Security section of the GitHub project. Use the GHA cache to avoid Trivy scan failures caused by CVE database download failures, which result from overuse of the database. Upgrade trivy-action to version 0.28.0, where the caching mechanism is enabled by default. Enable the debug option in Trivy to be prepared for detailed scan failure analysis. Signed-off-by: Tomasz Gromadzki <tomasz.gromadzki@intel.com>
#14932) libfabric loads libze_loader.so which calls zeInit(). We observed deadlock due to nested calls when daos_init() is called inside zeInit(). We intercept dlsym() and zeInit() to avoid calling daos_init() inside zeInit(). dlsym(RTLD_NEXT, ) checks returning address to determine caller's module. To maintain expected behavior of dlsym(RTLD_NEXT, ) with our interception, new_dlsym() is implemented with assembly code to use jmp instruction instead of call. dlsym() has been moved from libdl.so to libc.so since version 2.34. Signed-off-by: Lei Huang <lei.huang@intel.com>
The patch contains the following improvements:
1. When VOS level logic returns -DER_TX_RESTART, the object level RPC handler should set the 'RESEND' flag and then restart the transaction with a newer epoch. Because the dtx_abort() logic cannot guarantee that all formerly prepared DTX entries (on all related participants) can be aborted, especially if the former one failed because of some network trouble, the restarted transaction may otherwise hit -DER_TX_ID_REUSED unexpectedly.
2. Compare the epoch for DTX entries with the same transaction ID to distinguish a potentially reused TX ID more accurately.
3. Add the DTX entry into the DTX CoS cache if it cannot be committed synchronously. Then the subsequent batched commit logic can handle it.
4. If the server complains about suspected TX ID reuse, report -EIO to the related application instead of asserting on the client.
5. Control DTX related warning message frequency to avoid log flood.
6. Collect more information when generating some error/warning messages.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Signed-off-by: Jerome Soumagne <jerome.soumagne@intel.com>
#15420) The object placement algorithm was changed by DAOS-16445. As a result, data are written to targets more uniformly while the amount of leftover data after container destroy/garbage collection in each target remains the same. i.e., Data are written to more targets while the cleanup method in each target hasn't been improved, which results in higher aggregate leftover data. To handle larger amount of leftover data in SCM, increase the threshold to 1.5MB. Signed-off-by: Makito Kano <makito.kano@intel.com>
In cases where the client telemetry has been manually enabled, daos_metrics should be able to read it as long as the client's PID is known and the user has read access to the shared memory segment. Moves the daos_metrics utility into the common daos package for use from both server and client sides. Signed-off-by: Michael MacDonald <mjmac@google.com>
When STATIC_FUSE=1 is set, the arm build on ubuntu fails because it can't find libfuse3.a. Just add the expected path to the list of search paths. Signed-off-by: Jeff Olivier <jeffolivier@google.com>
This PR enhances the DDB functionality for CR purposes with the following updates: 1. Pool Behavior Control: Administrators can now control certain vos pool behaviors, such as skipping vos pool loading or setting a vos pool to immutable mode. 2. Manual Pool Shard Removal: A new command ddb rm_pool <vos_pool> has been introduced, allowing administrators to manually remove pool shards. 3. SPDK Environment Initialization Bug Fix: Fixed an issue where spdk_env_init() would fail during reinitialization. These updates aim to improve system flexibility and stability, providing administrators with more robust management capabilities. Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Create an active_inode struct and allocate it for all inodes which have more than one open handle. This allows us to share state/caching data across open handles more easily and to better support concurrent readers. Future work here will improve performance for concurrent readers when caching is used, and allow us to make the in-memory inode struct smaller, which will save memory. Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
In the past few passing runs, the test had ~100 sec of test time remaining at the end with a 600 sec timeout. This means the test usually takes ~500 sec. Set the timeout to 1.5 times the normal test duration: 500 * 1.5 = 750 sec. Signed-off-by: Makito Kano <makito.kano@intel.com>
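The arithmetic behind the new timeout can be written out as a small helper. The function name `padded_timeout` and its parameters are hypothetical; only the numbers 600, 100, and the 1.5 factor come from the commit message.

```python
def padded_timeout(old_timeout_s, remaining_s, factor=1.5):
    """Derive a timeout from the observed run time plus a safety margin."""
    typical_s = old_timeout_s - remaining_s  # observed duration of a passing run
    return int(typical_s * factor)


print(padded_timeout(600, 100))  # 750
```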
Bump version to 2.7.101 faults-enabled: false Signed-off-by: Phil Henderson <phillip.henderson@intel.com>
Update the Prometheus exporter to support passthrough histograms from native DAOS telemetry format. Fixes a few bugs and inefficiencies in the native histogram implementation. Signed-off-by: Michael MacDonald <mjmac@google.com>
The evt recx trace is used for VOS aggregation debugging. It is currently reset in the akey iteration callback, but that callback can be skipped in some cases. For example, when evt aggregation hits an aborted recx, it starts over at the evtree level without resetting the recx trace, which could lead to integer overflow of the 'int ap_trace_count'. This patch moves the ap_trace_count reset to merge window open/close to ensure the evt recx trace is always reset properly. Signed-off-by: Niu Yawei <yawei.niu@intel.com>
…15418) Update dmg storage query usage for MD-on-SSD P2 Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
To minimize bucket eviction/load when iterating objects, vos_iterate_obj() is introduced to iterate objects in bucket ID order instead of OI order. The caller of vos_iterate_obj() needs to provide a filter callback that calls vos_bkt_iter_skip() properly. vos_iterate_obj() is now used for EC and VOS aggregation. Signed-off-by: Niu Yawei <yawei.niu@intel.com>
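Why bucket ID order helps can be illustrated with a toy model that counts how often the iterator has to move to a different bucket. This is not the VOS implementation: the helper `bucket_switches`, the object names, and the bucket assignments are invented for the example, and a "switch" here only stands in for a potential bucket eviction/load.

```python
def bucket_switches(order, bucket_of):
    """Count how many times consecutive objects live in different buckets."""
    switches = 0
    prev = None
    for obj in order:
        bucket = bucket_of[obj]
        if bucket != prev:
            switches += 1
            prev = bucket
    return switches


# Hypothetical objects listed in index (OI) order, with their bucket IDs.
bucket_of = {"o1": 2, "o2": 1, "o3": 2, "o4": 1}
oi_order = ["o1", "o2", "o3", "o4"]
bucket_order = sorted(oi_order, key=lambda o: bucket_of[o])

print(bucket_switches(oi_order, bucket_of))      # 4
print(bucket_switches(bucket_order, bucket_of))  # 2
```

In this model, visiting objects grouped by bucket touches each bucket once, while index order bounces between buckets.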
Initialize checkpoint stats to zero. Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Provide an inverse to the existing exclude_fabric_ifaces directive. In some cases, a given environment will only have a small number of valid interfaces, so it is simpler to specify that rather than having to exclude all of the invalid interfaces. Signed-off-by: Michael MacDonald <mjmac@google.com>
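The difference between the two styles of filtering can be sketched as below. This is an illustrative model only, not the DAOS control-plane code: the function `filter_ifaces`, its parameters, the interface names, and the rule that the two lists are mutually exclusive are all assumptions.

```python
def filter_ifaces(available, include=None, exclude=None):
    """Return the usable interfaces given an include list or an exclude list."""
    if include and exclude:
        # Assumption: the two directives cannot be combined.
        raise ValueError("specify include or exclude, not both")
    if include:
        return [i for i in available if i in include]
    if exclude:
        return [i for i in available if i not in exclude]
    return list(available)


available = ["lo", "eth0", "eth1", "docker0", "ib0"]

# With few valid interfaces, the include form is shorter than the exclude form.
print(filter_ifaces(available, include={"ib0"}))
print(filter_ifaces(available, exclude={"lo", "eth0", "eth1", "docker0"}))
```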
Fix a few minor remaining issues in the previous PR intercepting dlsym and zeInit. Use D_ASPRINTF when possible. Remove unneeded newline, debugging output, and parentheses. Signed-off-by: Lei Huang <lei.huang@intel.com>
Add a section on handling unavailable engines. Signed-off-by: Li Wei <wei.g.li@intel.com>
Store mem-ratio in control-plane pool-service; remove unused tgt_dev param from extend create reintegrate and create ranks ds api; bump DAOS_MGMT_VERSION 3->4. Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Update some tests to use unique dfuse mount directory by letting the framework generate one. Remove mount_dir from run_ior_multiple_variants since it is no longer needed and this level of fine control should be handled per test ideally. Signed-off-by: Dalton Bohning <dalton.bohning@intel.com>
Add a suppression for Go runtime function racefuncenter. Add a suppression for rt0_go CGo malloc. Signed-off-by: Kris Jacque <kris.jacque@intel.com>
Errors are: component not formatted correctly; ticket number prefix incorrect; PR title is malformatted (see https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments); unable to load ticket data.