DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 #14957

Merged

Conversation

tanabarr
Contributor

@tanabarr tanabarr commented Aug 19, 2024

Update calculation of usable pool META and DATA component sizes for
MD-on-SSD phase-2 mode, when meta-blob-size > vos-file-size.

  • Use mem-ratio when making NVMe size adjustments to calculate usable
    pool capacity from raw stats.
  • Use mem-ratio when auto-sizing to determine META component from
    percentage of usable rank-RAM-disk capacity.
  • Apportion cluster count reductions to SSDs based on number of
    assigned targets to take account of target striping across a tier.
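
As a rough illustration of the last bullet, the sketch below splits a total cluster-count reduction across SSDs in proportion to the number of targets assigned to each. The function name `apportionClusters`, its signature and the rounding policy are assumptions made for this example; they are not taken from the PR's code.

```go
package main

import "fmt"

// apportionClusters splits a total cluster-count reduction across SSDs in
// proportion to the number of targets assigned to each SSD.
// Hypothetical helper for illustration only; not the function used in the PR.
func apportionClusters(total uint64, targetsPerSSD []uint64) []uint64 {
	var sum uint64
	for _, n := range targetsPerSSD {
		sum += n
	}
	shares := make([]uint64, len(targetsPerSSD))
	if sum == 0 {
		// No targets assigned anywhere, so nothing to apportion.
		return shares
	}
	var assigned uint64
	for i, n := range targetsPerSSD {
		shares[i] = total * n / sum // integer division rounds down
		assigned += shares[i]
	}
	// Hand out the clusters lost to rounding one at a time, skipping SSDs
	// that have no targets (assumed policy, chosen for simplicity).
	for i := 0; assigned < total; i = (i + 1) % len(shares) {
		if targetsPerSSD[i] > 0 {
			shares[i]++
			assigned++
		}
	}
	return shares
}

func main() {
	// Two SSDs in a tier: one striped across 3 targets, one across 1.
	fmt.Println(apportionClusters(10, []uint64{3, 1}))
}
```

The shares always sum to the requested total, so the tier-wide reduction is preserved regardless of how targets are striped.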

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@tanabarr tanabarr self-assigned this Aug 19, 2024

github-actions bot commented Aug 19, 2024

Ticket title is 'Correctly implement dmg pool create --size option for MD-on-SSD phase-II'
Status is 'In Review'
Labels: 'md_on_ssd2,usability'
https://daosio.atlassian.net/browse/DAOS-16160

@tanabarr tanabarr added the control-plane, meta-on-ssd, go and usability labels Aug 19, 2024
@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/1/execution/node/1551/log

@tanabarr tanabarr force-pushed the tanabarr/control-display-poolquery-mdonssd branch 3 times, most recently from 2bc9a2d to 778c787 Compare August 21, 2024 22:11
Base automatically changed from tanabarr/control-display-poolquery-mdonssd to feature/vos_on_blob_p2 September 8, 2024 18:12
@tanabarr tanabarr force-pushed the tanabarr/control-size-poolcreate-mdonssd branch from 8748182 to 33fc7e5 Compare September 20, 2024 16:39
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/3/execution/node/1508/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/3/execution/node/1492/log

@tanabarr
Contributor Author

tanabarr commented Sep 22, 2024

The following is with a single host with dual engines where bdev roles META and DATA are not shared. Two pools are created with VOS index file size equal to half the meta-blob size (--mem-ratio 50%). Both pools use roughly half the original capacity available (first using 50% and the second 100% of the remainder).

Rough calculations: dmg storage scan shows that for each rank, one 800GB SSD is assigned for each tier (first: WAL+META, second: DATA). df -h /mnt/daos* reports usable ramdisk capacity for each rank is 66GiB.

  • Expected Data storage would then be 400GB for a 50% capacity first pool and 100% capacity second pool per-rank.
  • Expected Meta storage at 50% mem-ratio would be 66GiB*2 = 132GiB ≈ 141GB, giving ~70GB for 50% first and 100% second pools.
  • Expected Memory file size (aggregated) is 66GiB/2 = 33GiB ≈ 35GB for 50% first and 100% second pools.
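
The rough calculations above can be reproduced with a few lines of Go. This is only a sketch of the arithmetic for the separate-roles case: `estimatePool` and the `estimate` type are hypothetical names introduced here, and the inputs (66GiB of ramdisk and one 800GB DATA SSD per rank) are the rounded figures quoted above rather than exact values from the scan.

```go
package main

import "fmt"

const (
	gib = 1 << 30       // binary gigabyte, as reported by df -h
	gb  = 1_000_000_000 // decimal gigabyte, as reported by dmg
)

// estimate holds per-rank byte counts for one pool. Illustrative type only.
type estimate struct {
	Meta, Data, MemFile uint64
}

// estimatePool sketches per-rank sizes when META and DATA use separate SSDs.
// sizeFrac is the --size percentage and memRatio the --mem-ratio percentage,
// both expressed as fractions. Hypothetical helper, not the PR's code.
func estimatePool(ramdisk, dataSSD uint64, memRatio, sizeFrac float64) estimate {
	memFile := uint64(float64(ramdisk) * sizeFrac)
	return estimate{
		Meta:    uint64(float64(memFile) / memRatio), // meta-blob is larger than the memory file
		Data:    uint64(float64(dataSSD) * sizeFrac),
		MemFile: memFile,
	}
}

func main() {
	e := estimatePool(66*gib, 800*gb, 0.5, 0.5)
	fmt.Printf("meta=%.1f GB data=%.1f GB memfile=%.1f GB\n",
		float64(e.Meta)/gb, float64(e.Data)/gb, float64(e.MemFile)/gb)
}
```

With these rounded inputs the sketch gives roughly 70GB of metadata, 400GB of data and 35GB of memory file per rank for the first pool, in line with the estimates above.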

Note that the "Total memory-file size: 140 GB" reported is incorrect in dmg pool query output and it should instead report 70GB per-pool. This should be addressed by https://daosio.atlassian.net/browse/DAOS-16209.

$ dmg pool create bob --size 50% --mem-ratio 50%

Pool created with 14.86%,85.14% storage tier ratio
--------------------------------------------------
  UUID             : 47060d94-c689-4981-8c89-011beb063f8f
  Service Leader   : 0
  Service Ranks    : [0-1]
  Storage Ranks    : [0-1]
  Total Size       : 940 GB
  Metadata Storage : 140 GB (70 GB / rank)
  Data Storage     : 800 GB (400 GB / rank)
  Memory File Size : 70 GB (35 GB / rank)

$ dmg pool create bob2 --size 100% --mem-ratio 50%

Pool created with 14.47%,85.53% storage tier ratio
--------------------------------------------------
  UUID             : bdbef091-f0f8-411d-8995-f91c4efc690f
  Service Leader   : 1
  Service Ranks    : [0-1]
  Storage Ranks    : [0-1]
  Total Size       : 935 GB
  Metadata Storage : 135 GB (68 GB / rank)
  Data Storage     : 800 GB (400 GB / rank)
  Memory File Size : 68 GB (34 GB / rank)

$ dmg pool query bob

Pool 47060d94-c689-4981-8c89-011beb063f8f, ntarget=32, disabled=0, leader=0, version=1, state=Ready
Pool health info:
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:32
- Total memory-file size: 140 GB
- Metadata storage:
  Total size: 140 GB
  Free: 131 GB, min:4.1 GB, max:4.1 GB, mean:4.1 GB
- Data storage:
  Total size: 800 GB
  Free: 799 GB, min:25 GB, max:25 GB, mean:25 GB

$ dmg pool query bob2

Pool bdbef091-f0f8-411d-8995-f91c4efc690f, ntarget=32, disabled=0, leader=1, version=1, state=Ready
Pool health info:
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:32
- Total memory-file size: 135 GB
- Metadata storage:
  Total size: 135 GB
  Free: 127 GB, min:4.0 GB, max:4.0 GB, mean:4.0 GB
- Data storage:
  Total size: 800 GB
  Free: 799 GB, min:25 GB, max:25 GB, mean:25 GB

Next is with a single host with dual engines where bdev roles WAL, META and DATA are shared.

Single pool with VOS index file size equal to the meta-blob size (--mem-ratio 100%).

$ dmg pool create bob --size 100% --mem-ratio 100%

Pool created with 5.93%,94.07% storage tier ratio
-------------------------------------------------
  UUID             : bad54f1d-8976-428b-a5dd-243372dfa65c
  Service Leader   : 1
  Service Ranks    : [0-1]
  Storage Ranks    : [0-1]
  Total Size       : 2.4 TB
  Metadata Storage : 140 GB (70 GB / rank)
  Data Storage     : 2.2 TB (1.1 TB / rank)
  Memory File Size : 140 GB (70 GB / rank)

Rough calculations: storage scan returns 1.2TB of usable space per rank. Because roles are shared, the required META (70GB) is reserved from that space, so only 1.1TB is provided for data.

Logging shows:

DEBUG 2024/09/24 15:44:38.554431 pool.go:1139: added smd device c7da7391-9077-4eb6-9f4a-a3d656166236 (rank 1, ctrlr 0000:d8:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 623 GB (623307128832), ctrlr-total-free 623 GB (623307128832)
DEBUG 2024/09/24 15:44:38.554516 pool.go:1139: added smd device 18c7bf45-7586-49ba-93c0-cbc08caed901 (rank 1, ctrlr 0000:d9:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 554 GB (554050781184), ctrlr-total-free 1.2 TB (1177357910016)
DEBUG 2024/09/24 15:44:38.554603 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 1.00 with 70 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=70 GB (69792169984 B) DATA=1.1 TB (1107565740032 B)

Now the same as above but with a single pool with VOS index file size equal to a quarter of the meta-blob size (--mem-ratio 25%).

$ dmg pool create bob --size 100% --mem-ratio 25%

Pool created with 23.71%,76.29% storage tier ratio
--------------------------------------------------
  UUID             : 999ecf55-474e-4476-9f90-0b4c754d4619
  Service Leader   : 0
  Service Ranks    : [0-1]
  Storage Ranks    : [0-1]
  Total Size       : 2.4 TB
  Metadata Storage : 558 GB (279 GB / rank)
  Data Storage     : 1.8 TB (898 GB / rank)
  Memory File Size : 140 GB (70 GB / rank)

Rough calculations: storage scan returns 1.2TB of usable space per rank. Because roles are shared, the required META (279GB) is reserved from that space, so only ~900GB is provided for data.

Logging shows:

DEBUG 2024/09/24 16:16:00.172719 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 0.25 with 279 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=279 GB (279168679936 B) DATA=898 GB (898189230080 B)

Now with 6 ranks and a single pool with VOS index file size equal to half of the meta-blob size (--mem-ratio 50%).

$ dmg pool create bob --size 100% --mem-ratio 50%

Pool created with 11.86%,88.14% storage tier ratio
--------------------------------------------------
  UUID             : 4fa38199-23a9-4b4d-aa9a-8b9838cad1d6
  Service Leader   : 1
  Service Ranks    : [0-2,4-5]
  Storage Ranks    : [0-5]
  Total Size       : 7.1 TB
  Metadata Storage : 838 GB (140 GB / rank)
  Data Storage     : 6.2 TB (1.0 TB / rank)
  Memory File Size : 419 GB (70 GB / rank)

Rough calculations: storage scan returns 1177 GB of usable space per rank. Because roles are shared, the required META (140 GB) is reserved from that space, so only 1037 GB is provided for data (per-rank).

Logging shows:

DEBUG 2024/09/24 16:40:41.570331 pool.go:1139: added smd device c921c7b9-5f5c-4332-a878-0ebb8191c160 (rank 1, ctrlr 0000:d8:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 623 GB (623307128832), ctrlr-total-free 623 GB (623307128832)
DEBUG 2024/09/24 16:40:41.570447 pool.go:1139: added smd device a071c3cf-5de1-4911-8549-8c5e8f550554 (rank 1, ctrlr 0000:d9:00.0, roles "data,meta,wal") as usable: device state="NORMAL", smd-size 554 GB (554050781184), ctrlr-total-free 1.2 TB (1177357910016)
DEBUG 2024/09/24 16:40:41.570549 pool.go:1246: based on minimum available ramdisk capacity of 70 GB and mem-ratio 0.50 with 140 GB of reserved metadata capacity, the maximum per-rank sizes for a pool are META=140 GB (139584339968 B) DATA=1.0 TB (1037773570048 B)
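
The byte counts in the three shared-role log excerpts are consistent with a simple rule: META is the ramdisk capacity divided by the mem-ratio, and DATA is whatever remains of the usable NVMe capacity once META is reserved. The Go sketch below checks that arithmetic against the logged values. The function `maxPerRankSizes` and its error handling are assumptions for illustration and do not reproduce the code in pool.go.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPerRankSizes sketches the shared-role calculation: META is projected
// from the ramdisk capacity using mem-ratio and then reserved out of the
// usable NVMe capacity, leaving the rest for DATA.
// Hypothetical helper; names and error cases are assumptions.
func maxPerRankSizes(ramdiskBytes, nvmeFreeBytes uint64, memRatio float64) (meta, data uint64, err error) {
	if memRatio <= 0 || memRatio > 1 {
		return 0, 0, errors.New("mem-ratio must be in (0, 1]")
	}
	meta = uint64(float64(ramdiskBytes) / memRatio)
	if meta > nvmeFreeBytes {
		return 0, 0, errors.New("insufficient NVMe capacity for metadata")
	}
	return meta, nvmeFreeBytes - meta, nil
}

func main() {
	const (
		ramdisk  = 69792169984   // "minimum available ramdisk capacity" from the log
		nvmeFree = 1177357910016 // "ctrlr-total-free" from the log
	)
	for _, ratio := range []float64{1.00, 0.50, 0.25} {
		meta, data, err := maxPerRankSizes(ramdisk, nvmeFree, ratio)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("mem-ratio %.2f: META=%d B DATA=%d B\n", ratio, meta, data)
	}
}
```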

@tanabarr tanabarr force-pushed the tanabarr/control-size-poolcreate-mdonssd branch from 33fc7e5 to 94b9cf6 Compare September 23, 2024 16:48
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/4/execution/node/1416/log

@tanabarr tanabarr force-pushed the tanabarr/control-size-poolcreate-mdonssd branch from 94b9cf6 to 2cd0529 Compare September 24, 2024 14:05
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/5/execution/node/1511/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/5/testReport/

Features: pool control
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@tanabarr tanabarr force-pushed the tanabarr/control-size-poolcreate-mdonssd branch from 2cd0529 to 0b46a05 Compare September 25, 2024 15:26
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/6/testReport/

@tanabarr tanabarr marked this pull request as ready for review September 25, 2024 19:46
@tanabarr tanabarr requested review from a team as code owners September 25, 2024 19:46
@tanabarr tanabarr removed request for a team September 25, 2024 19:46
@tanabarr tanabarr marked this pull request as ready for review September 30, 2024 14:10
Contributor

@knard38 knard38 left a comment

LGTM

md_size = mp.GetUsableBytes() / uint64(ei.GetTargetCount())
metaBytes = mp.GetUsableBytes() / uint64(ei.GetTargetCount())
if memRatio > 0 {
	metaBytes = uint64(float64(metaBytes) / float64(memRatio))
Contributor

Not fully understanding this part. Did you mean to multiply by memRatio rather than divide? Or is the intention to make metaBytes larger?

Contributor Author

Yes, the intention is to use the MemRatio fraction to project the effective meta-blob size (per-target) by dividing the VOS-file size by the fraction. In MD-on-SSD phase-1, metaBytes == scmBytes (VOS-file size). I will add a comment in the subsequent PR.

Contributor

Ah, understood. Thanks for the explanation. I think a comment in this area will be helpful.

Contributor Author

comment added
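
To make the point of this thread concrete, here is a minimal standalone sketch of the projection discussed above. The function `metaBytesPerTarget` and its parameters are hypothetical stand-ins for the `mp.GetUsableBytes()` and `ei.GetTargetCount()` calls in the snippet under review; the `float32` type for the ratio is an assumption based on the `float64(memRatio)` conversion there.

```go
package main

import "fmt"

// metaBytesPerTarget projects the per-target meta-blob size from the usable
// ramdisk capacity of a rank. Hypothetical helper for illustration only.
func metaBytesPerTarget(usableBytes uint64, targetCount int, memRatio float32) uint64 {
	if targetCount <= 0 {
		return 0
	}
	// Start from the VOS-file size available to each target.
	metaBytes := usableBytes / uint64(targetCount)
	// A mem-ratio fraction below 1 means the meta-blob is larger than the
	// VOS file, so divide (not multiply) to project the meta-blob size.
	// With no ratio set (phase-1 behaviour) the two sizes are equal.
	if memRatio > 0 {
		metaBytes = uint64(float64(metaBytes) / float64(memRatio))
	}
	return metaBytes
}

func main() {
	const usable = 64 << 30 // 64 GiB of usable ramdisk, example value
	fmt.Println(metaBytesPerTarget(usable, 16, 0))   // phase-1: meta == VOS file
	fmt.Println(metaBytesPerTarget(usable, 16, 0.5)) // phase-2: meta is twice the VOS file
}
```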

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/8/execution/node/1510/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/8/execution/node/1478/log

@tanabarr tanabarr requested a review from kjacque October 1, 2024 09:55
Contributor

@kjacque kjacque left a comment

No other issues noted on my side.

@tanabarr tanabarr requested a review from kjacque October 2, 2024 10:54
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control pool
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…abarr/control-size-poolcreate-mdonssd

Features: pool control
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Features: control pool
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@tanabarr tanabarr requested review from a team as code owners October 7, 2024 12:37
@tanabarr
Contributor Author

tanabarr commented Oct 7, 2024

Increased test coverage with MD-on-SSD tests for meta/rdb size adjustments/computation and mem-ratio fraction case:

    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_no_health_flag (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_no_md_info_in_smd_devs (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_no_meta_flag (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_no_request_flags;_adjustments_skipped (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_nvme_capacity_adjusted (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_phase-2_scan_(mem-ratio_in_req) (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdev_with_md-on-ssd_roles_in_config;_separate_data_role (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdevs_in_config (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdevs_in_config;_adjustment_skipped_as_no_meta_flag_in_req (0.01s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdevs_in_config;_missing_mount_in_config (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdevs_in_config;_vmd_enabled (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_bdevs_in_config;_zero_namespaces (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_both_engine_scans_fail (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_collate_results_from_multiple_engines (0.02s)
    --- PASS: TestServer_bdevScan/scan_remote;_filter_results_based_on_request_basic_flag (0.00s)
    --- PASS: TestServer_bdevScan/scan_remote;_partial_results_with_one_failed_engine_scan (0.02s)

Contributor

@daltonbohning daltonbohning left a comment

ftest LGTM

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/9/execution/node/1161/log

Features: control pool
Required-githooks: true

Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/10/execution/node/1185/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14957/11/execution/node/1514/log

@tanabarr
Contributor Author

CI failed due to the following issues:

…abarr/control-size-poolcreate-mdonssd

Test-tag: control pool pr
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
…abarr/control-size-poolcreate-mdonssd

Test-tag: control pool pr
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/12/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14957/13/testReport/

@tanabarr
Contributor Author

ListVerbose dmg tests that are failing with this change will be addressed in https://daosio.atlassian.net/browse/DAOS-16328

There is a bug in how the list of device target identifiers is returned for SMD devices on storage scan, which results in some under-subscription in pool create auto mode. This inaccuracy will be addressed in https://daosio.atlassian.net/browse/DAOS-16327

@tanabarr tanabarr merged commit 1dc5250 into feature/vos_on_blob_p2 Oct 16, 2024
56 of 58 checks passed
@tanabarr tanabarr deleted the tanabarr/control-size-poolcreate-mdonssd branch October 16, 2024 21:12
gnailzenh pushed a commit that referenced this pull request Nov 4, 2024
* DAOS-13701: Memory bucket allocator API definition (#13152)

- New umem macros are exported to do the allocation within
  memory bucket. umem internally now calls the modified backend
  allocator routines with memory bucket id passed as argument.
- umem_get_mb_evictable() and dav_get_zone_evictable() are
  added to support allocator returning preferred zone to be
  used as evictable memory bucket for current allocations. Right
  now these routines always return zero.
- The dav heap runtime is cleaned up to make provision for
  memory bucket implementation.

* DAOS-13703 umem: umem cache APIs for phase II (#13138)

Four sets of umem cache APIs will be exported for md-on-ssd phase II:

1. Cache initialization & finalization
   - umem_cache_alloc()
   - umem_cache_free()

2. Cache map, load and pin
   - umem_cache_map();
   - umem_cache_load();
   - umem_cache_pin();
   - umem_cache_unpin();

3. Offset and memory address converting
   - umem_cache_off2ptr();
   - umem_cache_ptr2off();
  
4. Misc
   - umem_cache_commit();
   - umem_cache_reserve();

* DAOS-14491: Retain support for phase-1 DAV heap (#13158)

The phase-2 DAV allocator is placed under the subdirectory
src/common/dav_v2. This allocator is built as a standalone shared
library and linked to the libdaos_common_pmem library. 
The umem will now support one more mode DAOS_MD_BMEM_V2. Setting
this mode in umem instance will result in using phase-2 DAV allocator
interfaces.
  
* DAOS-15681 bio: store scm_sz in SMD (#14330)

In md-on-ssd phase 2, the scm_sz (VOS file size) could be smaller
than the meta_sz (meta blob size), then we need to store an extra
scm_sz in SMD, so that on engine start, this scm_sz could be
retrieved from SMD for VOS file re-creation.

To make the SMD compatible with pmem & md-on-ssd phase 1, a new
table named "meta_pool_ex" is introduced for storing scm_sz.

* DAOS-14422 control: Update pool create UX for MD-on-SSD phase2 (#14740)

Show MD-on-SSD specific output on pool create and add new syntax to
specify ratio between SSD capacity reserved for MD in new DAOS pool
and the (static) size of memory reserved for MD in the form of VOS
index files (previously held on SCM but now in tmpfs on ramdisk).
Memory-file size is now printed when creating a pool in MD-on-SSD
mode.

The new --{meta,data}-size params can be specified in decimal or
binary units e.g. GB or GiB and refer to per-rank allocations. These
manual size parameters are only for advanced use cases and in most
situations the --size (X%|XTB|XTiB) syntax is recommended when
creating a pool. --meta-size param is bytes to use for metadata on
SSD and --data-size is for data on SSD (similar to --nvme-size).

The new --mem-ratio param is specified as a percentage with up to two
decimal places precision. This defines the proportion of the metadata
capacity reserved on SSD (i.e. --meta-size) that will be used when
allocating the VOS-index (one blob and one memory file per target).

Enabling MD-on-SSD phase2 pool creation requires envar
DAOS_MD_ON_SSD_MODE=3 to be set in server config file.

* DAOS-14317 vos: initial changes for the phase2 object pre-load (#15001)

- Introduced new durable format 'vos_obj_p2_df' for the md-on-ssd phase2
  object, at most 4 evict-able bucket IDs could be stored.

- Changed vos_obj_hold() & vos_obj_release() to pin or unpin object
  respectively.

- Changed the private data of VOS dkey/akey/value trees from 'vos_pool' to
  'vos_object', the private data will be used for allocating/reserving from
  the evict-able bucket.

- Move the vos_obj_hold() call from vos_update_end() to vos_update_begin()
  for the phase2 pool, reserve value from the object evict-able bucket.

* DAOS-14316 vos: object preload for GC (#15059)

- Use the reserved vos_gc_item.it_args to store 2 bucket IDs for
  GC_OBJ, GC_DKEY and GC_AKEY, so that GC drain will be able to tell
  what buckets need to be pinned by looking up bucket numbers stored in
  vos_obj_df.

- Once GC drain needs to pin a different bucket, it will have to commit
  current tx; unpin current bucket; pin required bucket; start new tx;

- Forge a dummy object as the private data for the btree opened by GC,
  so that the 'ti_destroy' hack could be removed.

- Store evict-able bucket ID persistently for newly created object, this
  was missed in prior PR.

* DAOS-14315 vos: Pin objects for DTX commit & CPD RPC (#15118)

Introduced two new VOS APIs vos_pin_objects() & vos_unpin_objects()
to pin or unpin objects. Changed DTX commit/abort & CPD RPC handler
code to ensure objects are pinned before starting local transaction.

- Bug fix in vos_pmemobj_create(), the actual scm_size should be passed
   to bio_mc_create().
- Use vos_obj_acquire() instead of vos_obj_hold() in vos_update_begin() to
  avoid the complication of object ilog adding in ts_set. We could simplify it
  in future cleanup PRs.
- Handle concurrent object bucket allotting & loading.

* DAOS-16160 control: Update pool create --size % opt for MD-on-SSD p2 (#14957)

Update calculation of usable pool META and DATA component sizes for
MD-on-SSD phase-2 mode, when meta-blob-size > vos-file-size.

- Use mem-ratio when making NVMe size adjustments to calculate usable
  pool capacity from raw stats.
- Use mem-ratio when auto-sizing to determine META component from
  percentage of usable rank-RAM-disk capacity.
- Apportion cluster count reductions to SSDs based on number of
  assigned targets to take account of target striping across a tier.
- Fix pool query ftest.
- Improve test coverage for meta and rdb size calculations.

* DAOS-16763 common: Tunable to control max NEMB (#15422)

A new tunable, DAOS_MD_ON_SSD_NEMB_PCT, is introduced to define the
percentage of memory cache that non-evictable memory buckets can
expand to. This tunable will be read during pool creation and
persisted, ensuring that each time the pool is reopened,
it retains the value set during its creation.

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Tom Nabarro <tom.nabarro@intel.com>
Signed-off-by: Sherin T George <sherin-t.george@hpe.com>
Co-authored-by: Tom Nabarro <tom.nabarro@intel.com>
Co-authored-by: sherintg <sherin-t.george@hpe.com>
Labels
control-plane work on the management infrastructure of the DAOS Control Plane go Pull requests that update Go code meta-on-ssd Metadata on SSD Feature usability Changes specific to user facing tools or behaviour.