
DAOS-15627 dtx: reduce stack usage for DTX resync to avoid overflow #14189

Merged
merged 1 commit into from
May 7, 2024

Conversation

Nasf-Fan
Contributor

Dynamically allocate some large variables used for DTX iteration and resync, to avoid potential ULT stack overflow.

The patch also adjusts the DTX re-index ULT stop logic so that DTX resync cannot stop re-index too early, before DTX re-index has completed.
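The core change can be illustrated with a minimal sketch (hypothetical struct and sizes, not the actual DAOS code; DAOS uses its own D_ALLOC/D_FREE macros, while this sketch uses plain libc to stay self-contained):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical large iteration state; in the real patch the large
 * variables live in the DTX iteration/resync code paths. */
struct dtx_iter_state {
	char buf[64 * 1024]; /* big enough to threaten a small ULT stack */
};

/*
 * Before the patch, declaring "struct dtx_iter_state st;" on the stack
 * could overflow a ULT stack; afterwards the state is heap-allocated
 * and freed on exit.
 */
int dtx_resync_step(void)
{
	struct dtx_iter_state *st;

	st = calloc(1, sizeof(*st));
	if (st == NULL)
		return -1; /* out of memory (-DER_NOMEM in DAOS terms) */

	/* ... iterate/resync using *st instead of a stack variable ... */
	memset(st->buf, 0xab, sizeof(st->buf));

	free(st);
	return 0;
}
```

The trade-off is an allocation per call, which is negligible next to the cost of DTX iteration itself and removes tens of kilobytes from the ULT stack footprint.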

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed, or there is a reason documented in the PR why it should be force-landed and the forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.


Ticket title is 'osa/offline_reintegration.py:OSAOfflineReintegration.test_osa_offline_reintegration_without_checksum - /usr/bin/daos_engine exited: signal: aborted (core dumped)'
Status is 'In Review'
Labels: 'ci_impact,daily_test'
https://daosio.atlassian.net/browse/DAOS-15627

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14189/4/execution/node/1409/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14189/4/execution/node/1455/log

Dynamically allocate some large variables used for DTX iteration
and resync, to avoid potential ULT stack overflow.

The patch also adjusts the DTX re-index ULT stop logic to avoid it
being stopped by DTX resync too early, before DTX re-index completes.

Signed-off-by: Fan Yong <fan.yong@intel.com>
@Nasf-Fan Nasf-Fan marked this pull request as ready for review April 23, 2024 01:35
@Nasf-Fan Nasf-Fan requested review from a team as code owners April 23, 2024 01:35
@Nasf-Fan Nasf-Fan requested review from NiuYawei and liuxuezhao and removed request for a team and NiuYawei April 23, 2024 01:35
* by DTX may not be released because DTX resync was in progress at that time. When we arrive
* here, DTX resync must have completed globally. Let's release the related resources.
*/
if (unlikely(cont_child->sc_dtx_delay_reset == 1)) {
Contributor

It seems the DTX resync is possibly not caused by rebuild? (DTX resync may start even though no rebuild happened.)
If the rebuild has not started, or is even disabled, which part can do this kind of resource freeing?

Contributor Author

If the DTX resync is triggered by opening the container, then the DTX table cannot be released, since the opened container will use it. The related resources will be released step by step via DTX aggregation.

@@ -1743,9 +1747,15 @@ stop_dtx_reindex_ult(struct ds_cont_child *cont)
if (dtx_cont_opened(cont))
Contributor

How can this happen? I think we should assert(!dtx_cont_opened()), right?

Contributor

I looked at dtx_cont_close() more closely; it looks like a "dtx_cont_open()" could be called before "stop_dtx_reindex_ult()", so we do need to check dtx_cont_opened() here.

Maybe it's better to move this check out of stop_dtx_reindex_ult() (e.g. check dtx_cont_opened() before calling stop_dtx_reindex_ult() in dtx_cont_close()) and put an assert here, since stop_dtx_reindex_ult() is called in cont_child_stop() as well.

Contributor Author

stop_dtx_reindex_ult() may be called after DTX resync has completed globally, so there may be a race with container open. It is not impossible to adjust the related logic to reduce the race, but checking here is safer and simpler.

*/
if (unlikely(cont_child->sc_dtx_delay_reset == 1)) {
stop_dtx_reindex_ult(cont_child, true);
vos_dtx_cache_reset(cont_child->sc_hdl, false);
Contributor

It still looks complicated to me. cont_child_stop() is only called when:

  1. Destroy container.
  2. Destroy pool.
  3. Shutdown engine.

So any long-running ULT accessing the container needs to be aborted in cont_child_stop(). Why don't we simply move all the stuff in cont_child_destroy_one() into cont_child_stop()? That includes:

  1. Stop resyncing ULT.
  2. Stop reindex ULT.
  3. Stop scrubbing ULT.
  4. Stop rebuild container scan.

In the current implementation, are these cleanups duplicated (or missed) in the pool destroy and engine shutdown cases?

Contributor

After discussion with Fanyong, I now understand the 'dtx_delay_reset' flag better.

It looks to me that "dtx cont resync" and "dtx cont open" both need to access the 'cache' generated by "dtx cont reindex", and we don't enforce the order of "stop resync", "stop reindex" and "cont close".

My proposal, then, is to introduce a user reference for the 'cache': "dtx cont resync", "dtx cont reindex" and "dtx cont open" would all hold a refcount on the "cache", and once the refcount drops to zero the cache is reset. Does that sound simpler? Of course, I'm fine with the current approach too.

Contributor Author

I just discussed the related logic with Niu. Some of the above cleanup work is necessary, but it is not related to this patch and would not simplify it. We can handle it in other ticket(s) later.

@@ -815,6 +815,10 @@ cont_child_stop(struct ds_cont_child *cont_child)
* never be started at all
*/
cont_child->sc_stopping = 1;

/* Stop DTX reindex by force. */
stop_dtx_reindex_ult(cont_child, true);
Contributor

So, is it OK to stop the re-index ULT before stopping the re-sync ULT?

Contributor Author

When we decide to force-stop the re-index ULT, it means we want to stop the container; in that case we do not care about the in-progress DTX resync, which may fail out.


@@ -1658,7 +1658,7 @@ ds_cont_local_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid, uuid_t cont_uuid,
DF_UUID": %d\n", DP_UUID(cont_uuid), hdl->sch_cont->sc_open);

hdl->sch_cont->sc_open--;
dtx_cont_close(hdl->sch_cont);
dtx_cont_close(hdl->sch_cont, true);
Contributor

I don't see why it needs a 'force' flag for this dtx_cont_close().

Contributor Author

The caller of dtx_cont_close() may destroy the container after calling dtx_cont_close(); in that case it uses "force" mode.

@Nasf-Fan
Contributor Author

After discussion with Fanyong, I now understand the 'dtx_delay_reset' flag better.

It looks to me that "dtx cont resync" and "dtx cont open" both need to access the 'cache' generated by "dtx cont reindex", and we don't enforce the order of "stop resync", "stop reindex" and "cont close".

My proposal, then, is to introduce a user reference for the 'cache': "dtx cont resync", "dtx cont reindex" and "dtx cont open" would all hold a refcount on the "cache", and once the refcount drops to zero the cache is reset. Does that sound simpler? Of course, I'm fine with the current approach too.

The cache is at the VOS level and attached to the VOS container. Usually we do not reset the cache unless the VOS container is closed. But because the cache may consume too much DRAM, we want to release that DRAM even while the VOS container is open but has no upper-layer user, so a reference count on such a cache may be inconvenient.
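For illustration, the reviewer's refcount proposal could be sketched as follows (hypothetical names and types, not the actual DAOS API; this approach was ultimately not adopted for the DRAM-release reason above):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical DTX cache: resync, reindex and container open would each
 * take a reference; the cache is reset when the last user drops its ref. */
struct dtx_cache {
	int  dc_ref;   /* number of current users */
	bool dc_valid; /* cache contents are usable */
};

static void dtx_cache_get(struct dtx_cache *cache)
{
	cache->dc_ref++;
}

static void dtx_cache_put(struct dtx_cache *cache)
{
	assert(cache->dc_ref > 0);
	if (--cache->dc_ref == 0)
		cache->dc_valid = false; /* reset once the last user is gone */
}
```

The appeal is that the reset order stops mattering; the drawback, as noted in the reply above, is that the cache then lives as long as any reference, even when its DRAM should be reclaimed while the VOS container stays open.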

@Nasf-Fan
Contributor Author

I looked at dtx_cont_close() more closely; it looks like a "dtx_cont_open()" could be called before "stop_dtx_reindex_ult()", so we do need to check dtx_cont_opened() here.

Maybe it's better to move this check out of stop_dtx_reindex_ult() (e.g. check dtx_cont_opened() before calling stop_dtx_reindex_ult() in dtx_cont_close()) and put an assert here, since stop_dtx_reindex_ult() is called in cont_child_stop() as well.

There are several possible cases that may trigger stop_dtx_reindex_ult(); moving the dtx_cont_opened() check into the callers' logic would generate more repeated code. Moreover, any new caller introduced in the future would also need to handle the open status by itself. If we hide the check inside stop_dtx_reindex_ult(), then no caller needs to care about it, so the current implementation is more transparent for the callers.

On the other hand, cont_close may yield, so the race between open and close exists. It seems cleaner to me to hide the race handling inside {start,stop}_dtx_reindex_ult() instead of checking these things everywhere.
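A minimal sketch of keeping the open-state check (and the force override used by container stop/destroy) inside the stop function itself; the types and field names here are hypothetical stand-ins for the DAOS structures:

```c
#include <stdbool.h>

/* Hypothetical container child state. */
struct cont_child {
	int  sc_open;        /* upper-layer open count */
	bool sc_dtx_reindex; /* reindex ULT is running */
};

static bool dtx_cont_opened(struct cont_child *cont)
{
	return cont->sc_open > 0;
}

/*
 * Every caller gets the open/close race handling for free: a non-forced
 * stop silently does nothing while the container is still opened, while
 * a forced stop (container stop/destroy) always proceeds.
 */
static void stop_dtx_reindex_ult(struct cont_child *cont, bool force)
{
	if (!force && dtx_cont_opened(cont))
		return; /* still in use, e.g. a racing open during close */

	cont->sc_dtx_reindex = false; /* signal the reindex ULT to exit */
}
```

With the check centralized this way, a future caller cannot forget it, at the cost of the function deciding policy (skip vs. stop) on behalf of its callers.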

@Nasf-Fan Nasf-Fan added the priority Ticket has high priority (automatically managed) label Apr 29, 2024
@Nasf-Fan Nasf-Fan requested review from a team and gnailzenh April 29, 2024 14:05
@Nasf-Fan
Contributor Author

Nasf-Fan commented May 4, 2024

Ping @daos-stack/daos-gatekeeper , thanks!

/* Give the DTX reindex ULT a chance to exit. */
while (unlikely(cont->sc_dtx_reindex))
ABT_thread_yield();

/* Make sure checksum scrubbing has stopped */
ABT_mutex_lock(cont->sc_mutex);
Contributor

Why do we release the lock and take it again, now that the loop has been removed?

@gnailzenh gnailzenh merged commit 51963e4 into master May 7, 2024
50 of 51 checks passed
@gnailzenh gnailzenh deleted the Nasf-Fan/DAOS-15627_2 branch May 7, 2024 12:26
jolivier23 pushed a commit that referenced this pull request Jul 2, 2024
DAOS-16039 object: fix EC aggregation wrong peer address (#14593)
DAOS-16009 rebuild: fix O_TRUNC file size related handling
DAOS-15056 rebuild: add rpt to the rgt list properly (#13862)
DAOS-15517 rebuild: refine lock handling for rpt list (#14064)
DAOS-13812 container: fix destroy vs lookup (#12757)
DAOS-15627 dtx: redunce stack usage for DTX resync to avoid overflow (#14189)
DAOS-14845 rebuild: do not wait for EC agg for reclaim (#13610)

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Wang, Di <wddi218@gmail.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
@jolivier23 jolivier23 mentioned this pull request Jul 2, 2024
jolivier23 added a commit that referenced this pull request Jul 3, 2024
DAOS-16039 object: fix EC aggregation wrong peer address (#14593)
DAOS-16009 rebuild: fix O_TRUNC file size related handling
DAOS-15056 rebuild: add rpt to the rgt list properly (#13862)
DAOS-15517 rebuild: refine lock handling for rpt list (#14064)
DAOS-13812 container: fix destroy vs lookup (#12757)
DAOS-15627 dtx: redunce stack usage for DTX resync to avoid overflow (#14189)
DAOS-14845 rebuild: do not wait for EC agg for reclaim (#13610)

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Wang, Di <wddi218@gmail.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
Labels
priority Ticket has high priority (automatically managed)