DAOS-15627 dtx: redunce stack usage for DTX resync to avoid overflow #14189
Ticket title is 'osa/offline_reintegration.py:OSAOfflineReintegration.test_osa_offline_reintegration_without_checksum - /usr/bin/daos_engine exited: signal: aborted (core dumped)'
Force-pushed from fec26d5 to 5c782bc
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14189/4/execution/node/1409/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14189/4/execution/node/1455/log
Dynamically allocated some large variables used for DTX iteration and resync, to avoid potential ULT stack overflow. The patch also adjusts the DTX re-index ULT stop logic so that DTX resync cannot stop it too early, before DTX re-index has completed. Signed-off-by: Fan Yong <fan.yong@intel.com>
Force-pushed from 5c782bc to f0a845e
 * by DTX may be not released because DTX resync was in-progress at that time. When arrive
 * here, DTX resync must has completed globally. Let's release related resource.
 */
if (unlikely(cont_child->sc_dtx_delay_reset == 1)) {
It seems the DTX resync is possibly not caused by rebuild? (DTX resync may start even though no rebuild happened.)
If the rebuild has not started, or is even disabled, which part can do this kind of resource freeing?
If the DTX resync is triggered by opening the container, then the DTX table cannot be released, since the opened container will use it. The related resources will be released via DTX aggregation step by step.
@@ -1743,9 +1747,15 @@ stop_dtx_reindex_ult(struct ds_cont_child *cont)
if (dtx_cont_opened(cont))
How can this happen? I think we should assert(!dtx_cont_opened()), right?
I looked at dtx_cont_close() more closely; it looks like dtx_cont_open() could be called before stop_dtx_reindex_ult(), so we need to check dtx_cont_opened() here.
Maybe it's better to move this check out of stop_dtx_reindex_ult() (e.g. check dtx_cont_opened() before calling stop_dtx_reindex_ult() in dtx_cont_close()) and put an assert here, since stop_dtx_reindex_ult() is called in cont_child_stop() as well.
stop_dtx_reindex_ult() may be called after DTX resync has completed globally, so there may be a race with container open. It may not be impossible to adjust the related logic to reduce the race, but checking here is safer and simpler.
 */
if (unlikely(cont_child->sc_dtx_delay_reset == 1)) {
	stop_dtx_reindex_ult(cont_child, true);
	vos_dtx_cache_reset(cont_child->sc_hdl, false);
It still looks complicated to me. cont_child_stop() is only called when:
- Destroying a container.
- Destroying a pool.
- Shutting down the engine.
So any long-running ULT accessing the container needs to be aborted in cont_child_stop(). Why don't we simply move all the stuff in cont_child_destroy_one() into cont_child_stop()? That includes:
- Stop the resync ULT.
- Stop the reindex ULT.
- Stop the scrubbing ULT.
- Stop the rebuild container scan.
In the current implementation, aren't these cleanups duplicated (or missed) in the pool destroy and engine shutdown cases?
After discussion with Fanyong, I now understand the 'dtx_delay_reset' flag better.
It looks to me that "dtx cont resync" and "dtx cont open" both need to access the 'cache' generated by "dtx cont reindex", and we don't enforce the order of "stop resync", "stop reindex", and "cont close".
So my proposal is to introduce a user reference for the 'cache': "dtx cont resync", "dtx cont reindex", and "dtx cont open" would all hold a refcount on the cache, and once the refcount drops to zero, the cache would be reset. Does that sound simpler? Of course, I'm fine with the current approach either way.
Just discussed the related logic with Niu. Some of the above cleanup work is necessary, but it is not related to this patch and would not simplify it. We can handle it in other ticket(s) later.
@@ -815,6 +815,10 @@ cont_child_stop(struct ds_cont_child *cont_child)
 * never be started at all
 */
cont_child->sc_stopping = 1;

/* Stop DTX reindex by force. */
stop_dtx_reindex_ult(cont_child, true);
So, it is ok to stop re-index ULT before stop re-sync ULT?
When we decide to force-stop the re-index ULT, it means that we want to stop the container; in that case we do not care about any in-progress DTX resync that may fail as a result.
@@ -1658,7 +1658,7 @@ ds_cont_local_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid, uuid_t cont_uuid,
DF_UUID": %d\n", DP_UUID(cont_uuid), hdl->sch_cont->sc_open);

hdl->sch_cont->sc_open--;
dtx_cont_close(hdl->sch_cont);
dtx_cont_close(hdl->sch_cont, true);
I don't see why it needs a 'force' flag for this dtx_cont_close().
The caller of dtx_cont_close() may destroy the container after calling dtx_cont_close(); in that case it will use "force" mode.
@@ -1743,9 +1747,15 @@ stop_dtx_reindex_ult(struct ds_cont_child *cont)
if (dtx_cont_opened(cont))
The cache is at the VOS level and attached to the VOS container. Usually we will not reset the cache unless the VOS container is closed. But because the cache may take too much DRAM, it is expected to release that DRAM even if the VOS container is opened but has no upper-layer user. So a reference count on such a cache may be inconvenient.
There are several possible cases that may trigger stop_dtx_reindex_ult(); moving the check of dtx_cont_opened() into the callers' logic would generate more repeated code. Also, if some new caller is introduced in the future, it would need to handle the open status by itself. If instead we hide the check inside stop_dtx_reindex_ult(), then no caller needs to care about it, so the current implementation seems more transparent for the callers. On the other hand, cont_close may yield, so the race between open and close exists. It seems cleaner to me to hide the race handling inside {start,stop}_dtx_reindex_ult() instead of checking these things everywhere.
Ping @daos-stack/daos-gatekeeper, thanks!
/* Give chance to DTX reindex ULT for exit. */
while (unlikely(cont->sc_dtx_reindex))
	ABT_thread_yield();

/* Make sure checksum scrubbing has stopped */
ABT_mutex_lock(cont->sc_mutex);
Why do we release the lock and take the lock again after removing the loop?
DAOS-16039 object: fix EC aggregation wrong peer address (#14593)
DAOS-16009 rebuild: fix O_TRUNC file size related handling
DAOS-15056 rebuild: add rpt to the rgt list properly (#13862)
DAOS-15517 rebuild: refine lock handling for rpt list (#14064)
DAOS-13812 container: fix destroy vs lookup (#12757)
DAOS-15627 dtx: redunce stack usage for DTX resync to avoid overflow (#14189)
DAOS-14845 rebuild: do not wait for EC agg for reclaim (#13610)
Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Wang, Di <wddi218@gmail.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
Dynamically allocated some large variables used for DTX iteration and resync, to avoid potential ULT stack overflow.
The patch also adjusts the DTX re-index ULT stop logic so that DTX resync cannot stop it too early, before DTX re-index has completed.
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: