DAOS-16097 vos: assign persistent DTX entry in vos_dtx_prepared #14708

Merged
merged 1 commit into from
Aug 19, 2024

@Nasf-Fan Nasf-Fan commented Jul 8, 2024

Assign the persistent DTX entry only in vos_dtx_prepared(), which initializes the entry immediately. This avoids any potential race between persistently allocating a DTX entry and initializing it.
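
A minimal sketch of the allocate-and-initialize idea, for readers unfamiliar with the pattern. It is not the VOS implementation: `persistent_slot`, `active_dtx`, and `dtx_prepare` are hypothetical stand-ins, and a plain array stands in for persistent storage. The point it illustrates is that the slot is reserved and filled in within one call, so no other code path can see a reserved but uninitialized entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a persistent DTX entry (not the real VOS layout). */
struct persistent_slot {
	bool     in_use; /* set only after every other field is written */
	uint64_t xid;
	uint64_t epoch;
};

/* Hypothetical in-memory DTX handle; slot stays NULL until prepare time. */
struct active_dtx {
	uint64_t                xid;
	uint64_t                epoch;
	struct persistent_slot *slot;
};

/*
 * Reserve a free slot and initialize it in the same step.
 * Returns 0 on success, -1 if no free slot is left.
 */
static int
dtx_prepare(struct active_dtx *dtx, struct persistent_slot *pool, size_t nr)
{
	size_t i;

	/* The slot must not have been assigned by any earlier code path. */
	assert(dtx->slot == NULL);

	for (i = 0; i < nr; i++) {
		if (pool[i].in_use)
			continue;
		pool[i].xid    = dtx->xid;
		pool[i].epoch  = dtx->epoch;
		pool[i].in_use = true;
		dtx->slot      = &pool[i];
		return 0;
	}
	return -1;
}
```

With this shape, a caller that has not yet reached the prepare step simply has `slot == NULL`, which is easy to check for.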

Add a check of the DTX flags after the DTX has been prepared locally.

Do not allow the current transaction to deregister a record that is referenced by another prepared (but not yet committed) DTX.
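
The deregistration rule can be pictured with a small sketch. Everything here is an assumption made for illustration: `record`, `record_deregister`, the two-state enum, and the `ERR_BUSY` return value are hypothetical, and the error code the real patch returns is not stated in this description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DTX states; the real code has more of them. */
enum owner_state { DTX_PREPARED, DTX_COMMITTED };

/* Hypothetical record that may be referenced by one DTX. */
struct record {
	uint64_t         owner_xid;   /* 0 means no DTX references the record */
	enum owner_state owner_state; /* state of the referencing DTX */
};

#define ERR_BUSY (-1) /* placeholder error code, not a DAOS DER_* value */

/*
 * Deregister the record on behalf of transaction cur_xid.
 * Refuse if a different DTX that is prepared but not committed
 * still references it.
 */
static int
record_deregister(struct record *rec, uint64_t cur_xid)
{
	assert(cur_xid != 0);

	if (rec->owner_xid != 0 && rec->owner_xid != cur_xid &&
	    rec->owner_state == DTX_PREPARED)
		return ERR_BUSY;

	rec->owner_xid = 0;
	return 0;
}
```

A transaction can still deregister its own record, and a record whose referencing DTX has already committed is not protected by this check.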

Allow-unstable-test: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

github-actions bot commented Jul 8, 2024

Ticket title is 'erasurecode/aggregation.py:EcodAggregationOff.test_ec_aggregation_time - failed to destroy container TestContainer_12: DER_IO(-2001)'
Status is 'In Progress'
Labels: '2.6.0rc1,LRZ,ci_impact,md_on_ssd,weekly_test'
https://daosio.atlassian.net/browse/DAOS-16097

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from d56a93d to d6b9f5b Compare July 8, 2024 16:11
@daosbuild1
Collaborator

Test stage Fault injection testing on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/2/execution/node/1116/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from d6b9f5b to 20cfaf6 Compare July 9, 2024 07:17
@daosbuild1
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/3/execution/node/1322/log

@Nasf-Fan Nasf-Fan changed the base branch from master to Nasf-Fan/DAOS-16005_2 July 10, 2024 03:31
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch 2 times, most recently from b876ef3 to 3b8e48f Compare July 10, 2024 03:47
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16005_2 branch from c899d12 to 8c4db79 Compare July 11, 2024 06:09
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 3b8e48f to 9d7596c Compare July 11, 2024 06:12
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/7/execution/node/1322/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 9d7596c to 7c1f413 Compare July 12, 2024 03:18
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16005_2 branch from 8c4db79 to a5959eb Compare July 14, 2024 08:15
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 7c1f413 to 94a74b0 Compare July 14, 2024 08:18
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16005_2 branch from a5959eb to 74de330 Compare July 15, 2024 16:01
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 94a74b0 to 2723b2b Compare July 15, 2024 16:02
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/10/execution/node/1368/log

@daosbuild1
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/10/execution/node/1347/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 2723b2b to 11fae97 Compare July 17, 2024 07:05
@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14708/11/testReport/

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14708/11/testReport/

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 11fae97 to 95949f1 Compare July 18, 2024 01:50
@Nasf-Fan Nasf-Fan marked this pull request as ready for review July 19, 2024 04:35
@Nasf-Fan Nasf-Fan requested review from a team as code owners July 19, 2024 04:35
@Nasf-Fan Nasf-Fan requested review from liuxuezhao and janekmi July 22, 2024 03:35
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 95949f1 to f8713a5 Compare July 22, 2024 15:08
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/14/execution/node/1447/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from f8713a5 to e0a88ff Compare July 23, 2024 15:51
@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/15/execution/node/382/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/15/execution/node/374/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/15/execution/node/375/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/15/execution/node/376/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/15/execution/node/367/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from e0a88ff to d02c854 Compare July 23, 2024 16:04
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/16/execution/node/1446/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from d02c854 to 37a2256 Compare July 24, 2024 03:00
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/17/execution/node/1184/log

Assign persistent DTX entry only via vos_dtx_prepared() that will
initialize such DTX entry immediately to avoid any potential race
between persistently allocating DTX entry and initializing it.

Add some check (for DTX flag) after DTX locally prepared.

Do not allow current transaction to deregister the record that is
referenced by another prepared (but non-committed) DTX.

Allow-unstable-test: true

Signed-off-by: Fan Yong <fan.yong@intel.com>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-16097_1 branch from 37a2256 to b3aab99 Compare July 24, 2024 13:52
@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14708/18/execution/node/1416/log

@Nasf-Fan
Contributor Author

osa_offline_reintegration failed for DAOS-16007

@Nasf-Fan Nasf-Fan requested a review from jolivier23 July 30, 2024 02:12
@Nasf-Fan
Contributor Author

Nasf-Fan commented Aug 1, 2024

Ping reviewers, thanks!

@Nasf-Fan
Contributor Author

Nasf-Fan commented Aug 6, 2024

@jolivier23 @NiuYawei @janekmi, would you please help review the patch? Thanks!

@Nasf-Fan Nasf-Fan requested a review from a team August 12, 2024 06:42
@Nasf-Fan Nasf-Fan added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Aug 14, 2024
@Nasf-Fan Nasf-Fan requested a review from gnailzenh August 15, 2024 14:05
@gnailzenh gnailzenh merged commit 427d135 into master Aug 19, 2024
49 of 52 checks passed
@gnailzenh gnailzenh deleted the Nasf-Fan/DAOS-16097_1 branch August 19, 2024 11:51
Contributor

@janekmi janekmi left a comment

As far as I can tell this change set looks correct. I was focusing on the persistency correctness though.

Sorry for joining late to the party. I would still appreciate responses to my questions. I am very eager to use every opportunity to learn a bit more about DTX. Thanks!

dck.oid = dsp->dsp_oid;
dck.dkey_hash = dsp->dsp_dkey_hash;
rc = dtx_commit(cont, &pdte, &dck, 1);
if (rc < 0 && rc != -DER_NONEXIST && for_io)
Contributor

Nitpick: && for_io seems redundant considering the whole block is under this condition.
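
To make the nitpick concrete, here is a small sketch showing that the two conditions agree whenever `for_io` is true, which is always the case inside the enclosing `if (for_io)` block. The helper names `requeue_original` and `requeue_simplified` are hypothetical, and the `DER_NONEXIST` value below is a placeholder for illustration rather than a definition taken from the DAOS headers.

```c
#include <assert.h>
#include <stdbool.h>

#define DER_NONEXIST 1005 /* placeholder value for this sketch */

/* Condition as written in the patch. */
static bool
requeue_original(int rc, bool for_io)
{
	return rc < 0 && rc != -DER_NONEXIST && for_io;
}

/* Condition with the redundant term dropped; valid only where for_io is known to be true. */
static bool
requeue_simplified(int rc)
{
	/* Callers are expected to be inside the for_io branch already. */
	return rc < 0 && rc != -DER_NONEXIST;
}
```

Dropping the term would not change behavior in that block; it only removes a check that can never be false there.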

Comment on lines +1234 to +1254
if (for_io) {
	rc = vos_dtx_check(cont->sc_hdl, &dsp->dsp_xid, NULL, NULL, NULL, false);
	switch (rc) {
	case DTX_ST_COMMITTABLE:
		dck.oid = dsp->dsp_oid;
		dck.dkey_hash = dsp->dsp_dkey_hash;
		rc = dtx_commit(cont, &pdte, &dck, 1);
		if (rc < 0 && rc != -DER_NONEXIST && for_io)
			d_list_add_tail(&dsp->dsp_link, cmt_list);
		else
			dtx_dsp_free(dsp);
		continue;
	case DTX_ST_COMMITTED:
	case -DER_NONEXIST: /* Aborted */
		dtx_dsp_free(dsp);
		continue;
	default:
		break;
	}
}

Contributor

I am not too familiar with this part of DAOS, but it seems to me this change is not explained in the commit message. Or is it?

Comment on lines +5653 to +5654
rc = dtx_leader_begin(ioc.ioc_vos_coh, &odm->odm_xid, &epoch,
dcts[0].dct_shards[dmi->dmi_tgt_id].dcs_nr, version,
Contributor

Is this bit not referred to in the commit message?

Labels
forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed.
6 participants