DAOS-16559 container: return EBUSY for container being destroyed #15154
Ticket title is 'erasurecode/rebuild_fio.py:EcodFioRebuild.test_ec_online_rebuild_fio - daos_lru_ref_evict_wait() Assertion '!llink->ll_wait_evict' failed'
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/295/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/273/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/354/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/357/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/495/log
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/1/execution/node/494/log
- Don't allow multiple callers to destroy the same container; a later call should get EBUSY.
- Remove the loop in cont_child_destroy_one(): it now waits for the refcount to drop to zero, so the loop is no longer needed.

Signed-off-by: Liang Zhen <liang.zhen@intel.com>
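The two bullet points above describe one pattern: the first destroyer marks the container and blocks until all references are gone, and any other destroyer is turned away with EBUSY. The sketch below illustrates that pattern under stated assumptions. `struct cont`, `cont_get()`, `cont_put()` and `cont_destroy()` are hypothetical names, not DAOS APIs; POSIX `-EBUSY` stands in for whatever DAOS error code the patch actually returns; and a pthread condition variable stands in for DAOS's own reference-count wait.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container object. The flag name loosely echoes
 * sc_destroying, but this is not the real ds_cont_child layout. */
struct cont {
	pthread_mutex_t lock;
	pthread_cond_t  refs_zero;  /* broadcast when refcount reaches 0 */
	int             refcount;
	bool            destroying;
};

static void cont_init(struct cont *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->refs_zero, NULL);
	c->refcount = 0;
	c->destroying = false;
}

static void cont_get(struct cont *c)
{
	pthread_mutex_lock(&c->lock);
	c->refcount++;
	pthread_mutex_unlock(&c->lock);
}

/* Drop a reference and wake a waiting destroyer on the last put. */
static void cont_put(struct cont *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->refcount > 0);
	if (--c->refcount == 0)
		pthread_cond_broadcast(&c->refs_zero);
	pthread_mutex_unlock(&c->lock);
}

/* Only the first caller proceeds: it sets the flag and blocks until every
 * reference is dropped. Concurrent or later callers get -EBUSY instead of
 * starting a second destroy. The while loop only guards against spurious
 * wakeups; there is no outer retry loop. */
static int cont_destroy(struct cont *c)
{
	pthread_mutex_lock(&c->lock);
	if (c->destroying) {
		pthread_mutex_unlock(&c->lock);
		return -EBUSY;
	}
	c->destroying = true;
	while (c->refcount > 0)
		pthread_cond_wait(&c->refs_zero, &c->lock);
	pthread_mutex_unlock(&c->lock);
	/* Backing resources would be released here. */
	return 0;
}
```

In a multi-threaded run, a thread still holding a reference calls `cont_put()`, and the broadcast on the last put releases the destroyer. Because waiting is built into the destroy itself, a caller never needs to loop and retry.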
Branch force-pushed from 66d2b61 to c31cbb1.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15154/3/execution/node/1531/log
Multiple tests were not run in HW Medium due to a possible DNS issue: https://build.hpdd.intel.com/job/daos-stack/job/daos/view/change-requests/job/PR-15154/3/artifact/Functional%20Hardware%20Medium/launch/functional_hardware_medium/results.html
Manually kicked off a new build: https://build.hpdd.intel.com/job/daos-stack/job/daos/view/change-requests/job/PR-15154/4/
```c
rc = cont_child_lookup(tls->dt_cont_cache, in->tdi_uuid,
		       in->tdi_pool_uuid, false /* create */, &cont);
if (rc == -DER_NONEXIST)
	D_GOTO(out_pool, rc = 0);
```
If the container is in the stopped state when a container destroy reaches this point, the destroy will be skipped. I'm not sure whether this can happen in the current implementation, but in any case I think potential issues of this kind, which arise while a target is in rebuild/reintegration, are out of scope for this PR.
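The concern above is that treating "not found" as success makes the destroy idempotent, but also silently skips a container that the lookup hides for another reason. The sketch below reproduces that effect under one loud assumption taken from the comment: a lookup without create reports a stopped container as nonexistent. `struct cache_entry`, `lookup()`, `destroy_handler()` and `ERR_NONEXIST` are all hypothetical stand-ins, not the DAOS `cont_child_lookup()` or `-DER_NONEXIST`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for -DER_NONEXIST; the value is arbitrary. */
#define ERR_NONEXIST (-2)

struct cache_entry {
	int  id;
	bool stopped;
	bool destroyed;
};

/* Lookup without create. Assumption for illustration: a stopped entry is
 * reported exactly like a missing one. */
static int lookup(struct cache_entry *tab, size_t n, int id,
		  struct cache_entry **out)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].id == id && !tab[i].stopped) {
			*out = &tab[i];
			return 0;
		}
	}
	return ERR_NONEXIST;
}

/* Mirrors the reviewed snippet: "nonexistent" is mapped to success. */
static int destroy_handler(struct cache_entry *tab, size_t n, int id)
{
	struct cache_entry *e = NULL;
	int rc = lookup(tab, n, id, &e);

	if (rc == ERR_NONEXIST)
		return 0; /* treated as already destroyed */
	if (rc != 0)
		return rc;
	e->destroyed = true;
	return 0;
}
```

With a stopped entry in the table, `destroy_handler()` returns 0 yet leaves `destroyed` false, which is the skipped-destroy case the reviewer points out.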
```diff
@@ -1375,7 +1363,7 @@ ds_cont_child_lookup(uuid_t pool_uuid, uuid_t cont_uuid,
 	if (rc != 0)
 		return rc;
 
-	if ((*ds_cont)->sc_stopping) {
+	if ((*ds_cont)->sc_stopping || (*ds_cont)->sc_destroying) {
```
[minor] The sc_destroying check here is redundant; cont_child_stop() will set sc_stopping anyway.
Right, I think we should do more cleanup in this part of the code.
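The review point rests on an invariant: if every destroy goes through the stop path, then `sc_destroying` implies `sc_stopping`, and checking `sc_stopping` alone already rejects containers being destroyed. The sketch below encodes that invariant with hypothetical names (`struct child`, `child_stop()`, `child_destroy()`, `child_lookup_check()`); it assumes the ordering described in the comment rather than reproducing DAOS code, and uses POSIX `-ENOENT` only as a stand-in error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flags modeled on sc_stopping / sc_destroying. */
struct child {
	bool stopping;
	bool destroying;
};

static void child_stop(struct child *c)
{
	c->stopping = true;
}

/* Assumed ordering from the review: destroy always calls stop, so
 * destroying implies stopping. */
static void child_destroy(struct child *c)
{
	c->destroying = true;
	child_stop(c);
}

/* Under that invariant, one flag test covers both states. */
static int child_lookup_check(const struct child *c)
{
	return c->stopping ? -ENOENT : 0;
}
```

If some future path set the destroying flag without stopping, the invariant would break and the extra check would matter again; that is one reason to tidy this code in a dedicated cleanup.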
```diff
@@ -2603,6 +2591,13 @@ cont_child_prop_update(void *data)
 		return rc;
 	}
 	D_ASSERT(child != NULL);
+	if (child->sc_stopping || child->sc_destroying) {
```
This looks unnecessary to me; cont_child_lookup() will return ENOENT if the container was stopped.
Before requesting gatekeeper:
- `Features:` (or `Test-tag*`) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: