-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-16037 pool: Fix upgrade for svc_ops #14753
Conversation
- pool_upgrade_props should take care of ds_pool_prop_svc_ops_{num,max} too. - pool_upgrade_props modifies ds_pool_prop_svc_ops_{enabled,age,max}, which are cached by pool_svc. It must also update the cached values too. - read_db_for_stepping_up should only access ds_pool_prop_svc_ops_{enabled,age,max} when the pool global version is high enough. If one of the properties is expected but absent, it should be an error; recovering the value to 0 may not be desirable. Features: pool Signed-off-by: Li Wei <wei.g.li@intel.com> Required-githooks: true
Errors are Unable to load ticket data |
@@ -1653,13 +1653,8 @@ read_db_for_stepping_up(struct pool_svc *svc, struct pool_buf **map_buf_out, | |||
struct pool_buf *map_buf; | |||
struct daos_prop_entry *svc_rf_entry; | |||
daos_prop_t *prop = NULL; | |||
uint32_t svc_ops_enabled = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to this function seems good. No change requested since this form seems cleaner.
Just for confirmation, if the changes to this function were omitted from this patch, the issues in the ticket resulting from the pool upgrade bug would be resolved via the other change in this patch to pool_upgrade_props()? i.e., For a daos 2.4 pool whose svc_ops_{enabled,age,max} all do not exist, functionally with or without the change to this function, in both cases itt will result in the same svc->ps_ops_enabled = 0, svc->svc_ops_max = 0, svc->svc_ops_age = 0 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kccain, yes, I think things should work without the changes in read_db_for_stepping_up
. Do you prefer not to make these changes?
This question reminds me: To make the code easier to understand, I should have written an else
branch that sets enabled
, max
, and age
to 0. What do you think?
(Since the CI job has already failed during git fetch
, I'll refresh the PR to add the else
branch. But please feel free to share any concern/objection.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection, I think your change is an improvement, including the most recent change to add the else branch and improve code readability.
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14753/2/testReport/ |
For Jenkins build 2 NLT valgrind error https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14753/2/valgrindResult/pid=25884,0xe8/ - I have created a master PR to modify suppressions #14762 Undaunted by this and some occasional networking issues in CI, nonetheless I have started a rebuild of the PR |
Signed-off-by: Li Wei <wei.g.li@intel.com> Required-githooks: true
Features: pool Required-githooks: true
The previous CI job failed during |
@@ -1653,13 +1653,8 @@ read_db_for_stepping_up(struct pool_svc *svc, struct pool_buf **map_buf_out, | |||
struct pool_buf *map_buf; | |||
struct daos_prop_entry *svc_rf_entry; | |||
daos_prop_t *prop = NULL; | |||
uint32_t svc_ops_enabled = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection, I think your change is an improvement, including the most recent change to add the else branch and improve code readability.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14753/5/execution/node/1508/log |
Fail: Odd number (3) of NVMe devices seen. @JohnMalmberg says for this cluster the issue has been fixed. So I am going to stop build 5 and restart, specifying high priority in Jenkins. Also, I will specifically skip Functional Test (on VM) stages because that has already run successfully with Features: pool, and takes a considerable amount of execution time. latest build (as of this writing) is https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14753/7/pipeline-graph/ |
@daos-stack/daos-gatekeeper FYI, when we get to landing for this, do note that the unit test, functional (VM) testing is going to be based on build 5, whereas functional HW testing results are going to come from the final Jenkins build (currently build 7), and this is due to working around this ticket https://daosio.atlassian.net/browse/DAOS-16147 |
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14753/7/testReport/ [liw] dfuse/daos_build: https://daosio.atlassian.net/browse/DAOS-16215 |
* DAOS-16037 pool: Fix upgrade for svc_ops - pool_upgrade_props should take care of ds_pool_prop_svc_ops_{num,max} too. - pool_upgrade_props modifies ds_pool_prop_svc_ops_{enabled,age,max}, which are cached by pool_svc. It must also update the cached values too. - read_db_for_stepping_up should only access ds_pool_prop_svc_ops_{enabled,age,max} when the pool global version is high enough. If one of the properties is expected but absent, it should be an error; recovering the value to 0 may not be desirable. Signed-off-by: Li Wei <wei.g.li@intel.com>
pool_upgrade_props should take care of ds_pool_prop_svc_ops_{num,max} too.
pool_upgrade_props modifies ds_pool_prop_svc_ops_{enabled,age,max}, which are cached by pool_svc. It must also update the cached values too.
read_db_for_stepping_up should only access ds_pool_prop_svc_ops_{enabled,age,max} when the pool global version is high enough. If one of the properties is expected but absent, it should be an error; recovering the value to 0 may not be desirable.
Features: pool
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: