-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restart poller after all CES are reconciled after nephe restart. #269
Conversation
ac3581f
to
54c66b3
Compare
54c66b3
to
1358e5d
Compare
1358e5d
to
e89b6c5
Compare
pkg/accountmanager/manager.go
Outdated
if exists { | ||
accPoller.restartPoller(accNamespacedName) | ||
// wait for polling to complete after restart. | ||
retError = accPoller.waitForPollDone(accNamespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the error from accPoller.waitForPollDone(accNamespacedName) will be lost.
Can you simulate an error for this scenario and verify if the CES status is updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error from "accPoller.waitForPollDone(accNamespacedName)" is not lost as we can have this kind of wait function in defer function.
However, I saw an issue setting return value of AddResourceFiltersToAccount() from defer function.
I had to return the address.
Also, I tested the negative scenario, we should be good.
pkg/accountmanager/manager.go
Outdated
@@ -159,31 +164,56 @@ func (a *AccountManager) RemoveAccount(namespacedName *types.NamespacedName) err | |||
// restart poller once done. | |||
func (a *AccountManager) AddResourceFiltersToAccount(accNamespacedName *types.NamespacedName, | |||
selectorNamespacedName *types.NamespacedName, selector *crdv1alpha1.CloudEntitySelector, replay bool) (bool, error) { | |||
var retError error = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to initialise it to nil.
pkg/accountmanager/manager.go
Outdated
@@ -61,6 +61,8 @@ type AccountManager struct { | |||
Inventory inventory.Interface | |||
accPollers map[types.NamespacedName]*accountPoller | |||
accountConfigMap map[types.NamespacedName]*accountConfig | |||
// accountToNumSelectors stores number of configured CloudEntitySelector CRs for a given account, required for handling nephe restart case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: split comment into two lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/accountmanager/manager.go
Outdated
"provider type not found", selectorNamespacedName, accNamespacedName)) | ||
retError = fmt.Errorf(fmt.Sprintf("failed to add or update selector %v, account %v: "+ | ||
"provider type not found", *selectorNamespacedName, *accNamespacedName)) | ||
return false, retError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use of creating a new error variable here vs directly returning the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want the defer function to get executed too which can alter the return value.
@@ -83,10 +83,10 @@ func (v *CESMutator) Handle(_ context.Context, req admission.Request) admission. | |||
referencedAccount := &v1alpha1.CloudProviderAccount{} | |||
err = v.Client.Get(context.TODO(), *accountNameSpacedName, referencedAccount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Change to if err = (); err != nil {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -127,7 +127,7 @@ func (ec2Cfg *ec2ServiceConfig) getManagedVpcIDs() map[string]struct{} { | |||
awsPluginLogger().V(4).Info("Cache snapshot nil", "type", providerType, "account", ec2Cfg.accountNamespacedName) | |||
return vpcIDsCopy | |||
} | |||
vpcIDsSet := snapshot.(*ec2ResourcesCacheSnapshot).managedVpcIDs | |||
vpcIDsSet := snapshot.(*ec2ResourcesCacheSnapshot).managedVpcIds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: vpcIdsSet, vpcIdsCopy.
If you are renaming the variables, please rename all the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed all in azure_compute.go and aws_ec2.go .
@@ -69,11 +69,10 @@ func (r *CloudEntitySelectorReconciler) Reconcile(ctx context.Context, req ctrl. | |||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: return ctrl.Result{}, r.processDelete(&req.NamespacedName) and remove L68
e89b6c5
to
0900b10
Compare
pkg/accountmanager/manager.go
Outdated
@@ -158,32 +164,57 @@ func (a *AccountManager) RemoveAccount(namespacedName *types.NamespacedName) err | |||
// AddResourceFiltersToAccount add/update cloud plugin for a given account to include the new selector and | |||
// restart poller once done. | |||
func (a *AccountManager) AddResourceFiltersToAccount(accNamespacedName *types.NamespacedName, | |||
selectorNamespacedName *types.NamespacedName, selector *crdv1alpha1.CloudEntitySelector, replay bool) (bool, error) { | |||
selectorNamespacedName *types.NamespacedName, selector *crdv1alpha1.CloudEntitySelector, replay bool) (bool, *error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is error used as pointer.. Consider changing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it being a pointer, we cannot set return value in defer function. If it is a regular variable then the retError set in defer function doesn't get returned to the caller. As a side-effect status setting in CES wouldn't work.
pkg/accountmanager/manager.go
Outdated
@@ -102,11 +106,13 @@ func (a *AccountManager) AddAccount(namespacedName *types.NamespacedName, accoun | |||
// Create an account poller for polling cloud inventory. | |||
accPoller, exists := a.addAccountPoller(cloudInterface, namespacedName, account) | |||
if !exists { | |||
if !crd.DoesCesCrExistsForAccount(a.Client, namespacedName) { | |||
numCes := crd.GetNumberOfCesCrForAccount(a.Client, namespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetCesCrForAccount
0900b10
to
0f7562d
Compare
@@ -138,13 +136,13 @@ func (r *CloudEntitySelectorReconciler) processCreateOrUpdate(selector *crdv1alp | |||
Name: selector.Spec.AccountName, | |||
} | |||
r.selectorToAccountMap[*selectorNamespacedName] = *accountNamespacedName | |||
retry, err := r.AccManager.AddResourceFiltersToAccount(accountNamespacedName, selectorNamespacedName, | |||
// No errors can be retried for a CES. | |||
_, err := r.AccManager.AddResourceFiltersToAccount(accountNamespacedName, selectorNamespacedName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anandkumar26 Can you confirm this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The error while processing CES can only occur if an account is either not initialised successfully or account is in a bad state. I don't think there is any benefit in retrying the errors forever.
So we capture the error and store in a CES status field. (which can be replayed when account goes from an uninitialised to initialised state OR it would need a user intervention)
0f7562d
to
129405b
Compare
/nephe-test-e2e-agentless |
196f9d9
to
a395391
Compare
// Make sure reconciler retriable error cannot occur during restart, it can affect the logic in UpdatePendingCes function. | ||
if count := r.AccManager.UpdatePendingCesCount(accountNamespacedName); count == 0 { | ||
if err := r.AccManager.WaitForPollDone(accountNamespacedName); err != nil { | ||
r.Log.Error(err, "Inventory poll failed", "account", accountNamespacedName, "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inventory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -144,6 +143,12 @@ func (r *CloudEntitySelectorReconciler) processCreateOrUpdate(selector *crdv1alp | |||
return err | |||
} | |||
r.updateStatus(selectorNamespacedName, err) | |||
// Make sure reconciler retriable error cannot occur during restart, it can affect the logic in UpdatePendingCes function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add detail comment on what it means..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a TODO as discussed
/nephe-test-e2e-agentless-upgrade |
c6395dc
to
57c1cb1
Compare
- On nephe restart, In order to avoid poller restart on CPA and every CES add, postpone restart till all CES attached to the CPA are reconciled. - Fix logging in CES webhook. Signed-off-by: Archana Holla <harchana@vmware.com>
57c1cb1
to
68fcff0
Compare
/nephe-test-e2e-agentless |
1 similar comment
/nephe-test-e2e-agentless |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/LGTM
No description provided.