azure vmss cache fixes and improvements #4685
Conversation
@marwanad: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marwanad

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@@ -567,6 +579,7 @@ func (scaleSet *ScaleSet) setInstanceStatusByProviderID(providerID string, status
 			scaleSet.instanceCache[k].Status = &status
 		}
 	}
+	scaleSet.lastInstanceRefresh = time.Now()
This was added recently with the same motivation. If we proactively update the instance state to Deletion, we don't want it to be invalidated in the next loop in case that cache is stale.
/area provider/azure
Force-pushed from bcd247e to d49a131
/lgtm
Cherry-pick #4685, #47874 - Azure vmss cache improvements
With the merge of #3717, we lost the optimistic in-memory cache we'd added. That PR stopped bubbling down the `sizeRefreshPeriod`, which was then removed in #4541 because it was unused. The realization is that we still need this back for the following scenario:

`scaleSet.DeleteNodes()` comes in and calls into `GetScaleSetSize`, which in turn ends up reading from `manager.cache`, which would still have a cached count of 3 and thus return 3, so you end up deleting that node as well.

With this PR, we'll extend the last refresh time by `sizeRefreshPeriod` to ensure that by the next time we expire, `manager.cache` will have had the chance to refresh and give us fresh data. That's basically the behaviour prior to #3717 (see in 1.19).

The PR also cleans up the logging to refer to the "in-memory size" vs the one we get back from the manager cache.

This will impact 1.21+.
/area cloudprovider/azure