operator: simplify object creation #586
Conversation
When an assertion fails in a helper function, Ginkgo shows the error as belonging to that function, which is not very helpful because it does not show where that function was called. It's better to skip frames in the call stack so that errors always refer to the test itself.
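A minimal sketch of that pattern with Gomega's ExpectWithOffset; the helper and the code under test here are hypothetical stand-ins, not the actual test code of this PR:

```go
package example_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

func TestExample(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "offset example")
}

// expectNoError is a hypothetical helper. ExpectWithOffset(1, ...) skips one
// frame in the call stack, so a failure is reported at the line in the test
// that called expectNoError, not at the assertion inside the helper.
func expectNoError(err error, description string) {
	ExpectWithOffset(1, err).NotTo(HaveOccurred(), description)
}

var _ = Describe("deployment", func() {
	It("creates objects", func() {
		err := doSomething() // placeholder for the real operation under test
		expectNoError(err, "create deployment objects")
	})
})

// doSomething stands in for the real code under test.
func doSomething() error {
	return nil
}
```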
Previously, the secrets were created first, then the phase changed without waiting for anything, and then the next set of objects was created. This was unnecessarily complex and confusing. Now all objects get created together in the "new" phase and the deployment then goes straight to "running". Once the "is driver usable" check is introduced, the deployment will first have to go into "initializing" and move to "running" only after the check passes.
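A rough sketch of the simplified flow, using stand-in types and phase names that are assumptions rather than the operator's actual API:

```go
package deployment

import "fmt"

// Phase models the deployment phases mentioned above; the operator's actual
// constants may differ.
type Phase string

const (
	PhaseNew          Phase = "New"
	PhaseInitializing Phase = "Initializing"
	PhaseRunning      Phase = "Running"
)

// object and client stand in for runtime.Object and the API client used by
// the real operator.
type object interface{ Name() string }
type client interface{ Create(obj object) error }

// reconcile creates all objects in one step while the deployment is still in
// the "New" phase and then moves it straight to "Running". Once an
// "is the driver usable?" check exists, the phase would first become
// "Initializing" and only switch to "Running" after that check passes.
func reconcile(c client, objects []object, phase *Phase) error {
	for _, obj := range objects {
		if err := c.Create(obj); err != nil {
			return fmt.Errorf("create %s: %w", obj.Name(), err)
		}
	}
	*phase = PhaseRunning
	return nil
}
```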
@avalluri this passed testing, can we merge?
// getDeploymentObjects returns all objects that are part of a driver deployment.
func (d *PmemCSIDriver) getDeploymentObjects(r *ReconcileDeployment) ([]runtime.Object, error) { |
@pohly By adding secret creation to getDeploymentObjects, calling this method more than once creates new keys/certificates on every call when the deployment chooses to use the default certificates.
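One way to avoid regenerating them on every call, sketched here with hypothetical names and client-go calls rather than the operator's actual helpers: look up the secret first and only generate new keys/certificates when none exists yet.

```go
package deployment

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureSecret is a hypothetical helper: it reuses an existing secret and
// only generates fresh keys/certificates when the secret does not exist yet,
// so calling it repeatedly does not churn the certificates.
func ensureSecret(ctx context.Context, client kubernetes.Interface, namespace, name string,
	generate func() (map[string][]byte, error)) (*corev1.Secret, error) {
	existing, err := client.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	if err == nil {
		// Reuse what is already stored instead of regenerating it.
		return existing, nil
	}
	if !apierrors.IsNotFound(err) {
		return nil, fmt.Errorf("get secret %s/%s: %w", namespace, name, err)
	}
	data, err := generate()
	if err != nil {
		return nil, fmt.Errorf("generate keys/certificates: %w", err)
	}
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Namespace: namespace, Name: name},
		Data:       data,
	}
	return client.CoreV1().Secrets(namespace).Create(ctx, secret, metav1.CreateOptions{})
}
```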
Is that a bad thing? It would ensure that those certificates are always fresh.
We haven't sorted out how to handle periodic updates of those certificates and the current code also doesn't have your "populate deployment from objects" code yet. Let's figure out how to avoid updating the secrets too often later, okay?
> Is that a bad thing? It would ensure that those certificates are always fresh.
> We haven't sorted out how to handle periodic updates of those certificates and the current code also doesn't have your "populate deployment from objects" code yet. Let's figure out how to avoid updating the secrets too often later, okay?
That depends on how often we call this method. At the least, I can think of every restart of the operator pod. Updating the secrets also leads to a restart of all driver pods.
> That depends on how often we call this method. At the least, I can think of every restart of the operator pod.
Which isn't very often. And even if it was called more often, why would that be a reason to be concerned?
> Updating the secrets also leads to a restart of all driver pods.
No, it doesn't: https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically
It's still not perfect because the certificates may get rolled out on different nodes by kubelet at different times, which leads to a time window where controller and node cannot communicate. When designing the certificate update feature we need to take that into account, for example by keeping old and new CA root as trusted for a certain period of time.
For now I think it is okay that the system eventually recovers.
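A sketch of that trust-overlap idea using the Go standard library (the names are illustrative, not from this repository): keep both the old and the new CA root in the pool used to verify peers during the transition window.

```go
package tlsrotation

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// transitionalTLSConfig builds a TLS config that trusts both the old and the
// new CA root, so connections keep working while kubelet rolls out the
// updated secret to different nodes at different times.
func transitionalTLSConfig(oldCAPEM, newCAPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	for _, pem := range [][]byte{oldCAPEM, newCAPEM} {
		if !pool.AppendCertsFromPEM(pem) {
			return nil, fmt.Errorf("failed to parse CA certificate")
		}
	}
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      pool, // trusted when dialing the peer
		ClientCAs:    pool, // trusted when accepting connections
		ClientAuth:   tls.RequireAndVerifyClientCert,
	}, nil
}
```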
> Which isn't very often. And even if it was called more often, why would that be a reason to be concerned?
Okay, if you feel it's not an issue.
> No, it doesn't: https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically
> It's still not perfect because the certificates may get rolled out on different nodes by kubelet at different times, which leads to a time window where controller and node cannot communicate. When designing the certificate update feature we need to take that into account, for example by keeping old and new CA root as trusted for a certain period of time.
> For now I think it is okay that the system eventually recovers.
Interesting, thanks for sharing.
Spreading object creation across phases is unnecessarily complex and would have to be changed anyway when introducing the "is driver running" check.
While working on this, I noticed that controller test errors were attributed to the helper functions instead of the actual call site where the assertion was made, so I changed that, too.