AKS: prometheus-operator gets installed before default storage class object is created #855
Comments
Another idea would be to have a post-install hook in Go, available to platforms, that waits for the storage class to show up. Similarly, we could wait for nodes to become ready by default. CC @johananl
Conceptually it could make more sense to me to have an optional pre-install hook for components rather than a post-install hook for platforms in the context of this problem. This is how I think about the problem you're describing: "Before installing component X I want to ensure that condition Y is met". Translating that into software, I would tie the logic to the component's installation rather than to the cluster's deployment even if chronologically the result is the same (at the moment!).
Another thought: maybe this hints at a more generic problem called component dependencies. True, you can do whatever using a hook which allows executing arbitrary logic, but I can imagine more cases where we want to ensure the existence of components and/or their order of deployment, too.
To summarize, I would first consider whether it makes sense to introduce a component dependency mechanism (can Helm help?), and only then I'd look for a less structured solution such as hooks. Lastly, if the hook is related to components, IMO it should be tied to components.
Reading this a second time, I suspect the storage class is an AKS thing rather than another component here. If so, a component-specific hook could work, assuming that we want to enforce this logic for only some of the components. If we want to halt all component deployments until some condition is met, then it indeed makes sense to me to use a platform post-deployment hook.
Thanks for your input @johananl.
I was thinking of shifting the task more towards the platform side because, to me, it looks like the cluster has not really converged yet when Terraform reports that the AKS cluster has been created.
I agree about the pre-install hook for components, and yes, Helm should be able to help us in this case (though it will most likely require modifying the upstream chart if we decide to use it). However, I would rather expect the component to fail if the default storage class is not defined than to wait for it indefinitely, which would be the case in this scenario.
We were thinking about dependency management for the components, but between components rather than between a component and the cluster state, which is more difficult to express (though such a thing would perhaps also make sense). EDIT:
Exactly.
To address the issue with AKS delaying creation of the default StorageClass, which affects installation of components that depend on storage, this commit introduces the PlatformPostApplyHook interface, which platforms can implement to obtain the kubeconfig file content after the cluster is installed and run their own sanity checks. In the case of AKS, the hook will loop and wait until the default storage class appears on the cluster. Refs #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
To address the issue with AKS delaying creation of the default StorageClass, which affects installation of components that depend on storage, this commit adds support for running the optional PlatformPostApplyHook after the cluster has been installed. Refs #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This commit implements the newly introduced platform.PostApplyHook for AKS clusters to address the issue where components depending on storage get installed before the AKS controller has created the default storage class. This causes the components to get stuck, which makes cluster provisioning fail. The implemented hook lists the storage classes available on the cluster and returns when a class with the default storage class annotation is found. The default storage class usually appears on the cluster within 5 minutes of cluster creation, so a 10-minute timeout seems like a sane default for this operation. Closes #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
To address the issue with AKS delaying creation of the default StorageClass, which affects installation of components that depend on storage, this commit adds support for running the optional platform.PostApplyHook after the cluster has been installed. Refs #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
To address the issue with AKS delaying creation of the default StorageClass, which affects installation of components that depend on storage, this commit introduces the PostApplyHook interface, which platforms can implement to obtain the kubeconfig file content after the cluster is installed and run their own sanity checks. In the case of AKS, the hook will loop and wait until the default storage class appears on the cluster. Refs #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
Created PR with a fix: #886.
To address the issue with AKS delaying creation of the default StorageClass, which affects installation of components that depend on storage, this commit introduces the PlatformWithPostApplyHook interface, which platforms can implement to obtain the kubeconfig file content after the cluster is installed and run their own sanity checks. In the case of AKS, the hook will loop and wait until the default storage class appears on the cluster. Refs #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
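For readers following along, here is a rough sketch of how an optional hook interface like the one described in this commit message could be wired up with a Go type assertion. Only the name PlatformWithPostApplyHook comes from the commit message; the `Platform` interface, the `PostApplyHook(kubeconfig []byte) error` method signature, `runPostApplyHook`, and the two example platforms are assumptions made for illustration and may not match the actual code in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Platform is a hypothetical stand-in for the installer's platform abstraction.
type Platform interface {
	Apply() error
}

// PlatformWithPostApplyHook is optional: only platforms that need a check
// after the cluster is installed implement it. The method signature is assumed.
type PlatformWithPostApplyHook interface {
	PostApplyHook(kubeconfig []byte) error
}

// runPostApplyHook calls the hook only when the platform implements it.
func runPostApplyHook(p Platform, kubeconfig []byte) error {
	h, ok := p.(PlatformWithPostApplyHook)
	if !ok {
		// Platform has no hook, nothing to do.
		return nil
	}

	if err := h.PostApplyHook(kubeconfig); err != nil {
		return fmt.Errorf("running platform post-apply hook: %w", err)
	}

	return nil
}

// plainPlatform does not implement the hook.
type plainPlatform struct{}

func (plainPlatform) Apply() error { return nil }

// hookedPlatform implements the hook, as an AKS-like platform might.
type hookedPlatform struct{}

func (hookedPlatform) Apply() error { return nil }

func (hookedPlatform) PostApplyHook(kubeconfig []byte) error {
	if len(kubeconfig) == 0 {
		return errors.New("empty kubeconfig")
	}
	// A real hook would build a Kubernetes client from kubeconfig here.
	return nil
}

func main() {
	fmt.Println(runPostApplyHook(plainPlatform{}, nil))
	fmt.Println(runPostApplyHook(hookedPlatform{}, []byte("kubeconfig content")))
	fmt.Println(runPostApplyHook(hookedPlatform{}, nil))
}
```

Keeping the hook in a separate interface means existing platforms compile unchanged, and the installer decides at runtime whether there is anything to run.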
This commit implements the newly introduced platform.PlatformWithPostApplyHook for AKS clusters to address the issue where components depending on storage get installed before the AKS controller has created the default storage class. This causes the components to get stuck, which makes cluster provisioning fail. The implemented hook lists the storage classes available on the cluster and returns when a class with the default storage class annotation is found. The default storage class usually appears on the cluster within 5 minutes of cluster creation, so a 10-minute timeout seems like a sane default for this operation. Closes #855 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
This causes the CI to fail, as the prometheus-operator component never converges. This is related to #559; however, to solve this one, we should perhaps make sure that the default storage class becomes available before the AKS installation converges. We could, for example, install the storage class on AKS explicitly via a chart, to avoid waiting for AKS to converge.