[Response Ops] RuleDataClient initialization fails if any alerts indices are snapshots #139969
Pinging @elastic/response-ops (Team:ResponseOps)
If users have modified their alerts ILM policy to move alerts into the cold or frozen tiers, not being able to create new alerts is only the tip of the iceberg. The system was designed with the assumption that alerts would always be in the hot tier, so various functionality will behave erratically if alerts are moved out of this tier.

With regard to what we should do about users who have already modified their ILM policy, I don't know what the best course of action is. We need to tell our users that something is awry and have them fix it. While we don't need to modify the mappings of the old alerts indices at the moment, we might want to in the future, so leaving this problem unremedied while making everything look like it's working correctly would just be kicking the can down the road.
…0778) resolves #139969 Changes the ResourceInstaller to ignore cases when the Elasticsearch simulateIndexTemplate() API returns an error or empty mappings, logging an error instead. This will hopefully allow initialization to continue to set up the alerts-as-data indices and backing resources for future indexing. Also adds _meta: { managed: true } to the ILM policy, which should show a warning in the Kibana UX when attempting to make changes to the policy. Such changes were the reason simulateIndexTemplate() could return empty mappings.
…0778) (#141058) resolves #139969 Changes the ResourceInstaller to ignore cases when the Elasticsearch simulateIndexTemplate() API returns an error or empty mappings, logging an error instead. This will hopefully allow initialization to continue to set up the alerts-as-data indices and backing resources for future indexing. Also adds _meta: { managed: true } to the ILM policy, which should show a warning in the Kibana UX when attempting to make changes to the policy. Such changes were the reason simulateIndexTemplate() could return empty mappings. (cherry picked from commit 01daf31) Co-authored-by: Patrick Mueller <patrick.mueller@elastic.co>
…0778) (#141097) resolves #139969 Changes the ResourceInstaller to ignore cases when the Elasticsearch simulateIndexTemplate() API returns an error or empty mappings, logging an error instead. This will hopefully allow initialization to continue to set up the alerts-as-data indices and backing resources for future indexing. Also adds _meta: { managed: true } to the ILM policy, which should show a warning in the Kibana UX when attempting to make changes to the policy. Such changes were the reason simulateIndexTemplate() could return empty mappings. (cherry picked from commit 01daf31) # Conflicts: # x-pack/plugins/rule_registry/server/rule_data_plugin_service/resource_installer.test.ts # x-pack/plugins/rule_registry/server/rule_data_plugin_service/resource_installer.ts
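The behavior described in the commit message above (ignore a simulation error or empty mappings, log an error, and continue) can be sketched roughly as follows. This is only an illustration, not the actual ResourceInstaller code: `SimulateOutcome`, `SketchLogger`, and `mappingsOrSkip` are hypothetical names, and the real Elasticsearch client calls are deliberately left out.

```typescript
// Hypothetical types for illustration; not taken from the Kibana code base.
type SimulatedMappings = Record<string, unknown>;

interface SketchLogger {
  error(message: string): void;
}

// Either the simulation produced (possibly empty) mappings, or it threw.
type SimulateOutcome = { mappings?: SimulatedMappings } | { error: Error };

// Returns the mappings to apply, or undefined when the index should be skipped.
function mappingsOrSkip(
  indexName: string,
  outcome: SimulateOutcome,
  logger: SketchLogger
): SimulatedMappings | undefined {
  if ('error' in outcome) {
    // Log instead of rethrowing so initialization can continue.
    logger.error(`Ignored simulation error for ${indexName}: ${outcome.error.message}`);
    return undefined;
  }
  const mappings = outcome.mappings;
  if (mappings === undefined || Object.keys(mappings).length === 0) {
    logger.error(`Ignored empty simulated mappings for ${indexName}`);
    return undefined;
  }
  return mappings;
}

// Usage with an in-memory logger that collects messages.
const loggedErrors: string[] = [];
const sketchLogger: SketchLogger = { error: (message) => loggedErrors.push(message) };

const skippedOnEmpty = mappingsOrSkip(
  'partial-.internal.alerts-observability.metrics.alerts-default-000006',
  { mappings: {} },
  sketchLogger
);
const skippedOnError = mappingsOrSkip(
  'partial-.internal.alerts-observability.metrics.alerts-default-000006',
  { error: new Error('simulation failed') },
  sketchLogger
);
```

In both cases the caller receives `undefined` and can skip the mapping update for that index rather than aborting initialization.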
The first time a rule runs for a namespace and attempts to write an alert, the RuleDataClient creates the index template for that namespace and tries to apply the mappings from that template to any existing alerts indices for the namespace. Since the template we create does not explicitly specify all the mappings, instead referencing component templates, we gather the names of the existing indices and simulate the mappings that would be applied to those index names after installing the new template.
However, if alerts indices have moved to snapshots via ILM, then the names that come back when we fetch the existing indices will have either `restored-` or `partial-` as a prefix. When we pass these index names to the `simulateIndexTemplate` API, the prefix causes the name not to match the installed index template, and the mappings come back empty. The empty mappings are then passed to `putMapping`, which fails, throws an error, and disables writing new alerts.

Example of how the names behave unexpectedly: a request that should restrict the response to index names matching `.internal.alerts-observability.metrics.alerts-default-*` also returns `partial-.internal.alerts-observability.metrics.alerts-default-000006`.

The alerts ILM policy that ships with Kibana keeps alerts indices in the hot phase indefinitely. However, in some customer systems the ILM policies have been modified to move alerts to snapshots. This seems to work for customers until Kibana restarts and the RuleDataClient has to re-initialize (e.g. when they upgrade stack versions), at which point initialization fails and it appears that the upgrade broke their system.
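To make the name mismatch concrete, here is a minimal sketch of why a prefixed name fails to match a wildcard index pattern. It assumes the template's index pattern is the same `.internal.alerts-observability.metrics.alerts-default-*` string quoted above. `matchesIndexPattern` is a hypothetical, simplified matcher (only `*` is supported), not Elasticsearch's own implementation, and the `-000007` index name is made up for contrast.

```typescript
// Simplified wildcard matcher: '*' matches any run of characters, everything
// else is matched literally. Hypothetical helper for illustration only.
function matchesIndexPattern(pattern: string, indexName: string): boolean {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`).test(indexName);
}

// Assumed template pattern, taken from the index pattern quoted in the issue.
const templatePattern = '.internal.alerts-observability.metrics.alerts-default-*';

// A regular backing index (illustrative name) and the snapshot-backed one.
const regularIndex = '.internal.alerts-observability.metrics.alerts-default-000007';
const snapshotIndex = 'partial-.internal.alerts-observability.metrics.alerts-default-000006';

const regularMatches = matchesIndexPattern(templatePattern, regularIndex);
const snapshotMatches = matchesIndexPattern(templatePattern, snapshotIndex);

console.log({ regularMatches, snapshotMatches });
```

Because the pattern is anchored at the start of the name, the `partial-` prefix alone is enough to prevent a match, which is consistent with the simulation returning empty mappings.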
Possible Fixes
While we don't support users making changes to the built-in alerts ILM policy, a delayed failure that results in alerts not being written while the problem is debugged is a particularly bad failure mode. RuleDataClient initialization should not fail even in the presence of snapshotted alerts indices.
We don't really need to apply new mappings to old indices at the moment, since we don't have any runtime mappings that would actually affect indices that aren't the write index. So we could restrict the mapping update logic to the write index, which should never be a snapshot (hopefully?).
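A rough sketch of the "write index only" idea, assuming the alias lookup returns the usual shape in which each index lists its aliases with an optional `is_write_index` flag. `selectWriteIndices`, the alias name, and the sample data are hypothetical and only illustrate the filtering step.

```typescript
// Assumed response shape for an alias lookup: index name -> aliases -> flags.
interface AliasFlags {
  is_write_index?: boolean;
}
interface AliasLookupResponse {
  [indexName: string]: { aliases: { [aliasName: string]: AliasFlags } };
}

// Keep only indices that are marked as the write index for the given alias.
function selectWriteIndices(response: AliasLookupResponse, aliasName: string): string[] {
  return Object.keys(response).filter((indexName) => {
    const flags = response[indexName].aliases[aliasName];
    return flags !== undefined && flags.is_write_index === true;
  });
}

// Illustrative data: one snapshot-backed index and one current write index.
const alertsAlias = '.alerts-observability.metrics.alerts-default';
const aliasLookup: AliasLookupResponse = {
  'partial-.internal.alerts-observability.metrics.alerts-default-000006': {
    aliases: { [alertsAlias]: { is_write_index: false } },
  },
  '.internal.alerts-observability.metrics.alerts-default-000007': {
    aliases: { [alertsAlias]: { is_write_index: true } },
  },
};

const writeIndices = selectWriteIndices(aliasLookup, alertsAlias);
console.log(writeIndices);
```

With this filter, the mapping update would never see the `partial-` index, so the template simulation for it would not be attempted at all.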
We could also try simulating the mappings only once using an index name we know will match the template rather than simulating each concrete index and having some of them fail. Then we could apply the mappings to every concrete index.
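The "simulate once" alternative could look roughly like this. The sketch only plans the work: it derives one index name that is known to match the pattern, and fans a single set of simulated mappings out to every concrete index. `simulationNameFor`, `planMappingUpdates`, the placeholder suffix, and the sample mappings are all hypothetical, the actual `simulateIndexTemplate` and `putMapping` calls are omitted, and whether `putMapping` succeeds against a snapshot-backed index is not something this sketch establishes.

```typescript
type TemplateMappings = Record<string, unknown>;

interface PlannedPutMapping {
  index: string;
  mappings: TemplateMappings;
}

// Build a name guaranteed to match the pattern by filling in the wildcard.
// The 'simulated' suffix is an arbitrary placeholder.
function simulationNameFor(indexPattern: string): string {
  return indexPattern.replace('*', 'simulated');
}

// Reuse the single simulation result for every concrete index. If the
// simulation came back empty, plan nothing rather than send empty mappings.
function planMappingUpdates(
  simulated: TemplateMappings,
  concreteIndices: string[]
): PlannedPutMapping[] {
  if (Object.keys(simulated).length === 0) {
    return [];
  }
  return concreteIndices.map((index) => ({ index, mappings: simulated }));
}

const simulationName = simulationNameFor(
  '.internal.alerts-observability.metrics.alerts-default-*'
);

// Illustrative simulated mappings; the real ones come from the template.
const simulatedOnce: TemplateMappings = { properties: { '@timestamp': { type: 'date' } } };

const plannedUpdates = planMappingUpdates(simulatedOnce, [
  'partial-.internal.alerts-observability.metrics.alerts-default-000006',
  '.internal.alerts-observability.metrics.alerts-default-000007',
]);
console.log(simulationName, plannedUpdates.length);
```

The design point is that the simulation no longer depends on the concrete index names, so a `restored-` or `partial-` prefix cannot cause empty mappings.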