[RAC] [Observability] Alert documents are not updated as expected when the write index changes #110519
Comments
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui) |
I think we can’t know for sure when the first rollover will happen after a user upgrades Kibana to 7.15.0 and new RAC indices are created. This is the default ILM policy that’s currently used for all the indices:

    export const defaultLifecyclePolicy = {
      policy: {
        phases: {
          hot: {
            actions: {
              rollover: {
                max_age: '90d',
                max_size: '50gb',
              },
            },
          },
          delete: {
            actions: {
              delete: {},
            },
          },
        },
      },
    };

Depending on the number of alerts being written, this 50gb threshold could be reached before we release 7.15.1. I’d personally not take this risk, and I feel this issue should be addressed asap and shipped in 7.15.0. |
Just for additional context
You can simulate this behavior in Kibana DevTools (queries provided by @weltenwort):
The 2nd The call to
|
@weltenwort has an idea for a hotfix which makes sense to me: 1. Remove
|
We also think that exposing a raw |
Let me make sure I understand this with an example:
Is this roughly correct? If so, is this an accurate description of the proposed solution?
That would mean that an ongoing alert won't adopt the new mappings of an updated index while the alert continues to be active. I think that's ok (in fact, I assume this is exactly what we want, right?). If this is all pretty much accurate, this sounds like a great solution. And a HUGE +1 on abstracting some of this away in a slightly better API for the RuleDataService. |
@jasonrhodes I was about to comment about this line https://github.com/elastic/kibana/blob/master/x-pack/plugins/rule_registry/server/rule_data_client/rule_data_client.ts#L124, where we explicitly specify the
Indeed if the write uses the
Can somebody else confirm that @jasonrhodes's proposal to write to a |
Before jumping into the implementation I did a bit of testing in Kibana Dev Tools with the suggested queries. I deliberately tested the scenario in which multiple indices are writable at the same time and, as I expected, I got this error.
I don't know much about the internal implementation, so I was wondering if we already prevent having multiple write indices at the same time. My guess is that either the rollover mechanism or maybe the rule registry upgrade mechanism already takes care of that, and we shouldn't worry about the above error happening to our users. Am I right? |
@jasonrhodes yes, that's correct |
When I removed
Alerts table doesn't load any data in this case. Here's the error I get
@banderror @weltenwort any thoughts on that? One thing that is not clear to me is if I need to have this configuration |
@banderror We had a look with @weltenwort and we got it working now. Root of the problem was indeed the removal of So when first writing data, resources were not installed and this ended up in Alerts table not being rendered. So the fix was to:
|
@mgiota Sorry for the late reply. Yes, it kind of takes care of that implicitly. There are two cases where concrete indices can be created: initial index creation and rollover. In both cases we use deterministic names (we explicitly specify a name for the initial index and a name for the rollover target index). In both cases race conditions are possible, but they are handled in the code, so it's not possible, for example, that 2 parallel rollovers create 2 different write indices. |
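To illustrate why deterministic names make the race harmless, here is a minimal in-memory sketch. It is not the rule registry code: `FakeCluster`, `nextIndexName`, `ensureIndex` and the `alerts-000001` naming are all hypothetical, and the only thing borrowed from Elasticsearch is the `resource_already_exists_exception` error type. Two callers that race to roll over compute the same target name, so the slower one just sees "already exists" and carries on.

```typescript
// Hypothetical stand-in for a cluster: it only tracks which index names exist.
class FakeCluster {
  indices: { [name: string]: boolean } = {};

  createIndex(name: string): void {
    if (this.indices[name] === true) {
      // Mimics the error type Elasticsearch reports for an existing index.
      throw { type: 'resource_already_exists_exception' };
    }
    this.indices[name] = true;
  }
}

// Assumed naming scheme: bump the zero-padded numeric suffix by one.
function nextIndexName(current: string): string {
  const match = /^(.*-)(\d+)$/.exec(current);
  if (match === null) {
    throw new Error('unexpected index name: ' + current);
  }
  let padded = String(parseInt(match[2], 10) + 1);
  while (padded.length < match[2].length) {
    padded = '0' + padded;
  }
  return match[1] + padded;
}

// Returns true if this caller created the index, false if another caller won the race.
function ensureIndex(cluster: FakeCluster, name: string): boolean {
  try {
    cluster.createIndex(name);
    return true;
  } catch (e) {
    const err = e as { type?: string };
    if (err.type === 'resource_already_exists_exception') {
      return false; // same deterministic name, so nothing left to do
    }
    throw e;
  }
}

const cluster = new FakeCluster();
cluster.createIndex('alerts-000001');

// Both callers derive the target from the same current index.
const target = nextIndexName('alerts-000001'); // 'alerts-000002'
const firstCallerCreated = ensureIndex(cluster, target); // true
const secondCallerCreated = ensureIndex(cluster, target); // false
```

Because both callers agree on the target name, the cluster ends up with exactly one new index regardless of who wins.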
🐞 Problem description
The lifecycle-aware observability rule executors use the RuleDataClient's bulk() method to both create and update the alert documents. Currently, any update targeting an existing alert will incorrectly duplicate the alert document when the write index of the alerts alias has changed (e.g. during a rollover).
The reason is that the rule data client sets the bulk operation's target index to the alias, which resolves to the current write index. Indexing a document with an existing id through the alias doesn't update the copy of that document in a non-write index. Instead, a new document with the same id is created in the write index.
Depending on whether we expect roll-overs to happen within 7.15, this could be a bad bug. If we are confident rollovers won't happen, this could be deferred to 7.15.1.
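The duplication can be reproduced with a small in-memory model. This is only a sketch of the behavior described above, not Elasticsearch or Kibana code: `AliasModel`, its methods, and the index and alert names are hypothetical.

```typescript
type AlertDoc = { status: string };

// Hypothetical model of an alias backed by several concrete indices.
class AliasModel {
  // concrete index name -> (document id -> document)
  indices: { [index: string]: { [id: string]: AlertDoc } } = {};
  private writeIndex: string;

  constructor(firstIndex: string) {
    this.indices[firstIndex] = {};
    this.writeIndex = firstIndex;
  }

  // Rollover: add a new concrete index and make it the write index.
  rollover(newIndex: string): void {
    this.indices[newIndex] = {};
    this.writeIndex = newIndex;
  }

  // Indexing through the alias always lands in the current write index.
  indexViaAlias(id: string, doc: AlertDoc): string {
    this.indices[this.writeIndex][id] = doc;
    return this.writeIndex;
  }

  // Searching the alias sees matching documents in every backing index.
  findById(id: string): Array<{ index: string; doc: AlertDoc }> {
    const hits: Array<{ index: string; doc: AlertDoc }> = [];
    Object.keys(this.indices).forEach((index) => {
      const doc = this.indices[index][id];
      if (doc !== undefined) {
        hits.push({ index: index, doc: doc });
      }
    });
    return hits;
  }
}

const alias = new AliasModel('alerts-000001');
alias.indexViaAlias('alert-1', { status: 'active' });

alias.rollover('alerts-000002');

// Intended as an update, but it creates a second document instead.
alias.indexViaAlias('alert-1', { status: 'recovered' });

const hits = alias.findById('alert-1');
// hits.length is 2: the stale 'active' copy in alerts-000001
// and the 'recovered' copy in alerts-000002.
```

A search through the alias now returns both copies of `alert-1`, which is the duplication reported in this issue.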
💡 Possible solution
If the bulk() method of the rule data client removed the require_alias setting, the lifecycle helper could specify bulk update operations that target the respective concrete indices of the previously loaded alerts.
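A rough sketch of what the lifecycle helper could build under this proposal follows. The names `TrackedAlert`, `buildAlertOps`, `toBulkBody` and the alias `.alerts-example` are made up for illustration. It also assumes that each previously loaded alert remembers the `_index` of the search hit it came from, and it uses `index` actions where the real helper might prefer `update` actions.

```typescript
// Hypothetical shape of an alert tracked by the lifecycle helper.
interface TrackedAlert {
  id: string;
  fields: { [key: string]: unknown };
  // `_index` of the search hit the alert was loaded from; absent for new alerts.
  concreteIndex?: string;
}

interface BulkIndexOp {
  meta: { index: { _index: string; _id: string } };
  doc: { [key: string]: unknown };
}

// Existing alerts target the concrete index they live in,
// new alerts target the alias (and therefore the current write index).
function buildAlertOps(alerts: TrackedAlert[], aliasName: string): BulkIndexOp[] {
  return alerts.map((alert) => ({
    meta: {
      index: {
        _index: alert.concreteIndex !== undefined ? alert.concreteIndex : aliasName,
        _id: alert.id,
      },
    },
    doc: alert.fields,
  }));
}

// Flatten into the alternating action/document layout of a bulk request body.
function toBulkBody(ops: BulkIndexOp[]): unknown[] {
  const body: unknown[] = [];
  ops.forEach((op) => {
    body.push(op.meta, op.doc);
  });
  return body;
}

const ops = buildAlertOps(
  [
    { id: 'alert-1', fields: { status: 'recovered' }, concreteIndex: 'alerts-000001' },
    { id: 'alert-2', fields: { status: 'active' } },
  ],
  '.alerts-example'
);
const bulkBody = toBulkBody(ops);
```

With `require_alias` enforced on the whole request, the first operation would be rejected because `alerts-000001` is a concrete index, which is why the proposal removes that setting from `bulk()`.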