[Logs UI] Register log threshold rule as lifecycle rule #104341

Merged
weltenwort merged 19 commits into elastic:master from rac-helpers-as-executor-wrappers on Jul 13, 2021

Conversation

weltenwort
Member

@weltenwort weltenwort commented Jul 5, 2021

📝 Summary

This makes the log threshold alert lifecycle-aware.

closes #98379
closes #104857

🎨 Previews

image

🕵️ Review notes

  • This factors the core of the createLifecycleRuleType factory out into a createLifecycleExecutor. The benefits are a smaller API surface and looser coupling to the alerting framework. The old helper calls the new one internally, but it is left in place for now to avoid unnecessary changes to the APM plugin.
  • One debatable choice made here is to construct the value of the "reason" column in the browser. To achieve this, the params are serialized into a log-threshold-rule-namespaced field in the doc. Discussions are ongoing about whether the reason should instead be generated in the executor and indexed as-is, which would prevent internationalization. I stuck with how APM does it for now.
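
As a rough illustration of the first review note, the sketch below shows what "an executor wrapper that tracks alerts across runs" could look like in isolation. It is not the ruleRegistry API: every name, type, and signature here (createLifecycleExecutorSketch, createLifecycleRuleTypeSketch, SketchWrappedState, reportAlert) is hypothetical and heavily simplified.

```typescript
// Hypothetical, simplified shapes; not the actual ruleRegistry types.
interface SketchWrappedState<State> {
  wrapped: State; // state owned by the wrapped executor
  trackedAlertIds: string[]; // alerts reported during the previous run
}

interface SketchLifecycleResult<State> {
  state: SketchWrappedState<State>;
  newAlertIds: string[];
  recoveredAlertIds: string[];
}

type SketchExecutor<State> = (state: State, reportAlert: (id: string) => void) => State;

// Wraps a plain executor and derives new/recovered alerts by comparing the
// alerts reported in this run with those tracked from the previous run.
function createLifecycleExecutorSketch<State>(wrapped: SketchExecutor<State>) {
  return (previous: SketchWrappedState<State>): SketchLifecycleResult<State> => {
    const reported: string[] = [];
    const nextWrapped = wrapped(previous.wrapped, (id) => {
      if (reported.indexOf(id) === -1) {
        reported.push(id);
      }
    });
    return {
      state: { wrapped: nextWrapped, trackedAlertIds: reported },
      newAlertIds: reported.filter((id) => previous.trackedAlertIds.indexOf(id) === -1),
      recoveredAlertIds: previous.trackedAlertIds.filter((id) => reported.indexOf(id) === -1),
    };
  };
}

// The older-style helper can stay in place and simply delegate to the wrapper.
function createLifecycleRuleTypeSketch<State>(type: { id: string; executor: SketchExecutor<State> }) {
  return { ...type, executor: createLifecycleExecutorSketch(type.executor) };
}
```

In this toy version the wrapper only depends on the executor function itself, which is the kind of reduced coupling the note describes.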

@weltenwort weltenwort added the Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED, and v7.15.0 labels Jul 5, 2021
@weltenwort weltenwort self-assigned this Jul 5, 2021
@weltenwort weltenwort marked this pull request as ready for review July 7, 2021 16:56
@weltenwort weltenwort requested review from a team as code owners July 7, 2021 16:56
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@weltenwort weltenwort requested a review from mgiota July 7, 2021 16:59
@weltenwort weltenwort added the release_note:skip and Theme: rac labels Jul 7, 2021
@botelastic botelastic bot added the Team:APM label Jul 7, 2021
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@weltenwort
Member Author

@elasticmachine merge upstream

@weltenwort
Member Author

@mgiota I've changed the copy of the ratio reason. This should be ready for another (hopefully final) look.

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id     before  after  diff
infra  881     887    +6

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id             before  after  diff
observability  210     214    +4
ruleRegistry   49      58     +9
total                         +13

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id     before  after  diff
infra  1.7MB   1.7MB  +10.0B

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id             before  after  diff
observability  10      9      -1
ruleRegistry   8       10     +2
total                         +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id     before   after    diff
infra  142.5KB  147.2KB  +4.7KB
Unknown metric groups

API count

id             before  after  diff
observability  210     214    +4
ruleRegistry   49      58     +9
total                         +13

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @weltenwort


const ruleExecutorData = getRuleData(options);

const state = getOrElse(

Contributor

@weltenwort I see you refactored this part. My understanding is that this refactoring was done to accommodate generic values. I don't fully understand the isLeft part in the initial code, so can you verify that the refactored code has the same result as before?

const decodedState = wrappedStateRt.decode(previousState);

const state = isLeft(decodedState)
  ? {
      wrapped: previousState,
      trackedAlerts: {},
    }
  : decodedState.right;

Member

isLeft will return true if the value could not be decoded ("left"). In that case, we wrap the state returned by the wrapped executor. If it can be successfully decoded ("right"), we know that it is a state managed by the wrapper, so we return it as-is.
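
To make the equivalence concrete, here is a minimal sketch that compares the isLeft ternary with a getOrElse-style fallback. It assumes a hand-rolled Either type as a stand-in for fp-ts and a hypothetical decodeWrappedState in place of the io-ts codec; the exact shape of the refactored code is not shown in full above, so this only illustrates the pattern.

```typescript
// Hand-rolled stand-in for fp-ts's Either (assumption, to keep the sketch self-contained).
type Left<E> = { _tag: 'Left'; left: E };
type Right<A> = { _tag: 'Right'; right: A };
type Either<E, A> = Left<E> | Right<A>;

const isLeft = <E, A>(e: Either<E, A>): e is Left<E> => e._tag === 'Left';

// Same curried shape as fp-ts's getOrElse: fallback first, Either second.
const getOrElse =
  <E, A>(onLeft: (e: E) => A) =>
  (e: Either<E, A>): A =>
    isLeft(e) ? onLeft(e.left) : e.right;

interface WrappedState {
  wrapped: unknown;
  trackedAlerts: Record<string, unknown>;
}

// Hypothetical decoder standing in for wrappedStateRt.decode.
function decodeWrappedState(value: unknown): Either<string, WrappedState> {
  if (typeof value === 'object' && value !== null && 'wrapped' in value && 'trackedAlerts' in value) {
    const candidate = value as { wrapped: unknown; trackedAlerts: unknown };
    if (typeof candidate.trackedAlerts === 'object' && candidate.trackedAlerts !== null) {
      return {
        _tag: 'Right',
        right: {
          wrapped: candidate.wrapped,
          trackedAlerts: candidate.trackedAlerts as Record<string, unknown>,
        },
      };
    }
  }
  return { _tag: 'Left', left: 'not a wrapper-managed state' };
}

// Variant 1: the ternary from the initial code.
function resolveWithIsLeft(previousState: unknown): WrappedState {
  const decoded = decodeWrappedState(previousState);
  return isLeft(decoded) ? { wrapped: previousState, trackedAlerts: {} } : decoded.right;
}

// Variant 2: the same fallback expressed with getOrElse.
function resolveWithGetOrElse(previousState: unknown): WrappedState {
  const fallback = (): WrappedState => ({ wrapped: previousState, trackedAlerts: {} });
  return getOrElse<string, WrappedState>(fallback)(decodeWrappedState(previousState));
}
```

Both variants wrap a state that cannot be decoded and pass an already-wrapped state through unchanged.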

Contributor

@dgieselaar Thanks for the clarification!

@mgiota
Contributor

mgiota commented Jul 12, 2021

@weltenwort I created a log threshold rule type and here's what I got regarding log_threshold_rule.serialized_params
Screenshot 2021-07-12 at 23 50 46

@mgiota
Contributor

mgiota commented Jul 12, 2021

@weltenwort I did a few tests:

Mute the rule

  • In Dev Tools, run GET .alerts-pamitsop-observability.logs-000001*/_search and note the number of alerts in hits.total.value
  • Mute the rule from Stack Management
  • Check hits.total.value -> it kept increasing. Is this expected?

Disable the rule

  • In Dev Tools, run GET .alerts-pamitsop-observability.logs-000001*/_search and note the number of alerts in hits.total.value
  • Disable the rule from Stack Management
  • Check hits.total.value -> it stopped increasing

Recover

  • Edit the above rule in Stack Management so that it stops firing
  • Verify it says Recovered in Stack Management
  • In Dev Tools, with GET .alerts-pamitsop-observability.logs-000001/_search, I manually checked kibana.rac.alert.status and it was still open. I would actually expect this value to change, since we are talking about mutable alerts, right? I have 2 questions here:
  1. What is the kibana.rac.alert.status value we use for the recovered state?
  2. What Elasticsearch query can I use in Dev Tools to filter by that state?

[logThresholdRuleDataNamespace]: {
  properties: {
    serialized_params: {
      type: 'keyword',

Member

out of curiosity, why is this a keyword?

Member Author

type: "keyword" and index: false seemed to be the most parsimonious field configuration that doesn't cause a mapping explosion but is retrievable via fields (which is what the alerts table search strategy uses). I would have preferred type: "object" and enabled: false, but then the field can't be retrieved via fields.
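
For illustration, a minimal sketch of how such a field could be used end to end: the params are stored as one opaque JSON string and parsed again where the reason text is built. The field name follows the log_threshold_rule.serialized_params name seen in this thread; ExampleRuleParams, toAlertFields, and formatReason are hypothetical names, and the params shape is a simplified assumption.

```typescript
// Field name as seen in this thread.
const serializedParamsField = 'log_threshold_rule.serialized_params';

// Mapping fragment discussed above: one opaque keyword value, not indexed for
// search, so no mappings are created for the individual params.
const serializedParamsMapping = {
  type: 'keyword',
  index: false,
};

// Hypothetical, simplified params shape; the real rule params are richer.
interface ExampleRuleParams {
  comparator: string;
  threshold: number;
  timeSize: number;
  timeUnit: string;
}

// Executor side (sketch): store all params as a single JSON string.
function toAlertFields(params: ExampleRuleParams): Record<string, string> {
  return { [serializedParamsField]: JSON.stringify(params) };
}

// Browser side (sketch): parse the string back and build the reason text.
function formatReason(fields: Record<string, string>, actualCount: number): string {
  const raw = fields[serializedParamsField];
  if (raw === undefined) {
    return '';
  }
  const params = JSON.parse(raw) as ExampleRuleParams;
  return `${actualCount} log entries in the last ${params.timeSize}${params.timeUnit}; alert when ${params.comparator} ${params.threshold}`;
}
```

Because the reason is built from the parsed params at display time, the text can still be translated in the browser.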

Contributor

@afgomez Coming back to this PR: could this be related to the issue you have been discussing with @jasonrhodes in #113003?

I didn't completely follow the discussion you had there. I just thought it might be worth mentioning that Felix was indexing params in this PR (https://github.com/elastic/kibana/pull/104341/files?file-filters%5B%5D=.ts#diff-4b27f4f59bd7ebc79225d9e379a663944592f8595e811a156ea75f66dfba679a), which you might want to have a look at.

Contributor

@mgiota thanks! It seems this code has been removed since this PR was merged though

@@ -9,9 +9,14 @@
* registering a new instance of the rule data client
* in a new plugin will require updating the below data structure
* to include the index name where the alerts as data will be written to.
*
* This doesn't work in combination with the `xpack.ruleRegistry.index`
* setting, with which the user can change the index prefix.
*/
export const mapConsumerToIndexName = {
apm: '.alerts-observability-apm',

Member

should this be changed to .alerts-observability.apm?

Member Author

Yes, but it's not what this PR is about. It needs to be changed in multiple places. It's tracked in #102089.

@@ -37,3 +38,14 @@ export function getRuleExecutorData(
[PRODUCER]: type.producer,
};
}

export function getRuleData(options: AlertExecutorOptions<any, any, any, any, any>) {

Member

do we still need the other function based on the rule type definition? (getRuleExecutorData)

Member Author

Good point, I forgot to remove it 👍

Member

@dgieselaar dgieselaar left a comment

looks good! I do hope that we can clean up the types at some point.

@weltenwort
Member Author

@mgiota thanks for the extensive testing!

Mute the rule -> hits.total.value kept increasing

Muting a rule only prevents its actions from being scheduled; the executor still runs. Writing the documents is "internal bookkeeping" that is performed in the executor and is independent of the user-defined actions. So yes, this is expected. The alert should still show up in the alerts tables even though notifications are muted.

Disable the rule -> hits.total.value stopped increasing

Disabling a rule stops the check from being scheduled, which means the executor doesn't run. So this is expected as well.
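
The difference between the two cases can be summarized with a toy model. This is only a sketch of the behavior described above, not the alerting framework's API: ToyRule, ToyRunResult, and runScheduledCheck are hypothetical names.

```typescript
// Toy model only; none of these names exist in the alerting framework.
interface ToyRule {
  enabled: boolean;
  muted: boolean;
}

interface ToyRunResult {
  documentsWritten: number; // alert documents written by the executor
  actionsScheduled: number; // user-defined notifications
}

function runScheduledCheck(rule: ToyRule, activeAlertCount: number): ToyRunResult {
  if (!rule.enabled) {
    // Disabled: the check is not scheduled, so the executor never runs.
    return { documentsWritten: 0, actionsScheduled: 0 };
  }
  // The executor runs and does its bookkeeping regardless of muting.
  const documentsWritten = activeAlertCount;
  // Muting only suppresses the actions.
  const actionsScheduled = rule.muted ? 0 : activeAlertCount;
  return { documentsWritten, actionsScheduled };
}
```

With this model, a muted rule with two active alerts still writes two documents but schedules no actions, while a disabled rule writes nothing.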

@mgiota does that make sense so far?

Edit a rule to resolve -> kibana.rac.alert.status remains open

I think this is a quirk of the alerting framework. When you save an edited alert, it seems to forget the prior state, as if the alert were new. Since the executor of the "old version" doesn't get a chance to run, it can't update the alert status. I expect this would also happen if an alert is deleted altogether. @dgieselaar has this been discussed before? How do we deal with indefinitely-open alerts like this? We would need to hook into the edit/delete lifecycle of the alert somehow.

@mgiota to filter for that you could add a query section to your request as in:

GET .alerts-pamitsop-observability*/_search
{
  "query": {
    "terms": {
      "kibana.rac.alert.status": [
        "open"
      ]
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
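
If the same request body is needed for more than one status, it can be built by a small helper. This is a sketch only: buildAlertStatusSearch is a hypothetical name, and "open" is the only status value that appears in this thread, so any other value passed in is the caller's assumption.

```typescript
// Builds the same search body as the request above for a given status value.
function buildAlertStatusSearch(status: string) {
  return {
    query: {
      terms: {
        'kibana.rac.alert.status': [status],
      },
    },
    // Newest alert documents first.
    sort: [{ '@timestamp': { order: 'desc' } }],
  };
}

// Body equivalent to the Dev Tools request shown above.
const openAlertsSearch = buildAlertStatusSearch('open');
```

The resulting object can be pasted into Dev Tools after JSON.stringify(openAlertsSearch, null, 2).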

@weltenwort weltenwort added the auto-backport label Jul 13, 2021
@weltenwort weltenwort merged commit e9f42d2 into elastic:master Jul 13, 2021
@weltenwort weltenwort deleted the rac-helpers-as-executor-wrappers branch July 13, 2021 09:59
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Jul 13, 2021
Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Jul 13, 2021
…05405)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Felix Stürmer <weltenwort@users.noreply.github.com>
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jul 13, 2021
…-png-pdf-report-type

Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Logs UI Logs UI feature release_note:skip Skip the PR/issue when compiling release notes Team:APM All issues that need APM UI Team support Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Theme: rac label obsolete v7.15.0
Projects
None yet
6 participants