Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Explain Log Rate Spikes: Fix error handling. #142047

Merged
merged 13 commits into from
Oct 4, 2022

Conversation

walterra
Copy link
Contributor

@walterra walterra commented Sep 28, 2022

Summary

Part of #138117.

  • Fixes error handling that before was not providing enough information for debugging purposes and support. This will now output more fine grained error information to the Kibana server log. The analysis is now more resilient to errors for individual queries. For example, we don't stop the analysis anymore if individual queries for p-values or histograms fail.
  • Moves the error callout above all other possible elements like empty prompts when the analysis doesn't return results.

Checklist

@walterra walterra added bug Fixes for quality problems that affect the customer experience release_note:fix ci:build-cloud-image labels Sep 28, 2022
@walterra walterra self-assigned this Sep 28, 2022
@walterra walterra added v8.5.0 v8.6.0 :ml Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis labels Sep 29, 2022
@walterra walterra marked this pull request as ready for review September 29, 2022 16:29
@walterra walterra requested a review from a team as a code owner September 29, 2022 16:29
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

logger.error(
`Failed to transform field/value pairs into groups, got: \n${e.toString()}`
);
pushError(`Failed to transform field/value pairs into groups.`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: there's an extra space here after 'pairs'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c9eed74.

} catch (e) {
logger.error(`Failed to fetch the overall histogram data, got: \n${e.toString()}`);
pushError(`Failed to fetch overall histogram data.`);
// Still continue the analysis even if loading the overall histogram failes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: failes -> fails + suggestion to internationalize this message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed typo in c9eed74. i18n will be done in a follow up.

@walterra
Copy link
Contributor Author

walterra commented Oct 3, 2022

@qn895 Thanks for the review, this is ready for another look!

@@ -35,6 +35,9 @@ export function streamReducer(
return action.reduce(streamReducer, state);
}

// eslint-disable-next-line no-console
console.log('action', action);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this console.log is here intentionally... In which case might be good to have a better log prefix message. But this is a super duper nit and is not an issue. Please feel free to merge 😄 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not mentioning this, it's just for some temporary cloud testing and will be deleted before merging!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 2bc9fce.

@qn895
Copy link
Member

qn895 commented Oct 3, 2022

Code LGTM 🎉

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kibana-ci
Copy link
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #14 / alerting api integration spaces only Alerting runSoon should successfully run rule where scheduled task id is same as rule id

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
aiops 765.9KB 765.9KB +4.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @walterra

@walterra walterra merged commit 4753d7c into elastic:main Oct 4, 2022
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Oct 4, 2022
- Fixes error handling that before was not providing enough information for debugging purposes and support. This will now output more fine grained error information to the Kibana server log. The analysis is now more resilient to errors for individual queries. For example, we don't stop the analysis anymore if individual queries for p-values or histograms fail.
- Moves the error callout above all other possible elements like empty prompts when the analysis doesn't return results.

(cherry picked from commit 4753d7c)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.5

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Oct 4, 2022
- Fixes error handling that before was not providing enough information for debugging purposes and support. This will now output more fine grained error information to the Kibana server log. The analysis is now more resilient to errors for individual queries. For example, we don't stop the analysis anymore if individual queries for p-values or histograms fail.
- Moves the error callout above all other possible elements like empty prompts when the analysis doesn't return results.

(cherry picked from commit 4753d7c)

Co-authored-by: Walter Rafelsberger <walter.rafelsberger@elastic.co>
WafaaNasr pushed a commit to WafaaNasr/kibana that referenced this pull request Oct 11, 2022
- Fixes error handling that before was not providing enough information for debugging purposes and support. This will now output more fine grained error information to the Kibana server log. The analysis is now more resilient to errors for individual queries. For example, we don't stop the analysis anymore if individual queries for p-values or histograms fail.
- Moves the error callout above all other possible elements like empty prompts when the analysis doesn't return results.
WafaaNasr pushed a commit to WafaaNasr/kibana that referenced this pull request Oct 14, 2022
- Fixes error handling that before was not providing enough information for debugging purposes and support. This will now output more fine grained error information to the Kibana server log. The analysis is now more resilient to errors for individual queries. For example, we don't stop the analysis anymore if individual queries for p-values or histograms fail.
- Moves the error callout above all other possible elements like empty prompts when the analysis doesn't return results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Fixes for quality problems that affect the customer experience ci:build-cloud-image Feature:ML/AIOps ML AIOps features: Change Point Detection, Log Pattern Analysis, Log Rate Analysis :ml release_note:fix v8.5.0 v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants