[Fleet] rollback input package install on failure #182665

Merged: 7 commits merged into elastic:main on May 8, 2024

Conversation

juliaElastic
Contributor

@juliaElastic juliaElastic commented May 6, 2024

Summary

Closes #181032

Two improvements to failure handling when creating an input package policy:

  • If the package was not installed before the request, the installation is rolled back on failure.
  • ES references are saved with the input package installation only if the templates are added successfully. This prevents problems on later upgrades, which could occur if the references contained invalid template names.
    • This is needed when the input package was already installed before the attempt to add a package policy. In that case we don't want to uninstall the package completely on failure.
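
The two behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PackageStore`, `createPolicyWithRollback`, `EsReference` and `addTemplatesThenSaveRefs` are hypothetical names, and everything is kept synchronous and in memory for brevity, whereas the real Fleet code is asynchronous.

```typescript
// Hypothetical in-memory stand-in for Fleet's installed-package state.
class PackageStore {
  private installed: Record<string, boolean> = {};
  isInstalled(name: string): boolean {
    return this.installed[name] === true;
  }
  install(name: string): void {
    this.installed[name] = true;
  }
  uninstall(name: string): void {
    delete this.installed[name];
  }
}

// Installs the package if needed, then creates the policy. On failure the
// install is undone only when this call performed it.
function createPolicyWithRollback(
  store: PackageStore,
  pkgName: string,
  createPolicy: () => void
): void {
  const wasInstalled = store.isInstalled(pkgName);
  if (!wasInstalled) {
    store.install(pkgName);
  }
  try {
    createPolicy();
  } catch (err) {
    if (!wasInstalled) {
      store.uninstall(pkgName);
    }
    throw err;
  }
}

// Assumed shape of an ES asset reference (illustrative only).
interface EsReference {
  id: string;
  type: string;
}

// Saves references only after the template step succeeded. If putTemplates
// throws, savedRefs is left untouched.
function addTemplatesThenSaveRefs(
  savedRefs: EsReference[],
  newRefs: EsReference[],
  putTemplates: (refs: EsReference[]) => void
): void {
  putTemplates(newRefs);
  savedRefs.push(...newRefs);
}
```

Recording `wasInstalled` before doing anything is what separates the two cases: a package installed by this request is removed again, while a package that was already present is left alone and simply gains no new references.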

To verify:
With the Custom Logs package not installed:

  • Add the Custom Logs integration with a dataset name that contains a *, e.g. generic*.
  • Package policy creation is expected to fail.
  • Verify that the Custom Logs package is not installed.

With the Custom Logs package already installed:

  • Install the Custom Logs package without a package policy, or add the integration with the default dataset name so that it succeeds.
  • Try adding another policy with the dataset generic*.
  • Package policy creation is expected to fail.
  • Verify that the Custom Logs package doesn't have any installed_es references with the invalid generic* prefix:
    GET .kibana_ingest/_search?q=epm-packages.name:log
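
The last check can also be done programmatically once the installed_es list has been pulled out of the search response. The sketch below assumes each entry has an `id` and a `type`; `EsAssetReference` and `findWildcardReferences` are illustrative names, and the sample data in the usage note is made up.

```typescript
// Assumed shape of one entry in the package's installed_es list.
interface EsAssetReference {
  id: string;
  type: string;
}

// Returns the references whose id contains a wildcard, which should never
// have been saved.
function findWildcardReferences(installedEs: EsAssetReference[]): EsAssetReference[] {
  return installedEs.filter((ref) => ref.id.indexOf('*') !== -1);
}
```

For example, passing a list that contains one entry with the id `logs-generic*@package` would return just that entry. An empty result means the verification step passes.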

Checklist

@juliaElastic
Contributor Author

/ci

@juliaElastic juliaElastic marked this pull request as ready for review May 6, 2024 15:08
@juliaElastic juliaElastic requested a review from a team as a code owner May 6, 2024 15:08
@juliaElastic
Contributor Author

/ci

@kpollich kpollich added the Team:Fleet label May 6, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jen-huang
Contributor

On a fresh local instance, triggering a failure when adding Custom Logs for the first time does seem to install the package: I am offered the option to uninstall it if I go back to the details page. Based on the PR description, it seems that installation shouldn't happen in this case?

[two screenshots]

@juliaElastic
Contributor Author

juliaElastic commented May 7, 2024

> On a fresh local instance, triggering a failure on adding Custom Logs for the first time does seem to install the package, I am offered the option to uninstall it if I go back to the details page. Based on the PR description it seems that installation shouldn't happen in this case?:

Yes, it shouldn't. I'll take a look.

Hmm, I can't reproduce it. Do you see something like this in the Kibana logs?

[2024-05-07T09:12:38.807+02:00][ERROR][plugins.fleet] Error while creating package policy due to error: invalid_index_template_exception
        Root causes:
                invalid_index_template_exception: index_template [logs-generic*@package] invalid, cause [Validation Failed: 1: name must not contain a '*';]
[2024-05-07T09:12:38.808+02:00][INFO ][plugins.fleet] rollback log-2.3.1 package installation after error
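
For reference, the failure in this log comes from the dataset name ending up in the template name. A rough sketch of that relationship follows. It only mirrors the name pattern visible in the log (`logs-generic*@package`) and the single rule quoted in the error message. `packageTemplateName` and `validateTemplateName` are hypothetical helpers, not Fleet or Elasticsearch APIs, and Elasticsearch enforces more rules than this one.

```typescript
// Builds a template name in the pattern seen in the log: <type>-<dataset>@package.
function packageTemplateName(type: string, dataset: string): string {
  return `${type}-${dataset}@package`;
}

// Returns a message if the name contains a '*', otherwise undefined.
// Only the one rule quoted in the log is checked here.
function validateTemplateName(name: string): string | undefined {
  return name.indexOf('*') !== -1 ? "name must not contain a '*'" : undefined;
}
```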

@jen-huang
Contributor

@juliaElastic Yes, I did get that error:

[2024-05-07T08:50:41.818-07:00][ERROR][plugins.fleet] Error while creating package policy due to error: invalid_index_template_exception
	Root causes:
		invalid_index_template_exception: index_template [logs-jen*@package] invalid, cause [Validation Failed: 1: name must not contain a '*';]
[2024-05-07T08:50:41.819-07:00][INFO ][plugins.fleet] rollback log-2.3.1 package installation after error

Funnily enough, I just tried it again after uninstalling the package and this time it wasn't installed after getting the error.

After wiping my ES data and restarting ES and Kibana, I am able to reproduce it again. This time the above error does not appear in the logs. Perhaps it has something to do with the package cache?

[2024-05-07T08:55:59.779-07:00][INFO ][plugins.fleet] Install with enablePackagesStateMachine - Starting installation of log@2.3.1 from registry 
[2024-05-07T08:55:59.905-07:00][WARN ][plugins.fleet] Not performing package verification as no local verification key found
[2024-05-07T08:55:59.934-07:00][INFO ][plugins.fleet] Install with enablePackagesStateMachine - Starting installation of elastic_agent@1.19.0 from registry 
[2024-05-07T08:56:00.042-07:00][WARN ][plugins.fleet] Not performing package verification as no local verification key found
[2024-05-07T08:56:00.072-07:00][INFO ][plugins.fleet] Install with enablePackagesStateMachine - Starting installation of system@1.55.2 from registry 
[2024-05-07T08:56:02.790-07:00][WARN ][http.server.Kibana] Event loop utilization for /kor/api/fleet/epm/packages/_bulk exceeded threshold of 250ms (632ms out of 3347ms) and 15% (19%) 
[2024-05-07T08:56:06.721-07:00][INFO ][plugins.fleet] Secrets storage is disabled as minimum fleet server version has not been met
[2024-05-07T08:56:06.901-07:00][WARN ][plugins.fleet] Unable to get fleet server hosts for policy 7946cf68-69be-4499-af3e-94aa4667f5ac: Default Fleet Server host is not setup
[2024-05-07T08:56:08.508-07:00][INFO ][plugins.fleet] Secrets storage is disabled as minimum fleet server version has not been met
[2024-05-07T08:56:08.591-07:00][ERROR][plugins.fleet] Error while creating package policy due to error: invalid_index_template_exception
	Root causes:
		invalid_index_template_exception: index_template [logs-jen*@package] invalid, cause [Validation Failed: 1: name must not contain a '*';]
[2024-05-07T08:56:23.216-07:00][INFO ][plugins.fleet] Secrets storage is disabled as minimum fleet server version has not been met

@juliaElastic
Contributor Author

juliaElastic commented May 8, 2024

@jen-huang I can reproduce it now. It happens when the package is not installed and adding the integration also creates a new agent policy.
It seems the packages are installed in a single request before the /package_policies API is called. I'll try to fix it.

[2024-05-08T09:14:59.644+02:00][DEBUG][plugins.fleet] kicking off bulk install of log, system, elastic_agent
[2024-05-08T09:14:59.646+02:00][DEBUG][plugins.fleet] Kicking off install of log-2.3.1 from registry

I added a fix that removes the input package from the _bulk API request, so that the install can be rolled back as part of package policy creation.
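
The idea behind the fix can be sketched like this. It is an illustration only: `PackageInfo` and `splitBulkInstall` are hypothetical names, and the sketch assumes each package carries a `type` of either 'integration' or 'input'.

```typescript
// Hypothetical minimal package descriptor.
interface PackageInfo {
  name: string;
  type: 'integration' | 'input';
}

// Splits the packages into those sent to the bulk install request and the
// input packages deferred to package policy creation, where a failed
// install can be rolled back.
function splitBulkInstall(packages: PackageInfo[]): { bulk: string[]; deferred: string[] } {
  const bulk = packages.filter((p) => p.type !== 'input').map((p) => p.name);
  const deferred = packages.filter((p) => p.type === 'input').map((p) => p.name);
  return { bulk, deferred };
}
```

With the three packages from the debug log above, and assuming `log` is the only input package, `system` and `elastic_agent` would still go through the bulk request while `log` would be installed as part of the package policy request.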

Contributor

@jen-huang jen-huang left a comment

Tested both cases locally and code LGTM 🚢

PS now it's easier to validate ES assets on the Assets tab :)

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Test Failures

  • [job] [logs] FTR Configs #31 / alerting api integration security and spaces enabled - Group 2 Alerts legacy alerts alerts "after all" hook in "alerts"
  • [job] [logs] FTR Configs #31 / alerting api integration security and spaces enabled - Group 2 Alerts legacy alerts alerts "before all" hook in "alerts"
  • [job] [logs] Jest Tests #7 / Lens Field Item should pass add filter callback and pass result to filter manager
  • [job] [logs] Jest Tests #7 / Lens Field Item should request field stats every time the button is clicked

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| fleet | 1.3MB | 1.3MB | +18.0B |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@jen-huang jen-huang merged commit 0833045 into elastic:main May 8, 2024
27 checks passed
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request May 8, 2024
### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Jen Huang <its.jenetic@gmail.com>
(cherry picked from commit 0833045)
@kibanamachine
Contributor

💚 All backports created successfully

Status Branch Result
8.14

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request May 8, 2024
…182983)

# Backport

This will backport the following commits from `main` to `8.14`:
- [[Fleet] rollback input package install on failure
(#182665)](#182665)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)


Co-authored-by: Julia Bardi <90178898+juliaElastic@users.noreply.github.com>
Labels
release_note:fix, Team:Fleet, v8.14.0, v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Package installation not rolled back correctly on failure
7 participants