7.12.0 upgrade migrations fail with timeout_exception or receive_timeout_transport_exception #95321

Closed
rudolf opened this issue Mar 24, 2021 · 5 comments · Fixed by #95305
Labels: bug, project:ResilientSavedObjectMigrations, Team:Core, v7.12.0


rudolf commented Mar 24, 2021

v7.12.0 migrations can fail if the .kibana index has a large number of saved objects or the Elasticsearch cluster is under heavy load.

This causes error logs like:

[.kibana] [receive_timeout_transport_exception]: [instance-0000000002][10.42.1.112:19541][cluster:monitor/task/get] request_id [2648] timed out after [59940ms]
[.kibana] [timeout_exception]: Timed out waiting for completion of [org.elasticsearch.index.reindex.BulkByScrollTask@6a74c54]

The root cause is a bug in the v2 migrations: Kibana waits only 60s for some of the steps. When a cluster is busy or there are a lot of fleet-agent-events saved objects, _reindex or _update_by_query tasks can take longer than 60s, leading to a timeout.
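
To illustrate the failure mode: a single bounded wait on a long-running task fails as soon as the task outlives the wait, even though the task itself is still healthy. The sketch below polls the task management API until the task reports `"completed": true`. It is not Kibana's implementation or the actual fix. `ES_URL`, `POLL_INTERVAL`, `MAX_POLLS`, `fetch_task` and `wait_for_task` are hypothetical names, and it assumes a cluster reachable without authentication and a task started with `wait_for_completion=false`.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch address; use http:// if TLS is not enabled.
ES_URL="${ES_URL:-https://localhost:9200}"
# Hypothetical defaults: seconds between polls and maximum number of polls.
POLL_INTERVAL="${POLL_INTERVAL:-10}"
MAX_POLLS="${MAX_POLLS:-360}"

# Fetch the current state of one task from the task management API.
fetch_task() {
  curl -sS "$ES_URL/_tasks/$1"
}

# Poll until the response contains "completed": true.
# A slow task only costs more iterations; it does not raise a timeout error.
wait_for_task() {
  local task_id="$1" attempt=0
  while [ "$attempt" -lt "$MAX_POLLS" ]; do
    if fetch_task "$task_id" | grep -Eq '"completed"[[:space:]]*:[[:space:]]*true'; then
      echo "task $task_id completed"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$POLL_INTERVAL"
  done
  echo "task $task_id not completed after $MAX_POLLS polls" >&2
  return 1
}
```

Calling `wait_for_task <task id>` returns 0 once the task has finished and 1 if the poll budget runs out first.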

Workaround for large fleet-agent-events

If you've used Fleet and have a large number of fleet-agent-events, follow this workaround:

  1. First establish that there is a large number of fleet-agent-events:
    POST .kibana/_search?filter_path=aggregations
    {
      "aggs": {
        "saved_object_type": {
          "terms": {"field": "type"}
        }
      }
    }
    
  2. Shut down Kibana
  3. Wait for any reindex tasks started by Kibana to complete. The output of the following request shouldn't show any reindex or update-by-query tasks for the .kibana* indices:
    GET  _tasks?actions=*update/byquery,*reindex&group_by=parents&detailed
    # Cancel any tasks where the "description" field shows it's operating on a `.kibana*` index
    # e.g. `"description": "update-by-query [.kibana_7.12.0_001]"`
    POST _tasks/<task id from above>/_cancel
    # Verify that all tasks have been cancelled:
    GET  _tasks?actions=*update/byquery,*reindex&group_by=parents&detailed
    
  4. Delete fleet-agent-events
    POST .kibana,.kibana_7.12.0_*/_delete_by_query?conflicts=proceed&wait_for_completion=false
    {
      "query": {
        "bool": {
          "must": 
            {
              "term": {
                "type": "fleet-agent-events"
              }
            }
        }
      }
    }
    
    Check that the task returned from the above operation has completed before proceeding:
    GET _tasks/<id>
    
  5. Restart Kibana
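
With Kibana shut down, Console is unavailable, so steps 3 and 4 have to be sent to Elasticsearch directly. The following is a minimal curl sketch of those steps, assuming Elasticsearch is reachable at `ES_URL` without authentication (add `-u user:password` otherwise). The function names are hypothetical. In the `actions` filter, `*reindex` is a wildcard pattern that matches the full action name `indices:data/write/reindex`.

```shell
#!/usr/bin/env bash
# Assumption: Elasticsearch address; use http:// if TLS is not enabled.
ES_URL="${ES_URL:-https://localhost:9200}"

# Step 3: list running update-by-query and reindex tasks.
list_migration_tasks() {
  curl -sS "$ES_URL/_tasks?actions=*update/byquery,*reindex&group_by=parents&detailed"
}

# Step 3: cancel one task, using an id taken from the listing.
cancel_task() {
  curl -sS -X POST "$ES_URL/_tasks/$1/_cancel"
}

# Step 4: request body matching every fleet-agent-events saved object.
delete_events_body() {
  printf '%s' '{"query":{"bool":{"must":{"term":{"type":"fleet-agent-events"}}}}}'
}

# Step 4: start the delete as a background task; the response contains its task id.
delete_fleet_agent_events() {
  curl -sS -X POST \
    "$ES_URL/.kibana,.kibana_7.12.0_*/_delete_by_query?conflicts=proceed&wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "$(delete_events_body)"
}

# Step 4: check the task; repeat until the response shows "completed": true.
get_task() {
  curl -sS "$ES_URL/_tasks/$1"
}
```

Run `list_migration_tasks` first, call `cancel_task <task id>` for anything operating on a `.kibana*` index, then `delete_fleet_agent_events`, and finally `get_task <task id>` until the delete has completed.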

PR #95305

@elasticmachine

Pinging @elastic/kibana-core (Team:Core)


hueyg commented Jun 17, 2021

Forgive my ignorance, but how do you carry out these POST and GET commands if you cannot run Kibana to use the Console? I assume you can use curl, but what exactly am I running these commands against? What port?

I came here from the 7.13 documentation and am suffering from this exact issue on a cluster that I have had around since the 6.x days. The upgrades were carried out from the official Elasticsearch repos enabled on Red Hat systems.

@pgayvallet

but how do you carry out these POST and GET commands if you cannot run Kibana to use the Console

Any CLI or GUI client, such as curl or Postman.

For example, with curl:

curl -XPOST "https://localhost:9200/.kibana/_search?filter_path=aggregations" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "saved_object_type": {
      "terms": {"field": "type"}
    }
  }
}'

What port?

Your configured ES port. The default is 9200.
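
The same pattern covers any Console request: the first line of a Console snippet gives the HTTP method and path, and the JSON below it becomes the request body. A minimal sketch of a wrapper along those lines follows. `es_request` and `ES_URL` are hypothetical names, and the default address is an assumption to adjust for your cluster (add `-u user:password` if security is enabled).

```shell
#!/usr/bin/env bash
# Assumption: default host and port; set ES_URL for anything else
# (use http:// if TLS is not enabled).
ES_URL="${ES_URL:-https://localhost:9200}"

# Usage: es_request METHOD PATH [JSON_BODY]
# Mirrors a Console request such as "POST .kibana/_search" plus an optional body.
es_request() {
  local method="$1" path="$2" body="${3:-}"
  if [ -n "$body" ]; then
    # Elasticsearch rejects request bodies sent without a JSON content type.
    curl -sS -X "$method" "$ES_URL/${path#/}" \
      -H 'Content-Type: application/json' -d "$body"
  else
    curl -sS -X "$method" "$ES_URL/${path#/}"
  fi
}

# Example (not executed here):
# es_request POST '.kibana/_search?filter_path=aggregations' \
#   '{"aggs":{"saved_object_type":{"terms":{"field":"type"}}}}'
```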


hueyg commented Jun 17, 2021


Thank you pgayvallet!

@wonderland14

Hi @pgayvallet, may I have a sample POST for Postman? I am trying to open the index: my Kibana is not accessible because I accidentally closed the kibana_security index and, I think, some other Kibana indices. I have done some research but am still confused about how to get all the indices re-opened. Sorry for my ignorance about this.
