Provide better error messages for OOM browser scenarios #605

Open
andrewvc opened this issue Aug 8, 2022 · 5 comments
Labels
enhancement New feature or request Team:Uptime Label for the Uptime team

Comments

@andrewvc
Contributor

andrewvc commented Aug 8, 2022

It's easy, when using browser-based monitors, to provision a Docker container with insufficient memory. In these scenarios Chrome frequently crashes after being killed by the OOM killer. These errors are hard for users to understand. We should enhance the error messages to provide specific guidance that lack of memory is a likely culprit. While we have other issues like elastic/beats#32317 and elastic/beats#23687 that aim to be more proactive about memory issues, when failures do occur we should provide specific guidance.

This issue proposes that we add a note about memory utilization to any errors related to chrome crashes.
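One way to picture the proposal is a small helper that appends a memory hint to any error message that looks like a browser crash. This is only a sketch: `withMemoryHint`, `CRASH_PATTERNS`, and the wording of `MEMORY_HINT` are hypothetical and not part of heartbeat or the synthetics lib, and the two patterns are taken from the error strings reported later in this thread.

```typescript
// Hypothetical sketch; none of these names exist in heartbeat or @elastic/synthetics.

// Substrings seen in this thread when a process or page dies under memory pressure.
const CRASH_PATTERNS: RegExp[] = [/page crashed/i, /signal: killed/i];

// Assumed wording for the guidance; the real text would be up to the maintainers.
const MEMORY_HINT =
  "The browser may have been killed for lack of memory; " +
  "consider raising the container memory limit.";

// Returns the message unchanged unless it matches a known crash pattern,
// in which case the memory hint is appended in parentheses.
function withMemoryHint(message: string): string {
  const looksLikeCrash = CRASH_PATTERNS.some((pattern) => pattern.test(message));
  return looksLikeCrash ? `${message} (${MEMORY_HINT})` : message;
}
```

Matching on message text is brittle, so a real implementation would probably prefer a structured signal (an exit signal or a dedicated error class) where one is available.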

@andrewvc andrewvc added enhancement New feature or request Team:Uptime Label for the Uptime team labels Aug 8, 2022
@elasticmachine

Pinging @elastic/uptime (Team:Uptime)

@andrewvc
Contributor Author

andrewvc commented Sep 20, 2022

As a note, today when memory limits are exceeded and the OOM killer kicks in, you get the following when the Node.js process is killed:

# tested with -m 80m
{"log.level":"warn","@timestamp":"2022-09-20T00:45:21.398Z","log.origin":{"file.name":"synthexec/synthexec.go","file.line":280},"message":"Error executing command '/usr/share/heartbeat/.node/node/bin/elastic-synthetics elastic-synthetics --screenshots on --inline --rich-events' (-1): signal: killed","service.name":"heartbeat","ecs.version":"1.6.0"}

Boosting the memory slightly to 120m lets node start, but the browser is killed. Interestingly, this causes node to succeed but the step to fail with the following error:

[image not preserved]

@andrewvc
Contributor Author

The error event looks like:

      {
        "_index": ".ds-synthetics-browser-default-2022.08.22-000003",
        "_id": "0apiWIMBLxoT0iGx4bJq",
        "_score": null,
        "_source": {
          "summary": {
            "up": 0,
            "down": 1
          },
          "agent": {
            "name": "docker-desktop",
            "id": "959ebb20-beca-45db-a143-7acb7d0c299e",
            "type": "heartbeat",
            "ephemeral_id": "281781e7-650f-40e8-bf37-2ff1a0bc7bc4",
            "version": "8.4.1"
          },
          "@timestamp": "2022-09-20T00:53:42.146Z",
          "ecs": {
            "version": "8.0.0"
          },
          "data_stream": {
            "namespace": "default",
            "type": "synthetics",
            "dataset": "browser"
          },
          "synthetics": {
            "journey": {
              "name": "inline",
              "id": "inline",
              "tags": null
            },
            "type": "heartbeat/summary"
          },
          "monitor": {
            "duration": {
              "us": 268403
            },
            "name": "No Mem",
            "id": "no-mem",
            "timespan": {
              "lt": "2022-09-20T00:54:42.191Z",
              "gte": "2022-09-20T00:53:42.191Z"
            },
            "check_group": "ab348eef-387e-11ed-a397-f64628dbf41f",
            "type": "browser",
            "status": "down"
          },
          "error": {
            "code": "",
            "stack_trace": """page.goto: Navigation failed because page crashed!
=========================== logs ===========================
navigating to "https://www.nytimes.com/", waiting until "load"
============================================================
    at Step.eval [as callback] (eval at loadInlineScript (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/loader.ts:89:20), <anonymous>:3:48)
    at Runner.runStep (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/core/runner.ts:211:18)
    at async Runner.runSteps (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/core/runner.ts:261:16)
    at async Runner.runJourney (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/core/runner.ts:351:27)
    at async Runner.run (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/core/runner.ts:447:11)
    at async Command.<anonymous> (/usr/share/heartbeat/.node/node/lib/node_modules/@elastic/synthetics/src/cli.ts:132:23)""",
            "message": "error executing step: page.goto: Navigation failed because page crashed!",
            "type": "Error"
          },
          "event": {
            "agent_id_status": "auth_metadata_missing",
            "ingested": "2022-09-20T00:53:38Z",
            "type": "heartbeat/summary",
            "dataset": "browser"
          },
          "url": {
            "path": "/",
            "scheme": "https",
            "port": 443,
            "domain": "www.nytimes.com",
            "full": "https://www.nytimes.com/"
          }
        },
        "sort": [
          1663635222146
        ]
      }

@andrewvc
Contributor Author

I think it probably makes the most sense for the synthetics lib, rather than heartbeat, to categorize this error and give it a proper code. So I'm moving it to that repo.
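As a rough illustration of what "give it a proper code" could mean, the sketch below fills the `error.code` field (empty in the event above) when a step error looks like a page crash. The `CategorizedError` shape mirrors the `error` object in that event, but `categorizeStepError` and the `BROWSER_CRASH_POSSIBLE_OOM` value are hypothetical and not an existing synthetics API.

```typescript
// Hypothetical sketch; not the actual @elastic/synthetics error handling.

// Mirrors the `code`, `message`, and `type` fields of the `error` object
// in the event shown earlier in this thread.
interface CategorizedError {
  code: string;
  message: string;
  type: string;
}

// Assumed code value, chosen for illustration only.
const BROWSER_CRASH_CODE = "BROWSER_CRASH_POSSIBLE_OOM";

// Assigns the crash code when the message reports a page crash;
// otherwise leaves the code empty, as in the event above.
function categorizeStepError(err: Error): CategorizedError {
  const crashed = /page crashed/i.test(err.message);
  return {
    code: crashed ? BROWSER_CRASH_CODE : "",
    message: err.message,
    type: err.name,
  };
}
```

With a stable code in place, a consumer such as heartbeat or the UI could attach memory guidance by checking the code rather than parsing the message.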

@andrewvc andrewvc transferred this issue from elastic/beats Sep 20, 2022
@andrewvc andrewvc changed the title [Heartbeat] Provide better error messages for OOM browser scenarios Provide better error messages for OOM browser scenarios Sep 20, 2022
@andrewvc
Contributor Author

We should remove this from the 1.0 MVP scope and instead document the danger of setting browser concurrency levels too high on-prem.
