Add summary to journeys which don't emit journey:end (early node subprocess exits) #29606

lucasfcosta · 2021-12-24T17:06:25Z

What does this PR do?

This PR ensures that Heartbeat will add a summary to journeys in which the node subprocess for the synthetics runner crashes before emitting a journey:end event.

This change will ensure that all journeys will contain a summary, even the ones which cause the runner itself to exit earlier.

To better understand how Heartbeat interfaces with the synthetics runner, and thus understand this fix, see the section below.

Understanding this fix: how Heartbeat works with `elastic-synthetics`

Before I explain how to test this PR, it's worth explaining how Heartbeat and the Synthetics runner work together.

Currently, Heartbeat uses the elastic-synthetics executable, which comes from the synthetics runner project, to run inline journeys (like the ones you define within a heartbeat.yml file). This executable will (or at least should) be available in the PATH.

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Line 69 in bea8e45

return exec.Command("elastic-synthetics", append(extraArgs, "--inline")...)

See the executable name defined here:

https://github.com/elastic/synthetics/blob/ea6eb357953e5a4c5ec3cd5a1ab031a03a6c918a/package.json#L22

When Heartbeat runs the inline journey using the elastic-synthetics executable, it passes the --rich-events flag, which causes the runner itself to emit rich JSON events.

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Line 123 in bea8e45

cmd.Args = append(cmd.Args, "--rich-events")

Heartbeat can then easily use these JSON events to create SynthEvents, which it will then try to "enrich".

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Lines 240 to 252 in bea8e45

    
    // lineToSynthEventFactory is a factory that can take a line from the scanner and transform it into a *SynthEvent.
    func lineToSynthEventFactory(typ string) func(bytes []byte, text string) (res *SynthEvent, err error) {
    	return func(bytes []byte, text string) (res *SynthEvent, err error) {
    		logp.Info("%s: %s", typ, text)
    		return &SynthEvent{
    			Type:                 typ,
    			TimestampEpochMicros: float64(time.Now().UnixNano() / int64(time.Millisecond)),
    			Payload: map[string]interface{}{
    				"message": text,
    			},
    		}, nil
    	}
    }

To see the runner emitting these events, try creating a simple journey, like the one below:

step('google', async () => {
  await page.goto('https://www.google.com');
});

step('BBC', async () => {
  await page.goto('https://www.bbc.co.uk');
});

Then run it using cat ~/Repositories/synthetics/test/test.js | npx @elastic/synthetics --inline --rich-events. For each part of the journey execution, the runner emits a different type of event, as defined in the code excerpt below, which comes from the runner:

https://github.com/elastic/synthetics/blob/30c536e53d93495fa8824e31ffd99c01a95c0376/src/core/runner.ts#L117-L141

⭐ Tip: to make these events more human-readable, try installing jq and piping your journey run into it (as in cat ~/Repositories/synthetics/test/test.js | npx @elastic/synthetics --inline --rich-events | jq)

For each of these events, Heartbeat would try to "enrich" them with more data:

beats/x-pack/heartbeat/monitors/browser/synthexec/enrich.go

Lines 104 to 146 in bea8e45

    
    func (je *journeyEnricher) enrichSynthEvent(event *beat.Event, se *SynthEvent) error {
    	var jobErr error
    	if se.Error != nil {
    		jobErr = stepError(se.Error)
    		je.errorCount++
    		if je.firstError == nil {
    			je.firstError = jobErr
    		}
    	}

    	switch se.Type {
    	case "journey/end":
    		je.journeyComplete = true
    		return je.createSummary(event)
    	case "step/end":
    		je.stepCount++
    	case "step/screenshot":
    		fallthrough
    	case "step/screenshot_ref":
    		fallthrough
    	case "screenshot/block":
    		add_data_stream.SetEventDataset(event, "browser.screenshot")
    	case "journey/network_info":
    		add_data_stream.SetEventDataset(event, "browser.network")
    	}

    	if se.Id != "" {
    		event.SetID(se.Id)
    		// This is only relevant for screenshots, which have a specific ID
    		// In that case we always want to issue an update op
    		event.Meta.Put(events.FieldMetaOpType, events.OpTypeCreate)
    	}
    	eventext.MergeEventFields(event, se.ToMap())

    	if je.urlFields == nil {
    		if urlFields, err := event.GetValue("url"); err == nil {
    			if ufMap, ok := urlFields.(common.MapStr); ok {
    				je.urlFields = ufMap
    			}
    		}
    	}
    	return jobErr
    }
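
As a small illustration of the dataset routing in the switch above, the sketch below expresses the same mapping as a pure function. The `datasetFor` name and the convention of returning an empty string for "no override" are assumptions for this example, not part of Heartbeat.

```go
package main

import "fmt"

// datasetFor returns the dataset override for a synthetics event type,
// or "" when the event type has no override in the excerpt above.
func datasetFor(eventType string) string {
	switch eventType {
	// A multi-value case behaves like the chain of fallthroughs in the excerpt.
	case "step/screenshot", "step/screenshot_ref", "screenshot/block":
		return "browser.screenshot"
	case "journey/network_info":
		return "browser.network"
	}
	return ""
}

func main() {
	for _, t := range []string{"step/screenshot_ref", "journey/network_info", "step/end"} {
		fmt.Printf("%s -> %q\n", t, datasetFor(t))
	}
}
```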

For journey:end in particular, Heartbeat would try to add a summary to the emitted document, so that we have data about the whole run.

beats/x-pack/heartbeat/monitors/browser/synthexec/enrich.go

Lines 148 to 179 in bea8e45

    
    func (je *journeyEnricher) createSummary(event *beat.Event) error {
    	var up, down int
    	if je.errorCount > 0 {
    		up = 0
    		down = 1
    	} else {
    		up = 1
    		down = 0
    	}

    	if je.journeyComplete {
    		eventext.MergeEventFields(event, common.MapStr{
    			"url": je.urlFields,
    			"synthetics": common.MapStr{
    				"type":    "heartbeat/summary",
    				"journey": je.journey,
    			},
    			"monitor": common.MapStr{
    				"duration": common.MapStr{
    					"us": int64(je.end.Sub(je.start) / time.Microsecond),
    				},
    			},
    			"summary": common.MapStr{
    				"up":   up,
    				"down": down,
    			},
    		})
    		return je.firstError
    	}

    	return fmt.Errorf("journey did not finish executing, %d steps ran", je.stepCount)
    }

Now, the problem we had before this PR is that journeys which cause the elastic-synthetics executable (run in a node subprocess) to crash would not emit a journey:end event. Consequently, we wouldn't have a summary for those journeys.

You can see that no journey:end event is emitted by the synthetics runner when you run the following journey, for example:

step('google', async () => {
  await page.goto('https://www.google.com');
});

step('BBC', async () => {
  await page.goto('https://www.bbc.co.uk');
  process.exit(1);
});

After this change, we'll use the cmd/status SynthEvent also as a type of event which can be enriched with a summary.

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Lines 196 to 202 in 1386a04

    
    if err != nil {
    	str := fmt.Sprintf("command exited with status %d: %s", cmd.ProcessState.ExitCode(), err)
    	mpx.writeSynthEvent(&SynthEvent{
    		Type:  "cmd/status",
    		Error: &SynthError{Name: "cmdexit", Message: str},
    	})
    	logp.Warn("Error executing command '%s' (%d): %s", loggableCmd.String(), cmd.ProcessState.ExitCode(), err)

Now, a critical part of this change was that we need a timestamp for the summary events when journeys fail and we emit a cmd/status. We need this timestamp so that we can set the end of the journey.

beats/x-pack/heartbeat/monitors/browser/synthexec/enrich.go

Lines 70 to 84 in 1386a04

    
    if !se.Timestamp().IsZero() {
    	event.Timestamp = se.Timestamp()
    	// Record start and end so we can calculate journey duration accurately later
    	switch se.Type {
    	case "journey/start":
    		je.firstError = nil
    		je.checkGroup = makeUuid()
    		je.journey = se.Journey
    		je.start = event.Timestamp
    	case "journey/end":
    		je.end = event.Timestamp
    	}
    } else {
    	event.Timestamp = time.Now()
    }

The problem with the cmd/status SynthEvent was that it did not have a TimestampEpochMicros field. It did not have this field because we're not parsing @timestamp fields emitted by the synthetics-runner. Instead, we're just constructing a SynthEvent ourselves.

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Lines 198 to 201 in 1386a04

    
    mpx.writeSynthEvent(&SynthEvent{
    	Type:  "cmd/status",
    	Error: &SynthError{Name: "cmdexit", Message: str},
    })

I then tried to generate the timestamp the same way we do when capturing line events, shown below:

beats/x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

Line 246 in 1386a04

TimestampEpochMicros: float64(time.Now().UnixNano() / int64(time.Millisecond)),

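That approach, however, didn't yield correct results: time.Now().UnixNano() / int64(time.Millisecond) produces milliseconds since the epoch, not microseconds, so the method below, which interprets the value as microseconds, returns timestamps dating back to 1970.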

beats/x-pack/heartbeat/monitors/browser/synthexec/synthtypes.go

Lines 89 to 95 in 1386a04

    
    func (se SynthEvent) Timestamp() time.Time {
    	seconds := se.TimestampEpochMicros / 1e6
    	wholeSeconds := math.Floor(seconds)
    	micros := (seconds - wholeSeconds) * 1e6
    	nanos := micros * 1000
    	return time.Unix(int64(wholeSeconds), int64(nanos))
    }

That problem took me a lot of debugging time, until I noticed the summaries were being published with timestamps far in the past.

After figuring that out, I simply used the new UnixMicro method (as in time.Now().UnixMicro()) to generate timestamps for cmd/status events, and it all worked out fine 😄

Why is it important?

This change is important because, in Kibana, we perform queries which look at a journey's summary to determine what to display.

Previously, the lack of a summary for our journeys caused a monitor/check-group not to be displayed in Kibana, as pointed out in elastic/kibana#116850 (comment). That problem happened because we were looking for documents with a summary as a way of indicating whether a monitor/check-group had finished running. However, monitors which caused the node subprocess to exit early would never be shown because Heartbeat would never add a summary document to them.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

I'd appreciate if reviewers could:

Double check whether my understanding of the problem and desired acceptance-criteria are correct.
Double check whether my manual testing is sufficient to prove the acceptance-criteria is met
Confirm you agree with what I've written in the "Further Changes" section
Verify whether my backport labels are correct (it's my first time contributing to this repo)

How to test this PR locally

Create a journey which will cause the node subprocess to exit, like the one below:

step('google', async () => {
  await page.goto('https://www.google.com');
});

step('BBC', async () => {
  await page.goto('https://www.bbc.co.uk');
  process.exit(1);
});

Confirm that the journey above does not emit a journey:end event by running it with the synthetics runner (you must have built it before running the command below).
```
$ cat ~/Repositories/synthetics/test/test.js | node ~/Repositories/synthetics/dist/cli.js --inline --rich-events
```

Create a heartbeat.yml pointing to your desired Elastic Stack for testing, and paste that journey into an inline monitor:

output.elasticsearch:
  hosts: ["http://localhost:9200"]
  username: "elastic"
  password: "changeme"

heartbeat.monitors:
- type: browser
  id: it-exits
  name: Journey that exits
  schedule: '@every 1m'
  source:
    inline:
      script: |-
        step('google', async () => {
          await page.goto('https://www.google.com');
        });

        step('BBC', async () => {
          await page.goto('https://www.bbc.co.uk');
          process.exit(1);
        });

Check out this branch and build Heartbeat (mage build within x-pack/heartbeat).

Run Heartbeat using the heartbeat.yml file from step 3

ELASTIC_SYNTHETICS_CAPABLE=true ./heartbeat -c /tmp/heartbeat.yml -e -d "*"

In Kibana, look for documents which match your new monitor's ID. You should now see summaries for all command failure events too.

GET /heartbeat-*/_search
{
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "monitor.id": "it-exits-inline"
          }
        },
        {
          "match": {
            "synthetics.type": "heartbeat/summary"
          }
        }
      ]
    }
  }
}

Other recommended checks

Verify that the journey which does not complete appears in the Uptime > Monitors page
Change the monitor's name and ID and roll back this change to confirm that no summaries appear without it

Related issues

Use cases

This PR allows users to see their monitors in the Uptime > Monitors page even when those monitors crash before being finished, which means it's easier to debug what happened to them.

Further Changes

As you test this PR, you will notice that the Uptime > Monitors page in Kibana will only display the step which did not crash. This behaviour occurs because we don't have a step/end event for the step which crashed.

Additionally, you will notice that journeys with syntax errors do not even start and therefore do not have a summary. That's a different case from an early exit: the journey never starts at all. Similar problems may occur when a journey can't start for any other reason.

Therefore, I suggest that we create the two following issues:

Emit journey/start events (or, alternatively, journey/prepare) even when there are syntax errors or other conditions which prevent journeys from even starting
Update the UI so that we display the step which causes the synthetics runner to crash

I considered the two issues above to be out of scope for this PR. Please let me know if you agree and if you're happy with the current behaviour.

mergify · 2021-12-24T17:06:59Z

This pull request does not have a backport label. Could you fix it @lucasfcosta? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine · 2021-12-24T17:52:29Z

💚 Build Succeeded


Build stats

Start Time: 2022-01-11T09:29:07.958+0000
Duration: 124 min 2 sec
Commit: a5264c2

Test stats 🧪

Test	Results
Failed	0
Passed	47848
Skipped	4284
Total	52132

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)


elasticmachine · 2021-12-29T17:31:31Z

Pinging @elastic/uptime (Team:Uptime)

vigneshshanmugam

Great work Lucas, Left some comments.

vigneshshanmugam · 2022-01-04T22:24:18Z

x-pack/heartbeat/monitors/browser/synthexec/enrich.go

@@ -112,6 +114,9 @@ func (je *journeyEnricher) enrichSynthEvent(event *beat.Event, se *SynthEvent) e
 	}

 	switch se.Type {
+	case "cmd/status":


We should guard this summary creation event. If the user has an afterAll hook that throws an error, we would have journey/end followed by cmd/status.

andrewvc · 2022-01-05

This is a sort of 'task failed successfully' situation. I guess there's a question here, does a failing after hook invalidate the test result at all? It feels weird to, say, trigger an alert based on an afterAll hook.

I'm wondering if we really would need to introduce a new concept of warnings into the schema / UI. Thoughts?

vigneshshanmugam · 2022-01-06

I don't think so. Lots of test runners don't consider failures in the afterAll hooks to be failures. They mainly exist to keep some cleanup mechanism. But having warnings in the UI sounds like a cool idea.

lucasfcosta · 2022-01-10

A comment on failures on afterX hooks

The afterAll hook triggering a failure is a tricky situation. I previously did some work on Jest's hooks here (jestjs/jest#8654) where this same discussion has come up.

Given how complex the discussion on hooks semantics is, we didn't get to a conclusion in that PR (which is why it's still open and not much has happened since then).

Anyway, I do feel like failures in afterHooks should be test failures. If you think about how hooks could be implemented using a procedural API under the hood, they end up being part of the upper test scope, and thus would trigger failures. For inspiration, see Deno's RFC on their testing API: denoland/deno#10771.

Proposed way forward

I'd consider that failing a journey would be out of the scope of this PR if you agree.

For now I'll follow @vigneshshanmugam's suggestion of having both events given there's not necessarily success/failure tied to that (as journey/end indicates the journey for the hook finished anyway). However, the cmd/status event will not include a summary as it refers to a journey itself.

IMO, moving forward, it would be great to have warnings based on cmd/status events as @andrewvc mentioned too.

vigneshshanmugam · 2022-01-11

> I'd consider that failing a journey would be out of the scope of this PR if you agree.

++, let's do this separately and open an issue in the Synthetics runner.

> Anyway, I do feel like failures in afterHooks should be test failures.

Mainly, in the runner I didn't go with this idea as we didn't have a good way of reporting the error. Warnings would make a lot of sense here.

vigneshshanmugam · 2022-01-04T22:25:05Z

x-pack/heartbeat/monitors/browser/synthexec/execmultiplexer_test.go

-		})
+		// We want one of the test journeys to end with a cmd/status indicating it failed
+		if jIdx != 4 {
+			testEvents = append(testEvents, &SynthEvent{


As per the previous comment, let's add both and see what happens with summary creation.

vigneshshanmugam · 2022-01-04T22:26:22Z

x-pack/heartbeat/monitors/browser/synthexec/synthexec.go

@@ -243,7 +244,7 @@ func lineToSynthEventFactory(typ string) func(bytes []byte, text string) (res *S
 		logp.Info("%s: %s", typ, text)
 		return &SynthEvent{
 			Type:                 typ,
-			TimestampEpochMicros: float64(time.Now().UnixNano() / int64(time.Millisecond)),
+			TimestampEpochMicros: float64(time.Now().UnixMicro()),


I am not sure about the intended consequence of this change, I will leave it to @andrewvc who knows it better than me.

lucasfcosta · 2022-01-05

This change is because timestamps calculated in that way would yield negative values, and thus not appear in my queries (I was filtering for events in the last X years).

UnixMicro is directly what we want, so it also makes the code a bit more concise, unless I'm missing any other behaviours from the stdlib here.

andrewvc · 2022-01-05

I had no idea the UnixMicro method existed. As far as I'm concerned this is a nice improvement, so I'm +1 on fixing this here.

lucasfcosta · 2022-01-10T16:40:12Z

@vigneshshanmugam @andrewvc: I've just pushed an update to this PR which I've tested very thoroughly and for which I've also added a significant amount of new unit testing, but I'd like you to please make sure that:

I understood your ACs correctly with regards to skipping the creation of a summary document when an after[Each/All] hook fails
It works appropriately when you test it locally on your machine (and double-check whether my manual testing below makes sense)

How to test the creation of summary documents when `afterX` hooks fail

Here's what I've done to test the further changes suggested by @vigneshshanmugam. Please let me know if I missed anything.

Failing `afterX` hook test

Create a monitor with a failing afterAll hook, like the one below

const { journey, step, afterAll } = require('@elastic/synthetics');

journey('failing_hook_journey', ({ page }) => {
  step('google', async () => {
    await page.goto('https://www.google.com');
  });

  step('BBC', async () => {
    await page.goto('https://www.bbc.co.uk');
  });
});

afterAll(() => {
  process.exit(1);
});

Name this file following the usual *.journey.js pattern and put it somewhere with a package.json. Then, configure Heartbeat to run it:

# ...
heartbeat.monitors:
- type: browser
  id: local-journeys-new
  name: my-hook-fails
  schedule: '@every 1m'
  source:
    local:
      path: "/your/file/path"

Start Heartbeat and wait for the monitor to run
Ensure you only have a single heartbeat/summary document for every run.
For that you can use a query like the following one:
```
monitor.name :"my-hook-fails - failing_hook_journey" AND synthetics.type : "heartbeat/summary"
```
Ensure the monitor's details/status appear correctly on the monitor's page (it should be passing)

Failing test itself

Create a monitor with a journey in which a step exits the process, like the one below

const { journey, step } = require('@elastic/synthetics');

journey('failing_journey', ({ page }) => {
  step('google', async () => {
    await page.goto('https://www.google.com');
  });

  step('BBC', async () => {
    await page.goto('https://www.bbc.co.uk');
    process.exit(1)
  });
});

Name this file following the usual *.journey.js pattern and put it somewhere with a package.json. Then, configure Heartbeat to run it:

# ...
heartbeat.monitors:
- type: browser
  id: local-journeys-test
  name: my-test-fails
  schedule: '@every 1m'
  source:
    local:
      path: "/your/file/path"

Start Heartbeat and wait for the monitor to run
Ensure you only have a single heartbeat/summary document for every run.
For that you can use a query like the following one:
```
monitor.name :"my-test-fails - failing_journey" AND synthetics.type : "heartbeat/summary"
```
Ensure the monitor's details/status appear correctly on the monitor's page (it should be failing)
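
The expected outcome of the two scenarios above can be sketched as a tiny state machine. This is a simplified model under stated assumptions, not the code in enrich.go: the `ev` and `result` types and the `summaries` helper are hypothetical, and each call models a single journey run.

```go
package main

import "fmt"

// ev is a minimal event: its type, and whether it carries an error.
type ev struct {
	Type   string
	Failed bool
}

// result records which event type received the summary and its up/down values.
type result struct {
	OnType   string
	Up, Down int
}

// summaries replays the events of one journey run and returns one result
// per summary that would be attached.
func summaries(events []ev) []result {
	var out []result
	errorCount := 0
	complete := false
	emit := func(onType string) {
		r := result{OnType: onType, Up: 1}
		if errorCount > 0 {
			r.Up, r.Down = 0, 1
		}
		out = append(out, r)
	}
	for _, e := range events {
		if e.Failed {
			errorCount++
		}
		switch e.Type {
		case "journey/end":
			complete = true
			emit(e.Type)
		case "cmd/status":
			// Guard: skip the summary if journey/end already produced one,
			// as happens when an afterAll hook fails.
			if !complete {
				emit(e.Type)
			}
		}
	}
	return out
}

func main() {
	hookFails := []ev{{Type: "journey/start"}, {Type: "step/end"}, {Type: "step/end"},
		{Type: "journey/end"}, {Type: "cmd/status", Failed: true}}
	stepExits := []ev{{Type: "journey/start"}, {Type: "step/end"},
		{Type: "cmd/status", Failed: true}}
	fmt.Printf("failing hook: %+v\n", summaries(hookFails))
	fmt.Printf("failing step: %+v\n", summaries(stepExits))
}
```

In both cases exactly one summary is produced: an "up" summary on journey/end when only the hook fails, and a "down" summary on cmd/status when the step itself exits the process.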

vigneshshanmugam

LGTM, Thanks for adding extra guards and the tests 🎉

Also, I love reading the detailed descriptions. Super useful and establishes the context nicely.

vigneshshanmugam · 2022-01-11T05:57:07Z

x-pack/heartbeat/monitors/browser/synthexec/enrich.go

+		// If a command failed _after_ the journey was complete, as it happens
+		// when an `afterAll` hook fails, for example, we don't wan't to include
+		// a summary in the cmd/status event.
+		if je.journeyComplete == false {


nit: !je.journeyComplete

lucasfcosta · 2022-01-11

Great point! Thank you @vigneshshanmugam ❤️

…rocess exits) (#29606) * update link to beats developer guide * fix: add summary to journeys which don't emit journey:end [fixes #28770] * fix: avoid cmd/status when journey has already finished (cherry picked from commit 3270ae1)


…rocess exits) (#29606) (#29812) * update link to beats developer guide * fix: add summary to journeys which don't emit journey:end [fixes #28770] * fix: avoid cmd/status when journey has already finished (cherry picked from commit 3270ae1) Co-authored-by: Lucas F. da Costa <lucas@lucasfcosta.com>

…rocess exits) (#29606) (#29813) * update link to beats developer guide * fix: add summary to journeys which don't emit journey:end [fixes #28770] * fix: avoid cmd/status when journey has already finished (cherry picked from commit 3270ae1) Co-authored-by: Lucas F. da Costa <lucas@lucasfcosta.com>

lucasfcosta added the bug label Dec 24, 2021

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 24, 2021

lucasfcosta added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Dec 24, 2021

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 24, 2021

mergify bot added the backport-skip Skip notification from the automated backport with mergify label Dec 24, 2021

lucasfcosta force-pushed the fixes-no-journey-end branch from 47a1b96 to 8e67094 Compare December 29, 2021 16:31

lucasfcosta added backport-v7.16.0 Automated backport with mergify backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.0.0 Automated backport with mergify labels Dec 29, 2021

mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Dec 29, 2021

lucasfcosta removed the backport-v7.16.0 Automated backport with mergify label Dec 29, 2021

lucasfcosta added 2 commits December 29, 2021 17:24

update link to beats developer guide

62c0d33

fix: add summary to journeys which don't emit journey:end [fixes elas…

d5f1daa

…tic#28770]

lucasfcosta force-pushed the fixes-no-journey-end branch from 2af725c to d5f1daa Compare December 29, 2021 17:30

lucasfcosta marked this pull request as ready for review December 29, 2021 17:31

vigneshshanmugam reviewed Jan 4, 2022


vigneshshanmugam approved these changes Jan 11, 2022


fix: avoid cmd/status when journey has already finished

a5264c2

lucasfcosta force-pushed the fixes-no-journey-end branch from d55b4f5 to a5264c2 Compare January 11, 2022 09:28

lucasfcosta merged commit 3270ae1 into elastic:master Jan 12, 2022

lucasfcosta deleted the fixes-no-journey-end branch January 12, 2022 10:27

mergify bot mentioned this pull request Jan 12, 2022

[7.17](backport #29606) Add summary to journeys which don't emit journey:end (early node subprocess exits) #29812

Merged

lucasfcosta added the backport-v8.1.0 Automated backport with mergify label Jan 12, 2022

mergify bot mentioned this pull request Jan 12, 2022

[8.0](backport #29606) Add summary to journeys which don't emit journey:end (early node subprocess exits) #29813

Merged


