[Heartbeat] Private location policy update breaks scheduling #34629

Closed
emilioalvap opened this issue Feb 21, 2023 · 3 comments · Fixed by #34697
Labels: bug, Team:obs-ds-hosted-services (Label for the Observability Hosted Services team), v8.6.0

Comments

@emilioalvap
Collaborator

  • Version: >8.6.1
  • Operating System: Docker

Summary

When running browser monitors through a private location policy, any update to the policy prevents further iterations from running.
It also seems that synthetics processes are being interfered with when an agent update occurs, since monitors running at that time reach the timeout and consistently terminate with an error:

{"log.level":"info","@timestamp":"2023-02-21T11:52:47.610Z","message":"Command has completed(-1): /usr/share/elastic-agent/.node/node/bin/elastic-synthetics elastic-synthetics --screenshots on --throttling 5d/3u/20l --inline --rich-events","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/browser-default","type":"synthetics/browser"},"log":{"source":"synthetics/browser-default"},"service.name":"heartbeat","ecs.version":"1.6.0","log.origin":{"file.line":276,"file.name":"synthexec/synthexec.go"},"ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-02-21T11:52:47.611Z","message":"Error executing command '/usr/share/elastic-agent/.node/node/bin/elastic-synthetics elastic-synthetics --screenshots on --throttling 5d/3u/20l --inline --rich-events' (-1): signal: killed","component":{"binary":"heartbeat","dataset":"elastic_agent.heartbeat","id":"synthetics/browser-default","type":"synthetics/browser"},"log":{"source":"synthetics/browser-default"},"service.name":"heartbeat","ecs.version":"1.6.0","log.origin":{"file.line":282,"file.name":"synthexec/synthexec.go"},"ecs.version":"1.6.0"}
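For context on the `(-1): signal: killed` pair in these logs: this is what Go's `os/exec` reports when a child process is terminated by SIGKILL, for example when the context given to `exec.CommandContext` is cancelled or times out. The sketch below reproduces that behaviour on a Unix-like system. It assumes a `sleep` binary is on the PATH, and the helper name `runAndCancel` is hypothetical; whether Heartbeat's synthexec reaches this state through the same path is an assumption, not something confirmed in this issue.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runAndCancel (hypothetical helper) starts a child process and cancels its
// context shortly afterwards. exec.CommandContext kills the child once the
// context is done, so Wait returns an *exec.ExitError.
func runAndCancel(name string, args ...string) (int, string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	cmd := exec.CommandContext(ctx, name, args...)
	if err := cmd.Start(); err != nil {
		return 0, err.Error()
	}

	// Stand-in for a cancellation or timeout arriving while the job is running.
	timer := time.AfterFunc(100*time.Millisecond, cancel)
	defer timer.Stop()

	err := cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// ExitCode is -1 when the process was terminated by a signal.
		return exitErr.ExitCode(), exitErr.Error()
	}
	if err != nil {
		return 0, err.Error()
	}
	return 0, ""
}

func main() {
	code, text := runAndCancel("sleep", "30")
	fmt.Printf("Command has completed(%d): %s\n", code, text)
}
```

If the monitor's process is being killed this way, the interesting question is which context is being cancelled, and by whom, at the time of the update.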

When collecting elastic-agent diagnostics, the agent does not seem to be able to serialize heartbeat integrations correctly:

root@bce4abe537ff:/usr/share/elastic-agent# elastic-agent diagnostics collect
[WARNING] Could not redact state.yaml due to unmarshalling error: yaml: invalid map key: map[interface {}]interface {}{"unitid":"synthetics/browser-default", "unittype":1}
Created diagnostics archive "elastic-agent-diagnostics-2023-02-21T12-52-40Z-00.zip"

How to repro?

  1. Create a private location policy with long running monitors, e.g.:
step("Test", async () => {
    await page.goto("https://google.es");
    // Keep the step open for 600e3 ms (10 minutes) so it is still running when the policy is updated.
    await page.waitForTimeout(600e3);
});
  2. Enroll an agent and wait for monitors to start running:
{"log.level":"info","@timestamp":"2023-02-21T11:37:17.603Z","message":"Running command: /usr/share/elastic-agent/.node/node/bin/elastic-synthetics elastic-synthetics --screenshots on...
  3. Update any of the monitors to trigger a policy update.
  4. Wait for the next monitor iteration to happen.
@emilioalvap emilioalvap added bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.6.0 labels Feb 21, 2023
@elasticmachine
Collaborator

Pinging @elastic/uptime (Team:Uptime)

@emilioalvap
Collaborator Author

emilioalvap commented Feb 27, 2023

Adding more info: it seems that browser jobs are not being cleared correctly from the running context when the policy is updated, which prevents further jobs from running.

image
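To illustrate the kind of cleanup this implies, here is a minimal Go sketch in which every job derives its context from a single parent and releases its slot in a `defer`, so cancelling the parent (a stand-in for a policy update) leaves nothing registered as running. All names here (`runningJobs`, `startJobs`, `simulatePolicyUpdate`) are hypothetical and do not correspond to Heartbeat's actual types or scheduler.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runningJobs is a hypothetical registry of in-flight jobs.
type runningJobs struct {
	mu   sync.Mutex
	jobs map[string]struct{}
}

func (r *runningJobs) add(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.jobs[id] = struct{}{}
}

func (r *runningJobs) remove(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.jobs, id)
}

func (r *runningJobs) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.jobs)
}

// startJobs launches one goroutine per id. Each job context is a child of
// parent, so cancelling parent stops every job, and the deferred remove
// releases the registry entry on every exit path.
func startJobs(parent context.Context, reg *runningJobs, ids []string) *sync.WaitGroup {
	var wg sync.WaitGroup
	for _, id := range ids {
		jobCtx, cancel := context.WithCancel(parent)
		reg.add(id)
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer cancel()
			defer reg.remove(id)
			select {
			case <-jobCtx.Done(): // parent cancelled, e.g. by a policy update
			case <-time.After(10 * time.Minute): // stand-in for a long browser job
			}
		}(id)
	}
	return &wg
}

// simulatePolicyUpdate reports how many jobs are registered before and after
// the parent context is cancelled.
func simulatePolicyUpdate(ids []string) (before, after int) {
	reg := &runningJobs{jobs: map[string]struct{}{}}
	parent, cancel := context.WithCancel(context.Background())
	wg := startJobs(parent, reg, ids)
	before = reg.count()
	cancel()
	wg.Wait()
	return before, reg.count()
}

func main() {
	before, after := simulatePolicyUpdate([]string{"monitor-a", "monitor-b"})
	fmt.Println("running before update:", before)
	fmt.Println("running after update:", after)
}
```

The point of the `defer reg.remove(id)` is that the slot is freed regardless of whether the job finished, failed, or was cancelled, which is the property that appears to be missing here.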

@emilioalvap
Collaborator Author

emilioalvap commented Feb 28, 2023

Additionally, two other potential issues:

  • The browser project context does not close all related synthetics contexts when cancelled. This can be fixed by declaring a project context and passing it as the parent context for generated jobs.

  • Config Reload() from libbeat generates different hashes for all new monitors, so it constantly stops and reloads running monitors, even when the config did not change. This is probably because the policy revision is included in the hashed struct, instead of only the integration's local revision:
    image
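The hashing point can be sketched in Go as follows. This is not libbeat's hashing code: the `monitorConfig` struct, its field names, and the use of SHA-256 over JSON are all assumptions chosen for illustration. It only shows why including a policy-wide revision in the hashed data makes every monitor look changed after any policy update, while hashing only monitor-local fields does not.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// monitorConfig is a hypothetical stand-in for a monitor's configuration.
type monitorConfig struct {
	ID                  string `json:"id"`
	Schedule            string `json:"schedule"`
	PolicyRevision      int    `json:"policy_revision"`      // bumps on any change to the policy
	IntegrationRevision int    `json:"integration_revision"` // bumps only when this monitor changes
}

// hashAll hashes every field, including the policy-wide revision.
func hashAll(c monitorConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// hashLocal zeroes the policy-wide revision before hashing, so only
// monitor-local fields influence the result.
func hashLocal(c monitorConfig) string {
	c.PolicyRevision = 0
	return hashAll(c)
}

func main() {
	before := monitorConfig{ID: "browser-default", Schedule: "@every 10m", PolicyRevision: 3, IntegrationRevision: 1}
	after := before
	after.PolicyRevision = 4 // another monitor in the same policy was edited

	fmt.Println("hashAll changed:  ", hashAll(before) != hashAll(after))
	fmt.Println("hashLocal changed:", hashLocal(before) != hashLocal(after))
}
```

With a scheme like `hashLocal`, a reload would only restart monitors whose own configuration changed, which is the behaviour the comment above argues for.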
