[Ingest Manager] New agent structure (symlinks) #20400

Merged (34 commits) on Sep 3, 2020

michalpristas
Contributor

What does this PR do?

Different approach to #20307
Working with symlinks turned out to be a bit tricky because of how operating systems handle the working directory and executable names.
On top of that, Windows requires the service name used for registration to be an absolute path, and it must match the name originally used to register the service (hence the magic with os.Args[0] replacements in the code; the library uses os.Args[0] as the service name).

Because of how the working directory is determined when a binary is run through a symlink, paths.yml lives either at the symlink level (Windows) or at the executable level (darwin and Linux).
These will get regenerated during future Upgrade/Rollback.
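
As a rough illustration of the symlink-level versus executable-level distinction, the sketch below resolves a path to its absolute, symlink-free form using only the Go standard library. The helper name resolveExecutable is hypothetical and is not taken from this PR; the agent's own path handling may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveExecutable (hypothetical helper) returns the absolute form of path
// with every symlink on the way resolved.
func resolveExecutable(path string) (string, error) {
	abs, err := filepath.Abs(path)
	if err != nil {
		return "", err
	}
	return filepath.EvalSymlinks(abs)
}

func main() {
	// Depending on the OS, os.Executable may report the symlink that was
	// used to start the process or the file the symlink points to.
	started, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	resolved, err := resolveExecutable(started)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("symlink-level directory:   ", filepath.Dir(started))
	fmt.Println("executable-level directory:", filepath.Dir(resolved))
}
```

When the process is not started through a symlink, both directories printed by main are typically the same.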

Why is it important?

For future upgrade work

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

As with the previous PR, this was tested on Linux, darwin, and Windows (service and direct run).

@botelastic botelastic bot added the needs_team label Aug 3, 2020
@elasticmachine
Collaborator

elasticmachine commented Aug 3, 2020

💔 Tests Failed

Build stats

  • Build Cause: [Pull request #20400 updated]

  • Start Time: 2020-09-03T10:48:33.882+0000

  • Duration: 66 min 39 sec

Test stats 🧪

Test Results
Failed 1
Passed 19661
Skipped 1833
Total 21495

Test errors

  • Name: Build and Test / Metricbeat OSS Python Integration tests / test_remote_write – metricbeat.module.prometheus.test_prometheus.TestRemoteWrite

    • Age: 1
    • Duration: 63.29
    • Error Details: beat.beat.TimeoutError: Timeout waiting for 'cond' to be true. Waited 60 seconds.

Steps errors

  • Name: Install Go 1.14.7

    • Description: .ci/scripts/install-go.sh
    • Duration: 1 min 32 sec
    • Start Time: 2020-09-03T11:12:33.184+0000

  • Name: Mage pythonIntegTest

    • Description: mage pythonIntegTest
    • Duration: 37 min 23 sec
    • Start Time: 2020-09-03T11:12:49.422+0000

  • Name: Make test

    • Description: make -C generator/_templates/metricbeat test
    • Duration: 7 min 30 sec
    • Start Time: 2020-09-03T11:35:06.691+0000

  • Name: Recursively delete the current directory from the workspace

    • Description: script returned exit code 2
    • Duration: 1 min 1 sec
    • Start Time: 2020-09-03T11:42:50.123+0000

Log output

Last 100 lines of log output:

[2020-09-03T11:53:45.785Z] + rm source.tgz
[2020-09-03T11:53:45.798Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats
[2020-09-03T11:53:45.825Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Lint
[2020-09-03T11:53:45.924Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Elastic-Agent-x-pack
[2020-09-03T11:53:46.011Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-crosscompile
[2020-09-03T11:53:46.098Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Winlogbeat-oss
[2020-09-03T11:53:46.181Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/dockerlogbeat
[2020-09-03T11:53:46.268Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Journalbeat
[2020-09-03T11:53:46.351Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Generators-Metricbeat-Linux
[2020-09-03T11:53:46.436Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Elastic-Agent-Mac-OS-X
[2020-09-03T11:53:46.531Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Functionbeat-x-pack
[2020-09-03T11:53:46.614Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Packetbeat-Linux
[2020-09-03T11:53:46.702Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-OSS-Unit-tests
[2020-09-03T11:53:46.790Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Elastic-Agent-x-pack-Windows
[2020-09-03T11:53:46.871Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-x-pack-Mac-OS-X
[2020-09-03T11:53:46.954Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Heartbeat-oss
[2020-09-03T11:53:47.035Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-oss-Windows
[2020-09-03T11:53:47.118Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Heartbeat-Mac-OS-X
[2020-09-03T11:53:47.205Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-x-pack-Mac-OS-X
[2020-09-03T11:53:47.287Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Heartbeat-Windows
[2020-09-03T11:53:47.370Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-x-pack-Windows
[2020-09-03T11:53:47.459Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Winlogbeat-Windows-x-pack
[2020-09-03T11:53:47.544Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-x-pack
[2020-09-03T11:53:47.626Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-x-pack-Windows
[2020-09-03T11:53:47.706Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-oss-Mac-OS-X
[2020-09-03T11:53:47.796Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Winlogbeat-Windows
[2020-09-03T11:53:47.880Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Functionbeat-Windows
[2020-09-03T11:53:47.962Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Auditbeat-oss-Linux
[2020-09-03T11:53:48.043Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-crosscompile
[2020-09-03T11:53:48.124Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Libbeat-x-pack
[2020-09-03T11:53:48.206Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-Windows
[2020-09-03T11:53:48.288Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Packetbeat-Windows
[2020-09-03T11:53:48.368Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-Mac-OS-X
[2020-09-03T11:53:48.453Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack-Mac-OS-X
[2020-09-03T11:53:48.533Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack-Windows
[2020-09-03T11:53:48.614Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Packetbeat-Mac-OS-X
[2020-09-03T11:53:48.695Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-Windows
[2020-09-03T11:53:48.777Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Functionbeat-Mac-OS-X-x-pack
[2020-09-03T11:53:48.861Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-Mac-OS-X
[2020-09-03T11:53:48.941Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Generators-Beat-Mac-OS-X
[2020-09-03T11:53:49.023Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Generators-Beat-Linux
[2020-09-03T11:53:49.104Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-x-pack
[2020-09-03T11:53:49.186Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Filebeat-oss
[2020-09-03T11:53:49.269Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Libbeat-oss
[2020-09-03T11:53:49.349Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Generators-Metricbeat-Mac-OS-X
[2020-09-03T11:53:49.435Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests
[2020-09-03T11:53:49.548Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Libbeat-crosscompile
[2020-09-03T11:53:49.628Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Libbeat-stress-tests
[2020-09-03T11:53:49.710Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-OSS-Python-Integration-tests
[2020-09-03T11:53:49.792Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack
[2020-09-03T11:53:50.199Z] + cat
[2020-09-03T11:53:50.199Z] + /usr/local/bin/runbld ./runbld-script --job-name elastic+beats+pull-request
[2020-09-03T11:53:50.199Z] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
[2020-09-03T11:53:56.794Z] runbld>>> runbld started
[2020-09-03T11:53:56.794Z] runbld>>> 1.6.12/f45d832f2ba0aa2722ab4ec1fda8ad140f027f8b
[2020-09-03T11:53:57.737Z] runbld>>> The following profiles matched the job 'elastic+beats+pull-request' in order of occurrence in the config (last value wins).
[2020-09-03T11:53:57.737Z] runbld>>> Matches in the system config:
[2020-09-03T11:53:57.737Z] runbld>>> - Matched ^elastic\+beats
[2020-09-03T11:53:57.737Z] runbld>>> - Matched ^elastic\+beats\+pull-request
[2020-09-03T11:53:59.121Z] runbld>>> Debug logging enabled.
[2020-09-03T11:53:59.121Z] runbld>>> Storing result
[2020-09-03T11:53:59.121Z] runbld>>> Store result: created {:total 2, :successful 2, :failed 0} 1
[2020-09-03T11:53:59.121Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1597739501209/t/20200903115358-6AE33558
[2020-09-03T11:53:59.121Z] runbld>>> Adding system facts.
[2020-09-03T11:54:00.069Z] runbld>>> Adding vcs info for the latest commit:  b2dd52701a7aa54dc1cf0cd870e1c14d2d8df8de
[2020-09-03T11:54:00.069Z] runbld>>> >>>>>>>>>>>> SCRIPT EXECUTION BEGIN >>>>>>>>>>>>
[2020-09-03T11:54:00.069Z] runbld>>> Adding /usr/lib/jvm/java-8-openjdk-amd64/bin to the path.
[2020-09-03T11:54:00.069Z] + echo 'Processing JUnit reports with runbld...'
[2020-09-03T11:54:00.069Z] Processing JUnit reports with runbld...
[2020-09-03T11:54:00.642Z] runbld>>> <<<<<<<<<<<< SCRIPT EXECUTION END <<<<<<<<<<<<
[2020-09-03T11:54:00.642Z] runbld>>> DURATION: 27ms
[2020-09-03T11:54:00.642Z] runbld>>> STDOUT: 40 bytes
[2020-09-03T11:54:00.642Z] runbld>>> STDERR: 49 bytes
[2020-09-03T11:54:00.642Z] runbld>>> WRAPPED PROCESS: SUCCESS (0)
[2020-09-03T11:54:00.642Z] runbld>>> Searching for build metadata in /var/lib/jenkins/workspace/Beats_beats_PR-20400
[2020-09-03T11:54:01.585Z] runbld>>> Storing build metadata: 
[2020-09-03T11:54:01.585Z] runbld>>> Adding test report.
[2020-09-03T11:54:01.585Z] runbld>>> Searching for junit test output files with the pattern: TEST-.*\.xml$ in: /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats
[2020-09-03T11:54:02.157Z] runbld>>> Found 140 test output files
[2020-09-03T11:54:04.708Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-iis.xml
[2020-09-03T11:54:04.708Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-openmetrics.xml
[2020-09-03T11:54:04.708Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-activemq.xml
[2020-09-03T11:54:04.708Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-istio.xml
[2020-09-03T11:54:04.708Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-x-pack/x-pack/metricbeat/build/TEST-go-integration-tomcat.xml
[2020-09-03T11:54:05.281Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests/metricbeat/build/TEST-go-integration-graphite.xml
[2020-09-03T11:54:05.281Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-20400/src/github.com/elastic/beats/Metricbeat-OSS-Go-Integration-tests/metricbeat/build/TEST-go-integration-windows.xml
[2020-09-03T11:54:05.852Z] runbld>>> Test output logs contained: Errors: 0 Failures: 1 Tests: 21342 Skipped: 1563
[2020-09-03T11:54:05.852Z] runbld>>> Storing result
[2020-09-03T11:54:05.852Z] runbld>>> FAILURES: 1
[2020-09-03T11:54:06.422Z] runbld>>> Store result: updated {:total 2, :successful 2, :failed 0} 2
[2020-09-03T11:54:06.422Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1597739501209/t/20200903115358-6AE33558
[2020-09-03T11:54:06.422Z] runbld>>> Email notification disabled by environment variable.
[2020-09-03T11:54:06.422Z] runbld>>> Slack notification disabled by environment variable.
[2020-09-03T11:54:12.171Z] Running on Jenkins in /var/lib/jenkins/workspace/Beats_beats_PR-20400
[2020-09-03T11:54:12.504Z] [INFO] getVaultSecret: Getting secrets
[2020-09-03T11:54:12.581Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-09-03T11:54:13.432Z] + chmod 755 generate-build-data.sh
[2020-09-03T11:54:13.432Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-20400/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-20400/runs/9 FAILURE 3939290
[2020-09-03T11:54:13.432Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-20400/runs/9/steps/?limit=10000 -o steps-info.json
[2020-09-03T11:54:16.661Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-20400/runs/9/tests/?status=FAILED -o tests-errors.json

@michalpristas michalpristas changed the title from "Agent new structure sym" to "[Ingest Manager] New agent structure (symlinks)" Aug 3, 2020
@botelastic botelastic bot removed the needs_team label Aug 4, 2020
@ph ph requested a review from blakerouse August 25, 2020 12:23
@michalpristas michalpristas marked this pull request as ready for review September 1, 2020 08:11
@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@ph
Contributor

ph commented Sep 1, 2020

@ruflin @blakerouse let's make sure we review this quickly.

Contributor

@blakerouse blakerouse left a comment

I have tested this extensively and I think it works really well.

The only weird thing I hit is that running ./elastic-agent run on Windows for the first time results in a re-exec, which causes PowerShell to return while elastic-agent keeps running. This is really a side effect of Windows and how re-exec works (which I wrote, and which is unrelated to this branch). I would like to see if we can improve on that in a follow-up branch.

Otherwise this is what we need to get further along with self-upgrade!

@michalpristas
Contributor Author

@blakerouse I was thinking that the install idea will help with this issue.

@blakerouse
Contributor

@michalpristas I agree, I think we can solve that issue with install.

@michalpristas
Contributor Author

@ruflin can I get your eyes on it before merging?

@ruflin
Member

ruflin commented Sep 2, 2020

@michalpristas Unfortunately I didn't get to it yet. I will have a look tomorrow, but please don't block on me to get this in.

@michalpristas
Contributor Author

@ruflin tomorrow is OK for me; if you won't have time, I will proceed.

Member

@ruflin ruflin left a comment

I tested this on OS X with the .tar.gz and it works.

  • We should make sure we follow up with some tests that just execute things on different platforms (I think we already do).
  • Then we should add tests that upgrade the binary between two commits to make sure upgrade works as expected.

What we do here is quite a bit of magic, and the code differs between platforms. To make this maintainable long term, so that others can also touch the code, I think this needs quite a bit of additional documentation: part of it in the code, but also part of it outside, to talk through exactly how the upgrade works and why. This can happen as a follow-up.

dev-tools/mage/dmgbuilder.go (outdated, resolved)
@@ -476,6 +477,10 @@ func copyInstallScript(spec PackageSpec, script string, local *string) error {
		*local = strings.TrimSuffix(*local, ".tmpl")
	}

	if strings.HasSuffix(*local, "."+spec.Name) {

Member

It would be nice to have some docs in the code, as a function comment or inline, around the fancy logic that copyInstallScript does, so that anyone who touches this in the future knows why all the special cases are here.

{{- if .linux_capabilities }}
setcap {{ .linux_capabilities }} {{ $beatBinary }} && \
{{- end }}
{{- range $i, $modulesd := .ModulesDirs }}
chmod 0770 {{ $beatHome}}/{{ $modulesd }} && \
{{- end }}
- chmod 0770 {{ $beatHome }}/data {{ $beatHome }}/logs
+ chmod 0770 {{ $beatHome }}/data {{ $beatHome }}/data/elastic-agent-{{ commit_short }}/logs

Member

If it is an officially released version like 7.9.1, will it still contain the commit hash?

Contributor Author

Yes, this is to support snapshots. We can change the function to include the version or do some conditional formatting.

Member

For someone only running production builds, it will probably be unexpected to see a hash here instead of a specific version. It would be nice to have a clean switch between the two. But then we have the problem of how to tell two release candidates apart 🤔

Contributor

I would prefer that we have a single way; having a special case would just increase complexity IMHO.
Also, this could be a good thing: it would allow people to use preview builds?

Member

I wonder how debugging is going to work. You upgrade from 7.9.1 to 7.10.0 and things go wrong. You now need to figure out which hashes 7.9.1 and 7.10.0 correspond to in order to know which directories to look into.

Perhaps the solution here is more symlinks, named with the exact version number. If two 7.9.1 builds exist, the symlink points only to the most recent one.

Contributor

@ruflin The scenario you are describing will only be confusing if you upgrade to nightly snapshots. In the normal production environment you should only have something like this:

7.9.0-abcd
7.9.1-abcd2
7.9.2-bcd
7.10.0-huhuh

Now, concerning rollback and debugging while using multiple snapshots, I believe you are right that it might be confusing.

@michalpristas How do you plan to solve the following issue? Note that this might be an edge case.

  1. I run 7.10-{hash1} (snapshot generated on Monday).
  2. I try to upgrade to 7.10-{hash2}. It fails.
  3. Rollback to 7.10-{hash1}.

We could probably solve the situation with symlinks or traces on disk.
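
One way the symlink option could play out for the three steps above is sketched below: the link is repointed by creating a temporary symlink and renaming it over the old one. This is only an illustration under assumptions. The function names, the elastic-agent link name, and the hash1/hash2 directory names are hypothetical, the link targets a directory for simplicity, and creating symlinks on Windows may need extra privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// repoint makes link refer to target. It creates a temporary symlink and
// renames it over the old one, so link is replaced in a single rename.
func repoint(link, target string) error {
	tmp := link + ".tmp"
	_ = os.Remove(tmp) // clear leftovers from an interrupted attempt
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, link)
}

// simulateRollback plays the run, failed upgrade, rollback sequence in a
// throwaway directory and returns the directory name the link ends up on.
func simulateRollback() (string, error) {
	root, err := os.MkdirTemp("", "agent-layout")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	link := filepath.Join(root, "elastic-agent")
	// Hypothetical version directories: run hash1, try hash2, roll back.
	steps := []string{"elastic-agent-hash1", "elastic-agent-hash2", "elastic-agent-hash1"}
	for _, name := range steps {
		dir := filepath.Join(root, "data", name)
		if err := os.MkdirAll(dir, 0o750); err != nil {
			return "", err
		}
		if err := repoint(link, dir); err != nil {
			return "", err
		}
	}
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return filepath.Base(target), nil
}

func main() {
	name, err := simulateRollback()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("link points at:", name)
}
```

Because the version directories stay on disk, rolling back is just another repoint to the previous directory.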

Contributor

@ph I believe there is only ever one previous version in the data directory during an upgrade. Once the upgrade is successful, the new Agent will delete the other version directory.

This makes it easy to know which is the previous version: it's simply the one that is not the current version.
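
The cleanup rule described above can be sketched as a small filter over the entries of the data directory. This is a minimal sketch, not the agent's actual cleanup code: staleVersionDirs and the sample directory names are hypothetical, and only the elastic-agent-<commit> naming is taken from the template in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// versionPrefix follows the data/elastic-agent-<commit> naming used by the
// Dockerfile template in this PR.
const versionPrefix = "elastic-agent-"

// staleVersionDirs (hypothetical helper) returns the version directories in
// entries that are not the current one. Entries without the version prefix
// are ignored.
func staleVersionDirs(entries []string, current string) []string {
	var stale []string
	for _, name := range entries {
		if strings.HasPrefix(name, versionPrefix) && name != current {
			stale = append(stale, name)
		}
	}
	return stale
}

func main() {
	// Hypothetical contents of the data directory right after an upgrade.
	entries := []string{"downloads", "elastic-agent-abc123", "elastic-agent-def456"}
	for _, name := range staleVersionDirs(entries, "elastic-agent-def456") {
		// A real cleanup would call os.RemoveAll on the full path here.
		fmt.Println("would remove:", name)
	}
}
```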

Contributor

Yes, I forgot about that, so it's a non-issue :)

func preRunCheck(flags *globalFlags) func(cmd *cobra.Command, args []string) error {
	return func(cmd *cobra.Command, args []string) error {
		if sn := paths.ServiceName(); sn != "" {
			if !filepath.IsAbs(os.Args[0]) {

Member

Probably worth a code comment.
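
To make the intent of the check concrete, here is a hedged sketch of the idea from the PR description: when os.Args[0] is not absolute, it is replaced with the absolute path the service was registered under. The body of the real preRunCheck is not shown in this hunk, so ensureAbsArg0 and its registered parameter are hypothetical stand-ins, not the PR's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureAbsArg0 (hypothetical helper) returns args unchanged when it is
// empty, when no registered name is known, or when args[0] is already
// absolute. Otherwise it returns a new slice whose first element is the
// registered name.
func ensureAbsArg0(args []string, registered string) []string {
	if len(args) == 0 || registered == "" || filepath.IsAbs(args[0]) {
		return args
	}
	return append([]string{registered}, args[1:]...)
}

func main() {
	// Stand-in for the path the service was registered under. In the PR
	// this value comes from paths.ServiceName(), which is not shown here.
	registered, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Args = ensureAbsArg0(os.Args, registered)
	fmt.Println("service name:", os.Args[0])
}
```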

const (
	defaultConfig = "elastic-agent.yml"
	hashLen       = 6
	commitFile    = ".elastic-agent.active.commit"

Member

How is this different from the build hash? Will it change during upgrade?

Contributor Author

The difference is in location: the build hash file we have lives in a different place than our configuration files, and that place differs by platform and package type.
I introduced this hidden file so that it sits right next to the configuration and can change format if needed.
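
A minimal sketch of how such a file could be read is shown below. It assumes the file holds the commit hash as plain text, which this thread does not confirm; shortCommit and activeCommit are hypothetical names, while hashLen and commitFile are the constants from the hunk above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

const (
	hashLen    = 6
	commitFile = ".elastic-agent.active.commit"
)

// shortCommit trims surrounding whitespace and keeps at most hashLen
// characters of the commit hash.
func shortCommit(raw string) string {
	c := strings.TrimSpace(raw)
	if len(c) > hashLen {
		c = c[:hashLen]
	}
	return c
}

// activeCommit reads the commit file that sits next to the configuration
// in configDir. Assumption: the file contains the hash as plain text.
func activeCommit(configDir string) (string, error) {
	b, err := os.ReadFile(filepath.Join(configDir, commitFile))
	if err != nil {
		return "", err
	}
	return shortCommit(string(b)), nil
}

func run() error {
	dir, err := os.MkdirTemp("", "agent-config")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	// Write a sample commit file, then read it back.
	sample := []byte("b2dd52701a7aa54dc1cf0cd870e1c14d2d8df8de\n")
	if err := os.WriteFile(filepath.Join(dir, commitFile), sample, 0o640); err != nil {
		return err
	}
	commit, err := activeCommit(dir)
	if err != nil {
		return err
	}
	fmt.Println("active version directory: elastic-agent-" + commit)
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```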

@michalpristas michalpristas merged commit 5486a21 into elastic:master Sep 3, 2020
michalpristas added a commit to michalpristas/beats that referenced this pull request Sep 3, 2020
[Ingest Manager] New agent structure (symlinks) (elastic#20400)
@ph
Contributor

ph commented Sep 3, 2020

@EricDavisX and @rahulgupta-qasource We have changed the structure of the Elastic Agent, so it would be a good idea to test the packages again with the new snapshots. This will also be backported to the 7.10 branch.

@michalpristas Can you sync with @dedemorton? This could have an impact on the documentation; we should also explain the structure to users.

v1v added a commit to v1v/beats that referenced this pull request Sep 3, 2020
…ne-2.0

* upstream/master:
  [Metricbeat][test] Disable ec2 flaky test (elastic#20959)
  Check if tracer is active before starting a transaction (elastic#20852)
  [Elastic Agent] Add support for variable replacement from providers (elastic#20839)
  Only request wildcard expansion for hidden indices if supported (elastic#20938)
  [Ingest Manager] New agent structure (symlinks) (elastic#20400)
  [Ingest Manager] Print a message confirming shutdown (elastic#20948)
  Skip flaky test on unix input (elastic#20942)
  [Ingest Manager] Align introspect-inspect naming in code (elastic#20952)
  [Filebeat][zeek] Map new x509 fields for ssl module (elastic#20927)
  [CI] fix regression with variable name (elastic#20930)
  [Autodiscover] Handle input-not-finished errors in config reload (elastic#20915)
  [Ingest Manager] Remove Success from fleet contract (elastic#20449)
v1v added a commit to v1v/beats that referenced this pull request Sep 3, 2020
…-faster

* upstream/master:
  [Metricbeat][test] Disable ec2 flaky test (elastic#20959)
  Check if tracer is active before starting a transaction (elastic#20852)
  [Elastic Agent] Add support for variable replacement from providers (elastic#20839)
  Only request wildcard expansion for hidden indices if supported (elastic#20938)
  [Ingest Manager] New agent structure (symlinks) (elastic#20400)
  [Ingest Manager] Print a message confirming shutdown (elastic#20948)
  Skip flaky test on unix input (elastic#20942)
  [Ingest Manager] Align introspect-inspect naming in code (elastic#20952)
  [Filebeat][zeek] Map new x509 fields for ssl module (elastic#20927)
@ghost

ghost commented Sep 7, 2020

Hi @ph /@EricDavisX

We have validated this ticket on latest Kibana 7.10.0-SNAPSHOT cloud environment.

Enrolled agent with Windows(x86_64.zip), Linux(.deb, .rpm and .tar.gz) and macOS(.tar.gz) 7.10.0-SNAPSHOT packages on corresponding Hosts.

Observations:

  1. Elastic-agent is enrolled successfully('Online' status and no error in activity logs) on Windows, Linux and macOS with above packages.
  2. Hosts(with different packages installed) are displayed under '[Metrics System] Overview ECS' dashboard.

Screenshot:
Dashboard

Queries:

  1. Could you please let us know if we have to validate the above packages on 7.9.2-SNAPSHOT or 8.0.0-SNAPSHOT also.
  2. Could you please share the exact steps to execute the upgrade/rollback scenarios (for example, as per #20400 (review), #20400 (comment), and #20400 (comment)).

Please let us know if we are missing something to validate this ticket or need to add any other validation scenario.

@EricDavisX
Contributor

I recommend we follow up with test work / questions in the new issue above,
#21023

michalpristas added a commit that referenced this pull request Sep 22, 2020
Cherry-pick #20400 to 7.x: New agent structure (symlinks)  (#20960)
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020