Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerized standalone Fleet Server #2359

Merged
merged 27 commits into from
Mar 6, 2023

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Feb 20, 2023

What is the problem this PR solves?

Allow to run Fleet Server standalone without check in in production builds.

How does this PR solve the problem?

  • Standalone mode previously used in development mode can be also used now in release builds.
  • Elasticsearch version check is not done in standalone mode.
  • Migrations are not done in standalone mode.
  • Health is checked in standalone mode by checking that the service can read the policies index.

After this, fleet-server only needs an Elasticsearch service account to start.

How to test this PR locally

fleet-server -c ./fleet-server.yml -E output.elasticsearch.ssl.verification_mode=none -E fleet.agent.checkin=false -E output.elasticsearch.service_token=....

Or make release-docker, and use the built docker image in your docker or kubernetes environment.

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Comment on lines 463 to 465
if cfg.Fleet.Agent.Checkin {
sm := policy.NewSelfMonitor(cfg.Fleet, bulker, pim, cfg.Inputs[0].Policy.ID, f.reporter)
if f.standAlone {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming by now that "check in" and "standalone" mode are two different things, but I wonder if there is any use case for running standalone with check in (maybe only for development?), or to run in agent without checkin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More about this in #2359 (comment)

Copy link
Member

@nchaulet nchaulet Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there is some value to support checkin at all in standalone, but for these we probably need to make a change in Kibana to remove the restrictions of having an healthy fleet server agent, I can make a PR that introduce that change with a feature flag if you need it for testing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @nchaulet, registration + checkin was added just to get around the kibana restrictions.
If we wanted to release a true "stand-alone" server it should not even need to enroll itself

Copy link
Member Author

@jsoriano jsoriano Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can then fully remove standAloneSetup and standAloneCheckin?

Copy link
Member Author

@jsoriano jsoriano Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a PR that introduce that change with a feature flag if you need it for testing.

@nchaulet that would be great. I guess this would solve part of the "Add Fleet Server" UI Components in https://github.com/elastic/ingest-dev/issues/1530. Instead of using a feature flag we could maybe use the check mentioned there if it is already available.

fleet-server.yml Outdated
@@ -3,12 +3,15 @@ output:
elasticsearch:
hosts: '${ELASTICSEARCH_HOSTS:https://localhost:9200}'
service_token: '${ELASTICSEARCH_SERVICE_TOKEN}'
ssl.ca_trusted_fingerprint: '${ELASTICSEARCH_CA_TRUSTED_FINGERPRINT}'
# ssl.ca_trusted_fingerprint: '${ELASTICSEARCH_CA_TRUSTED_FINGERPRINT}'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this mandatory now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default elasticsearch runs in https

@jsoriano
Copy link
Member Author

@michel-laterman please take a look to this draft to check if I am going in the good direction 🙂 Thanks!

@elasticmachine
Copy link
Contributor

elasticmachine commented Feb 20, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-03-06T16:22:24.994+0000

  • Duration: 13 min 23 sec

Test stats 🧪

Test Results
Failed 0
Passed 605
Skipped 1
Total 606

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@amitkanfer
Copy link

@jsoriano won't it be best to have a new "standalone" config that will toggle the new checkin flag you're adding here? i guess that in the future we'll need to control more configs than just this one...
WDYT?

@alexsapran
Copy link
Contributor

We discussed this with @joshdover.

We could use the internal docker registry and publish the container image under the employees path.

After some quick searching in GH I noticed how it's done in endpoint-dev repo for some dockery-related builds, setting an NS (namespace) to docker.elastic.co/employees/ instead of docker.elastic.co/.

@jsoriano
Copy link
Member Author

jsoriano commented Feb 21, 2023

@jsoriano won't it be best to have a new "standalone" config that will toggle the new checkin flag you're adding here? i guess that in the future we'll need to control more configs than just this one... WDYT?

Yeah, I guess this is related to my other question in #2359 (comment). It will depend on the use cases we support, if for all standalone Fleet Servers we want to disable the checkin, then we can have a single "standalone" config that also disables the checkin.

But, currently we also have an -agent-mode flag that disables standalone, and I think that we may have cases where we want to enable or disable the checkin independently of running in agent mode or in standalone mode:

Use case -agent-mode (no standalone) Check in
Current Fleet Server managed by Agent true true
Current standalone Fleet Server for development false true
Standalone Fleet Server on premise false true
Standalone managed Fleet Server (this PR) false false

This is why, in principle, I considered to have this as an explicit setting.

@jsoriano
Copy link
Member Author

We discussed this with @joshdover.

We could use the internal docker registry and publish the container image under the employees path.

After some quick searching in GH I noticed how it's done in endpoint-dev repo for some dockery-related builds, setting an NS (namespace) to docker.elastic.co/employees/ instead of docker.elastic.co/.

@alexsapran sounds good. There is a separate issue about publishing the image being added here. Let's keep the discussion about this there. It is #2352.

@alexsapran
Copy link
Contributor

We discussed this with @joshdover.

We could use the internal docker registry and publish the container image under the employees path.

After some quick searching in GH I noticed how it's done in endpoint-dev repo for some dockery-related builds, setting an NS (namespace) to docker.elastic.co/employees/ instead of docker.elastic.co/.

@alexsapran sounds good. There is a separate issue about publishing the image being added here. Let's keep the discussion about this there. It is #2352.

Makes sense, thanks for the link, I was not aware of that other issue, will cross post my comment there.

@@ -0,0 +1,22 @@
ARG GO_VERSION
FROM golang:${GO_VERSION}-buster AS builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using this instead of the same image as we base https://github.com/elastic/fleet-server/blob/main/Dockerfile.build off of?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Dockerfiles have different purpouses, Dockerfile.build is used to build the release packages for all platforms, and writes the generated packages to the working copy. This one only builds the binary in the platform it is executed, without writing to the working copy, and from it builds the docker image.

I could make a common Dockerfile for both to import, but I think they wouldn't have so much common code. We would also need to somehow handle the different platforms here.

It would be good though to use here make release, to ensure that we are building the same release artifacts in both cases.

I will give another thought to this. I am mainly using this Dockerfile now to have an image for testing on Kubernetes. We could have a more final Dockerfile later.

Copy link
Member Author

@jsoriano jsoriano Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have replaced the build command in the Dockerfile to use make release-linux/amd64 instead, so we build with the same options as other release builds.

We will have to revisit this if we support other OSs or architectures.

@mergify
Copy link
Contributor

mergify bot commented Feb 27, 2023

This pull request is now in conflicts. Could you fix it @jsoriano? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fleet-server-standalone-build upstream/fleet-server-standalone-build
git merge upstream/main
git push upstream fleet-server-standalone-build

@jsoriano jsoriano self-assigned this Mar 1, 2023
@jsoriano jsoriano marked this pull request as ready for review March 1, 2023 21:07
@jsoriano jsoriano requested review from a team as code owners March 1, 2023 21:07
@jsoriano jsoriano requested a review from AndersonQ March 1, 2023 21:07
@jsoriano
Copy link
Member Author

jsoriano commented Mar 1, 2023

Opening for review. @michel-laterman @nchaulet please take another look.

@juliaElastic
Copy link
Contributor

This is great! It would be nice to add the example command to start to the README.

@nchaulet
Copy link
Member

nchaulet commented Mar 2, 2023

Looks great to me, I tested this locally and it worked well!

Copy link
Contributor

@michel-laterman michel-laterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; the only other thing I can think of is to add instructions in the readme's dev section on how to run Kibana so that you can run a stand-alone fleet-server.

// you may not use this file except in compliance with the Elastic License.

//go:build !integration
// +build !integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpick) I don't think we need the // + directives

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Removed from all files.

@jsoriano
Copy link
Member Author

jsoriano commented Mar 3, 2023

It would be nice to add the example command to start to the README.

the only other thing I can think of is to add instructions in the readme's dev section on how to run Kibana so that you can run a stand-alone fleet-server.

Added docs about building the docker image for standalone and about the experimental flag to be able to use the Fleet UI. @juliaElastic @michel-laterman please take another look.

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding the commands to the README.

Copy link
Contributor

@joshdover joshdover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make it very clear in migration.go that no new migrations should be added going forward. Even better if we make the application crash if a new one is added

@jsoriano jsoriano merged commit 94cccad into elastic:main Mar 6, 2023
@jsoriano jsoriano deleted the fleet-server-standalone-build branch March 6, 2023 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
8 participants