Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add disable providers by default option #4166

Merged
merged 11 commits into from
Feb 5, 2024

Conversation

eyalkraft
Copy link
Contributor

@eyalkraft eyalkraft commented Jan 31, 2024

What does this PR do?

Two main things (let me know if you prefer two separate PRs)

  • Adding the providers_default_disable configuration flag, to allow changing the default providers behavior.
  • Skipping replacing the current agent configuration with default fleet configuration upon enrollment, in case the current configuration already contains the configuration from the default fleet configuration.

Why is it important?

  • Being able to disable all providers by default is required in cases such as agentless.
  • Skipping the configuration replacement is required to allow persisting pre-enrollment configuration for a containerized agent that enrolls to fleet.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation Update providers_default_disable ingest-docs#884
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

Post-merge:

How to test this PR locally

  1. Deploy elastic-agent on k8s configured with a configmap
  2. configure with fleet enabled and providers disabled by default
  3. no provider logs!

Related issues

Use cases

Screenshots

Tested on my serverless dev cluster.

Old agent image, and New image with old configmap: docker provider tries starting.

image

image

New image with new configmap: no provider is starting

image

image

New image with docker provider explicitly enabled: provider is starting

image

image

Logs

elastic-agent@agentless-58995d99d6-hm8n2:~$ elastic-agent inspect | grep provider
providers_default_disable: true

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Copy link
Contributor

mergify bot commented Jan 31, 2024

This pull request does not have a backport label. Could you fix it @eyalkraft? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@eyalkraft eyalkraft added the enhancement New feature or request label Feb 4, 2024
@eyalkraft eyalkraft marked this pull request as ready for review February 4, 2024 20:16
@eyalkraft eyalkraft requested a review from a team as a code owner February 4, 2024 20:16
@eyalkraft eyalkraft requested a review from cmacknz February 4, 2024 20:18
@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Feb 5, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@@ -65,7 +65,7 @@ func New(log *logger.Logger, c *config.Config, managed bool) (Controller, error)
contextProviders := map[string]*contextProviderState{}
for name, builder := range Providers.contextProviders {
pCfg, ok := providersCfg.Providers[name]
if ok && !pCfg.Enabled() {
if (ok && !pCfg.Enabled()) || (!ok && providersCfg.ProvidersDefaultDisable) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case a provider is configured and the ProvidersDefaultDisable is set, should we continue or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, providers can be explicitly configured. The flag just changes the default behavior

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in that test-case, what is the expected output?

		{
			cfg: map[string]interface{}{
				"agent.providers.initial_default": "true",
				"providers": map[string]interface{}{
					"env": map[string]interface{}{
						"enabled": "false",
					},
                                 },
			},
		},

# All registered providers are enabled by default.

# Disable all providers by default and only enable explicitly configured providers.
# providers_default_disable: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this as top-level field. The agent has a couple sections in its configuration outputs, inputs, management, agent. This as a top-level option is kinda floating out in the wrong area. Being that you are changing the default option of the Elastic Agent for all provides I would think this should be under agent.*.

What about: agent.providers.initial_default: false? The initial_default can be a *bool to allow the default to be true when unset, true (explicity set), or false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the config struct is only used in a single place I think it could be cleaner to keep it as a boolean, setting its value to true before calling Unpack.
We wouldn't know if it was explicitly configured to true or just left out but I don't think it matters.

wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does work but I always worry about issues with defaults actually being initialized and not accidentally cleared to false before the unpack. I prefer to be safer with the *bool.

@eyalkraft eyalkraft requested a review from blakerouse February 5, 2024 16:18
Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the adjustments. Looks good to me.

@eyalkraft
Copy link
Contributor Author

Looks like one integration test fails during setup.

2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): FAIL
-- | --
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): FAIL	github.com/elastic/elastic-agent/testing/integration	2.727s
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): === Failed
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): === FAIL: testing/integration TestContainerCMD (2.35s)
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): container_cmd_test.go:62: could not create Agent Policy: Package at top-level directory  must contain a top-level manifest.yml file.
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stdout): DONE 1 tests, 1 failure in 36.566s
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) Test output (sudo) (stderr): Error: go test returned a non-zero value: exit status 1
  | 2024-02-05 17:36:07 UTC | >>> (linux-arm64-ubuntu-2204-container) sudo tests failed: Process exited with status 1

// Create policy
policy, err := info.KibanaClient.CreatePolicy(ctx, createPolicyReq)
if err != nil {
t.Fatalf("could not create Agent Policy: %s", err)
}

Could the test setup be flaky? Should I re trigger CI somehow (empty commit?)

@cmacknz
Copy link
Member

cmacknz commented Feb 5, 2024

buildkite test it is the command to re-trigger that build step.

@eyalkraft
Copy link
Contributor Author

Manually tested latest changes - Works as expected 👍

@eyalkraft
Copy link
Contributor Author

buildkite test it

@eyalkraft eyalkraft enabled auto-merge (squash) February 5, 2024 22:18
Copy link

Quality Gate passed Quality Gate passed

The SonarQube Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
95.0% 95.0% Coverage on New Code
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip enhancement New feature or request Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Properly support disabling providers and changing init time settings in containers
6 participants