
[cmd/opampsupervisor] RemoteConfigStatus is not populated with failed on invalid config #34785

Open
Asarew opened this issue Aug 21, 2024 · 5 comments

Asarew commented Aug 21, 2024

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

When passing down "invalid" remote configuration from the OTel controller to the supervisor, the supervisor does not report Status == FAILED in the RemoteConfigStatus. It does report Unhealthy in ComponentHealth with a LastError, but relying on that appears to break the OpAMP specification, and it carries no details about the failure.

What is happening:

  1. Pushed down valid YAML that is an invalid collector config (missing commas added; ConfigHash moved to AgentRemoteConfig, where the OpAMP proto defines it):
    &protobufs.AgentRemoteConfig{
        Config: &protobufs.AgentConfigMap{
          ConfigMap: map[string]*protobufs.AgentConfigFile{
            "": &protobufs.AgentConfigFile{
                  ContentType: "text/yaml",
                  Body: []byte(`
                    receivers:
                      nop:
                    exporters:
                      nop:
                    service:
                      pipelines:
                        traces/3:
                          receivers: [nop]
                          exporters: [nop]
                    force_invalid:
                      config:
                        because: "of unknown fields"
                  `),
            },
          },
        },
        ConfigHash: []byte("abc123"),
    }
  2. First message sent by the supervisor has RemoteConfigStatus (with the corresponding LastRemoteConfigHash):
    &protobufs.RemoteConfigStatus{
        LastRemoteConfigHash: []byte("abc123"),
        Status: protobufs.RemoteConfigStatuses_RemoteConfigStatuses_APPLIED,
    }
  3. Receive ComponentHealth.Healthy == false every 5 seconds, with ComponentHealth.LastError:
    Agent process PID={*} exited unexpectedly, exit code=1. Will restart in a bit...
    
  4. agent.log file gets rewritten every 5 seconds with:
    Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
    
    '' has invalid keys: force_invalid
    2024/08/21 13:01:42 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):
    
    '' has invalid keys: force_invalid
    
    
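Per the OpAMP spec, after failing to apply the config the supervisor would instead be expected to report something along these lines (status constant from opamp-go; error text taken from the agent log, shown here for illustration):

```go
&protobufs.RemoteConfigStatus{
    LastRemoteConfigHash: []byte("abc123"),
    Status: protobufs.RemoteConfigStatuses_RemoteConfigStatuses_FAILED,
    ErrorMessage: `'' has invalid keys: force_invalid`,
}
```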

Describe the solution you'd like

Run the collector's validate command before starting the agent. If validation fails, report the error message back in RemoteConfigStatus.ErrorMessage with Status correctly set to FAILED.
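A minimal sketch of that flow, assuming the collector binary's `validate` subcommand. The `validateConfig` helper, the paths, and the status constants are hypothetical stand-ins for the generated `protobufs.RemoteConfigStatuses_*` values the supervisor would actually use:

```go
package main

import (
	"fmt"
	"os/exec"
)

// Hypothetical status values standing in for the generated
// protobufs.RemoteConfigStatuses_* constants from opamp-go.
type remoteConfigStatus int

const (
	statusApplied remoteConfigStatus = iota
	statusFailed
)

// validateConfig runs the collector's `validate` subcommand against the
// config file the supervisor wrote to disk. On failure, the combined
// output would become RemoteConfigStatus.ErrorMessage.
func validateConfig(collectorPath, configPath string) (remoteConfigStatus, string) {
	out, err := exec.Command(collectorPath, "validate", "--config", configPath).CombinedOutput()
	if err != nil {
		return statusFailed, string(out)
	}
	return statusApplied, ""
}

func main() {
	// Paths are illustrative; a missing or failing binary surfaces as FAILED.
	status, errMsg := validateConfig("/usr/bin/otelcol", "/tmp/effective.yaml")
	fmt.Println(status, errMsg)
}
```

Gating the agent restart on this check would also stop the 5-second crash loop described above, since an invalid config never reaches the running agent.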

Describe alternatives you've considered

"Reuse" the ComponentHealth as the RemoteConfigStatus for now, but in my opinion that is a poor implementation of the OpAMP spec by both the controller and the supervisor.

Additional context

No response

@Asarew added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Aug 21, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@BinaryFissionGames

Yep, this is absolutely something that's missing right now. It's tracked here:
#21079

Looks like there was a PR opened for this but it slipped through the cracks somehow.


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 11, 2024

Asarew commented Dec 11, 2024

not stale :(

atoulme removed the Stale label on Dec 12, 2024

atoulme commented Dec 12, 2024

@Asarew I'm not well placed to help here as I am not familiar with the spec enough. Do you have a link to the section of the spec this behavior breaks? Additionally, would it make sense to have a spec change to add the behavior you are asking for, or is it already specified there?

Lastly, would you be open to offering an integration test based on what you described? That'd help heaps to move this forward.
