Skip to content

Commit

Permalink
[HCP Observability] Init OTELSink in Telemetry (#17162)
Browse files Browse the repository at this point in the history
* Move hcp client to subpackage hcpclient (#16800)

* [HCP Observability] New MetricsClient (#17100)

* Client configured with TLS using HCP config and retry/throttle

* Add tests and godoc for metrics client

* close body after request

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* remove clone

* Extract CloudConfig and mock for future PR

* Switch to hclog.FromContext

* [HCP Observability] New MetricsClient (#17100)

* Client configured with TLS using HCP config and retry/throttle

* Add tests and godoc for metrics client

* close body after request

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* remove clone

* Extract CloudConfig and mock for future PR

* Switch to hclog.FromContext

* [HCP Observability] New MetricsClient (#17100)

* Client configured with TLS using HCP config and retry/throttle

* Add tests and godoc for metrics client

* close body after request

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* remove clone

* Extract CloudConfig and mock for future PR

* Switch to hclog.FromContext

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Create new OTELExporter which uses the MetricsClient
Add transform because the conversion is in an /internal package

* Fix lint error

* early return when there are no metrics

* Add NewOTELExporter() function

* Downgrade to metrics SDK version: v1.15.0-rc.1

* Fix imports

* fix small nits with comments and url.URL

* Fix tests by asserting actual error for context cancellation, fix parallel, and make mock more versatile

* Cleanup error handling and clarify empty metrics case

* Fix input/expected naming in otel_transform_test.go

* add comment for metric tracking

* Add a general isEmpty method

* Add clear error types

* update to latest version 1.15.0 of OTEL

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* Initialize OTELSink with sync.Map for all the instrument stores.

* Moved PeriodicReader init to NewOtelReader function. This allows us to use a ManualReader for tests.

* Switch to mutex instead of sync.Map to avoid type assertion

* Add gauge store

* Clarify comments

* return concrete sink type

* Fix lint errors

* Move gauge store to be within sink

* Use context.TODO,rebase and clenaup opts handling

* Rebase onto otl exporter to downgrade metrics API to v1.15.0-rc.1

* Fix imports

* Update to latest stable version by rebasing on cc-4933, fix import, remove mutex init, fix opts error messages and use logger from ctx

* Add lots of documentation to the OTELSink

* Fix gauge store comment and check ok

* Add select and ctx.Done() check to gauge callback

* use require.Equal for attributes

* Fixed import naming

* Remove float64 calls and add a NewGaugeStore method

* Change name Store to Set in gaugeStore, add concurrency tests in both OTELSink and gauge store

* Generate 100 gauge operations

* Seperate the labels into goroutines in sink test

* Generate kv store for the test case keys to avoid using uuid

* Added a race test with 300 samples for OTELSink

* [HCP Observability] OTELExporter (#17128)

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Create new OTELExporter which uses the MetricsClient
Add transform because the conversion is in an /internal package

* Fix lint error

* early return when there are no metrics

* Add NewOTELExporter() function

* Downgrade to metrics SDK version: v1.15.0-rc.1

* Fix imports

* fix small nits with comments and url.URL

* Fix tests by asserting actual error for context cancellation, fix parallel, and make mock more versatile

* Cleanup error handling and clarify empty metrics case

* Fix input/expected naming in otel_transform_test.go

* add comment for metric tracking

* Add a general isEmpty method

* Add clear error types

* update to latest version 1.15.0 of OTEL

* Do not pass in waitgroup and use error channel instead.

* Using SHA 7dea2225a218872e86d2f580e82c089b321617b0 to avoid build failures in otel

* Rebase onto otl exporter to downgrade metrics API to v1.15.0-rc.1

* Initialize OTELSink with sync.Map for all the instrument stores.

* Added telemetry agent to client and init sink in deps

* Fixed client

* Initalize sink in deps

* init sink in telemetry library

* Init deps before telemetry

* Use concrete telemetry.OtelSink type

* add /v1/metrics

* Avoid returning err for telemetry init

* move sink init within the IsCloudEnabled()

* Use HCPSinkOpts in deps instead

* update golden test for configuration file

* Switch to using extra sinks in the telemetry library

* keep name MetricsConfig

* fix log in verifyCCMRegistration

* Set logger in context

* pass around MetricSink in deps

* Fix imports

* Rebased onto otel sink pr

* Fix URL in test

* [HCP Observability] OTELSink (#17159)

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Create new OTELExporter which uses the MetricsClient
Add transform because the conversion is in an /internal package

* Fix lint error

* early return when there are no metrics

* Add NewOTELExporter() function

* Downgrade to metrics SDK version: v1.15.0-rc.1

* Fix imports

* fix small nits with comments and url.URL

* Fix tests by asserting actual error for context cancellation, fix parallel, and make mock more versatile

* Cleanup error handling and clarify empty metrics case

* Fix input/expected naming in otel_transform_test.go

* add comment for metric tracking

* Add a general isEmpty method

* Add clear error types

* update to latest version 1.15.0 of OTEL

* Client configured with TLS using HCP config and retry/throttle

* run go mod tidy

* Remove one abstraction to use the config from deps

* Address PR feedback

* Initialize OTELSink with sync.Map for all the instrument stores.

* Moved PeriodicReader init to NewOtelReader function. This allows us to use a ManualReader for tests.

* Switch to mutex instead of sync.Map to avoid type assertion

* Add gauge store

* Clarify comments

* return concrete sink type

* Fix lint errors

* Move gauge store to be within sink

* Use context.TODO,rebase and clenaup opts handling

* Rebase onto otl exporter to downgrade metrics API to v1.15.0-rc.1

* Fix imports

* Update to latest stable version by rebasing on cc-4933, fix import, remove mutex init, fix opts error messages and use logger from ctx

* Add lots of documentation to the OTELSink

* Fix gauge store comment and check ok

* Add select and ctx.Done() check to gauge callback

* use require.Equal for attributes

* Fixed import naming

* Remove float64 calls and add a NewGaugeStore method

* Change name Store to Set in gaugeStore, add concurrency tests in both OTELSink and gauge store

* Generate 100 gauge operations

* Seperate the labels into goroutines in sink test

* Generate kv store for the test case keys to avoid using uuid

* Added a race test with 300 samples for OTELSink

* Do not pass in waitgroup and use error channel instead.

* Using SHA 7dea2225a218872e86d2f580e82c089b321617b0 to avoid build failures in otel

* Fix nits

* pass extraSinks as function param instead

* Add default interval as package export

* remove verifyCCM func

* Add clusterID

* Fix import and add t.Parallel() for missing tests

* Kick Vercel CI

* Remove scheme from endpoint path, and fix error logging

* return metrics.MetricSink for sink method

* Update SDK
  • Loading branch information
Achooo committed May 29, 2023
1 parent fb92de5 commit 68e6e21
Show file tree
Hide file tree
Showing 14 changed files with 427 additions and 34 deletions.
67 changes: 65 additions & 2 deletions agent/hcp/client/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -12,23 +12,44 @@ import (
httptransport "github.com/go-openapi/runtime/client"
"github.com/go-openapi/strfmt"

"github.com/hashicorp/consul/agent/hcp/config"
"github.com/hashicorp/consul/version"
hcptelemetry "github.com/hashicorp/hcp-sdk-go/clients/cloud-consul-telemetry-gateway/preview/2023-04-14/client/consul_telemetry_service"
hcpgnm "github.com/hashicorp/hcp-sdk-go/clients/cloud-global-network-manager-service/preview/2022-02-15/client/global_network_manager_service"
gnmmod "github.com/hashicorp/hcp-sdk-go/clients/cloud-global-network-manager-service/preview/2022-02-15/models"
"github.com/hashicorp/hcp-sdk-go/httpclient"
"github.com/hashicorp/hcp-sdk-go/resource"

"github.com/hashicorp/consul/agent/hcp/config"
"github.com/hashicorp/consul/version"
)

// metricsGatewayPath is the default path for metrics export request on the Telemetry Gateway.
const metricsGatewayPath = "/v1/metrics"

// Client interface exposes HCP operations that can be invoked by Consul
//
//go:generate mockery --name Client --with-expecter --inpackage
type Client interface {
FetchBootstrap(ctx context.Context) (*BootstrapConfig, error)
FetchTelemetryConfig(ctx context.Context) (*TelemetryConfig, error)
PushServerStatus(ctx context.Context, status *ServerStatus) error
DiscoverServers(ctx context.Context) ([]string, error)
}

// MetricsConfig holds metrics specific configuration for the TelemetryConfig.
// The endpoint field overrides the TelemetryConfig endpoint.
type MetricsConfig struct {
Filters []string
Endpoint string
}

// TelemetryConfig contains configuration for telemetry data forwarded by Consul servers
// to the HCP Telemetry gateway.
type TelemetryConfig struct {
Endpoint string
Labels map[string]string
MetricsConfig *MetricsConfig
}

type BootstrapConfig struct {
Name string
BootstrapExpect int
Expand All @@ -44,6 +65,7 @@ type hcpClient struct {
hc *httptransport.Runtime
cfg config.CloudConfig
gnm hcpgnm.ClientService
tgw hcptelemetry.ClientService
resource resource.Resource
}

Expand All @@ -64,6 +86,8 @@ func NewClient(cfg config.CloudConfig) (Client, error) {
}

client.gnm = hcpgnm.New(client.hc, nil)
client.tgw = hcptelemetry.New(client.hc, nil)

return client, nil
}

Expand All @@ -79,6 +103,29 @@ func httpClient(c config.CloudConfig) (*httptransport.Runtime, error) {
})
}

// FetchTelemetryConfig obtains telemetry configuration from the Telemetry Gateway.
func (c *hcpClient) FetchTelemetryConfig(ctx context.Context) (*TelemetryConfig, error) {
params := hcptelemetry.NewAgentTelemetryConfigParamsWithContext(ctx).
WithLocationOrganizationID(c.resource.Organization).
WithLocationProjectID(c.resource.Project).
WithClusterID(c.resource.ID)

resp, err := c.tgw.AgentTelemetryConfig(params, nil)
if err != nil {
return nil, err
}

payloadConfig := resp.Payload.TelemetryConfig
return &TelemetryConfig{
Endpoint: payloadConfig.Endpoint,
Labels: payloadConfig.Labels,
MetricsConfig: &MetricsConfig{
Filters: payloadConfig.Metrics.IncludeList,
Endpoint: payloadConfig.Metrics.Endpoint,
},
}, nil
}

func (c *hcpClient) FetchBootstrap(ctx context.Context) (*BootstrapConfig, error) {
version := version.GetHumanVersion()
params := hcpgnm.NewAgentBootstrapConfigParamsWithContext(ctx).
Expand Down Expand Up @@ -233,3 +280,19 @@ func (c *hcpClient) DiscoverServers(ctx context.Context) ([]string, error) {

return servers, nil
}

// Enabled verifies if telemetry is enabled by ensuring a valid endpoint has been retrieved.
// It returns full metrics endpoint and true if a valid endpoint was obtained.
func (t *TelemetryConfig) Enabled() (string, bool) {
endpoint := t.Endpoint
if override := t.MetricsConfig.Endpoint; override != "" {
endpoint = override
}

if endpoint == "" {
return "", false
}

// The endpoint from Telemetry Gateway is a domain without scheme, and without the metrics path, so they must be added.
return endpoint + metricsGatewayPath, true
}
75 changes: 75 additions & 0 deletions agent/hcp/client/client_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
package client

import (
"context"
"testing"

"github.com/stretchr/testify/mock"
"github.com/stretchr/testify/require"
)

func TestFetchTelemetryConfig(t *testing.T) {
t.Parallel()
for name, test := range map[string]struct {
metricsEndpoint string
expect func(*MockClient)
disabled bool
}{
"success": {
expect: func(mockClient *MockClient) {
mockClient.EXPECT().FetchTelemetryConfig(mock.Anything).Return(&TelemetryConfig{
Endpoint: "https://test.com",
MetricsConfig: &MetricsConfig{
Endpoint: "",
},
}, nil)
},
metricsEndpoint: "https://test.com/v1/metrics",
},
"overrideMetricsEndpoint": {
expect: func(mockClient *MockClient) {
mockClient.EXPECT().FetchTelemetryConfig(mock.Anything).Return(&TelemetryConfig{
Endpoint: "https://test.com",
MetricsConfig: &MetricsConfig{
Endpoint: "https://test.com",
},
}, nil)
},
metricsEndpoint: "https://test.com/v1/metrics",
},
"disabledWithEmptyEndpoint": {
expect: func(mockClient *MockClient) {
mockClient.EXPECT().FetchTelemetryConfig(mock.Anything).Return(&TelemetryConfig{
Endpoint: "",
MetricsConfig: &MetricsConfig{
Endpoint: "",
},
}, nil)
},
disabled: true,
},
} {
test := test
t.Run(name, func(t *testing.T) {
t.Parallel()

mock := NewMockClient(t)
test.expect(mock)

telemetryCfg, err := mock.FetchTelemetryConfig(context.Background())
require.NoError(t, err)

if test.disabled {
endpoint, ok := telemetryCfg.Enabled()
require.False(t, ok)
require.Empty(t, endpoint)
return
}

endpoint, ok := telemetryCfg.Enabled()

require.True(t, ok)
require.Equal(t, test.metricsEndpoint, endpoint)
})
}
}
81 changes: 78 additions & 3 deletions agent/hcp/client/mock_Client.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion agent/hcp/config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,6 @@ func (c *CloudConfig) HCPConfig(opts ...hcpcfg.HCPConfigOption) (hcpcfg.HCPConfi
if c.ScadaAddress != "" {
opts = append(opts, hcpcfg.WithSCADA(c.ScadaAddress, c.TLSConfig))
}
opts = append(opts, hcpcfg.FromEnv())
opts = append(opts, hcpcfg.FromEnv(), hcpcfg.WithoutBrowserLogin())
return hcpcfg.NewHCPConfig(opts...)
}
Loading

0 comments on commit 68e6e21

Please sign in to comment.