Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[k8sclusterreceiver]: Remove the receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate #10838

Merged
merged 5 commits into from
Jun 29, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@

## 🛑 Breaking changes 🛑

- `k8sclusterreceiver`: The `receiver.k8sclusterreceiver.reportCpuMetricsAsDouble` feature gate has been removed (#10838)
- If users were disabling this feature gate, they may have to update
monitoring for a few Kubernetes cpu metrics. For more details see [feature-gate-configurations](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.54.0/receiver/k8sclusterreceiver#feature-gate-configurations).
- `prometheusexporter`: Automatically rename metrics with units to follow Prometheus naming convention (#8950)

### 🚩 Deprecations 🚩
Expand Down Expand Up @@ -324,7 +327,7 @@ https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/9278.
- `attributesprocessor`: Remove log names from filters (#9131)
- `k8sclusterreceiver`: The `receiver.k8sclusterreceiver.reportCpuMetricsAsDouble` feature gate is now enabled by default (#9367)
- Users may have to update monitoring for a few Kubernetes cpu metrics, for
more details see [feature-gate-configurations](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver#feature-gate-configurations).
more details see [feature-gate-configurations](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/v0.54.0/receiver/k8sclusterreceiver#feature-gate-configurations).

### 🚩 Deprecations 🚩

Expand Down
39 changes: 0 additions & 39 deletions receiver/k8sclusterreceiver/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,45 +56,6 @@ Example:
The full list of settings exposed for this receiver are documented [here](./config.go)
with detailed sample configurations [here](./testdata/config.yaml).

### Feature Gate Configurations
- `receiver.k8sclusterreceiver.reportCpuMetricsAsDouble`
- Description
- The k8s container and node cpu metrics being reported by the k8sclusterreceiver are transitioning from being
reported as integer millicpu units to being reported as double cpu units to adhere to opentelemetry cpu metric
specifications. Please update any monitoring this might affect, the change will cause cpu metrics to be double
instead of integer values as well as metric values will be scaled down by 1000x. You can control whether the
k8sclusterreceiver reports container and node cpu metrics in double cpu units instead of integer millicpu units
with the feature gate listed below.
- Affected Metrics
- k8s.container.cpu_request
- k8s.container.cpu_limit
- k8s.node.allocatable_cpu
- Stages and Timeline
- Alpha
- In this stage the feature gate is disabled by default and must be enabled by the user. This allows users to preemptively opt in and start using the bug fix by enabling the feature gate.
- Collector version: v0.47.0
- Release Date: Late March 2022
- Beta (current stage)
- In this stage the feature gate is enabled by default and can be disabled by the user.
- Users could experience some friction in this stage, they may need to update monitoring for the affected metrics or opt out of using the bug fix by disabling the feature gate.
- Target Collector version: v0.50.0
- Target Release Date: Early May 2022
- Generally Available
- In this stage the feature gate is permanently enabled and the feature gate is no longer available for anyone.
- Users could experience some friction in this stage, they may have to update monitoring for the affected metrics or be blocked from upgrading the collector to versions v0.53.0 and newer.
- Target Collector version: v0.53.0
- Target Release Date: Mid June 2022
- Usage
- Feature gate identifiers prefixed with - will disable the gate and prefixing with + or with no prefix will enable the gate.
- Start the otelcol with the feature gate enabled:
- otelcol {other_arguments} --feature-gates=receiver.k8sclusterreceiver.reportCpuMetricsAsDouble
- Start the otelcol with the feature gate disabled:
- otelcol {other_arguments} --feature-gates=-receiver.k8sclusterreceiver.reportCpuMetricsAsDouble
- More Information
- [collector.go where the the feature gate is registered](./internal/collection/collector.go)
- https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/8115
- https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/semantic_conventions/system-metrics.md#systemcpu---processor-metrics

### node_conditions_to_report

For example, with the config below the receiver will emit two metrics
Expand Down
29 changes: 0 additions & 29 deletions receiver/k8sclusterreceiver/internal/collection/collector.go
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,6 @@ import (
agentmetricspb "github.com/census-instrumentation/opencensus-proto/gen-go/agent/metrics/v1"
quotav1 "github.com/openshift/api/quota/v1"
"go.opentelemetry.io/collector/pdata/pmetric"
"go.opentelemetry.io/collector/service/featuregate"
"go.uber.org/zap"
appsv1 "k8s.io/api/apps/v1"
autoscalingv2beta2 "k8s.io/api/autoscaling/v2beta2"
Expand Down Expand Up @@ -67,9 +66,6 @@ const (
k8sKindReplicationController = "ReplicationController"
k8sKindReplicaSet = "ReplicaSet"
k8sStatefulSet = "StatefulSet"

// ID for a temporary feature gate
reportCPUMetricsAsDoubleFeatureGateID = "receiver.k8sclusterreceiver.reportCpuMetricsAsDouble"
)

// DataCollector wraps around a metricsStore and a metadaStore exposing
Expand All @@ -84,33 +80,8 @@ type DataCollector struct {
allocatableTypesToReport []string
}

var reportCPUMetricsAsDoubleFeatureGate = featuregate.Gate{
ID: reportCPUMetricsAsDoubleFeatureGateID,
Enabled: true,
Description: "The k8s container and node cpu metrics being reported by the k8sclusterreceiver are transitioning " +
"from being reported as integer millicpu units to being reported as double cpu units to adhere to " +
"opentelemetry cpu metric specifications. You can control whether the k8sclusterreceiver reports container " +
"and node cpu metrics in double cpu units instead of integer millicpu units with the " +
"receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate. " +
"For more details see: " +
"https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/README.md#feature-gate-configurations ",
}

func init() {
featuregate.GetRegistry().MustRegister(reportCPUMetricsAsDoubleFeatureGate)
}

// NewDataCollector returns a DataCollector.
func NewDataCollector(logger *zap.Logger, nodeConditionsToReport, allocatableTypesToReport []string) *DataCollector {
if featuregate.GetRegistry().IsEnabled(reportCPUMetricsAsDoubleFeatureGateID) {
logger.Info("The receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate is enabled. This " +
"otel collector will report double cpu units, which is good for future support!")
} else {
logger.Info("WARNING - Breaking Change: " + reportCPUMetricsAsDoubleFeatureGate.Description)
logger.Info("The receiver.k8sclusterreceiver.reportCpuMetricsAsDouble feature gate is disabled. This " +
"otel collector will report integer cpu units, be aware this will not be supported in the future.")
}

return &DataCollector{
logger: logger,
metricsStore: &metricsStore{
Expand Down
11 changes: 3 additions & 8 deletions receiver/k8sclusterreceiver/internal/collection/containers.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ import (
metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
resourcepb "github.com/census-instrumentation/opencensus-proto/gen-go/resource/v1"
conventions "go.opentelemetry.io/collector/semconv/v1.6.1"
"go.opentelemetry.io/collector/service/featuregate"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"

Expand Down Expand Up @@ -114,13 +113,9 @@ func getSpecMetricsForContainer(c corev1.Container) []*metricspb.Metric {
val := utils.GetInt64TimeSeries(v.Value())
valType := metricspb.MetricDescriptor_GAUGE_INT64
if k == corev1.ResourceCPU {
if featuregate.GetRegistry().IsEnabled(reportCPUMetricsAsDoubleFeatureGateID) {
// cpu metrics must be of the double type to adhere to opentelemetry system.cpu metric specifications
valType = metricspb.MetricDescriptor_GAUGE_DOUBLE
val = utils.GetDoubleTimeSeries(float64(v.MilliValue()) / 1000.0)
} else {
val = utils.GetInt64TimeSeries(v.MilliValue())
}
// cpu metrics must be of the double type to adhere to opentelemetry system.cpu metric specifications
valType = metricspb.MetricDescriptor_GAUGE_DOUBLE
val = utils.GetDoubleTimeSeries(float64(v.MilliValue()) / 1000.0)
}
metrics = append(metrics,
&metricspb.Metric{
Expand Down
19 changes: 3 additions & 16 deletions receiver/k8sclusterreceiver/internal/collection/nodes.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ import (
resourcepb "github.com/census-instrumentation/opencensus-proto/gen-go/resource/v1"
"github.com/iancoleman/strcase"
conventions "go.opentelemetry.io/collector/semconv/v1.6.1"
"go.opentelemetry.io/collector/service/featuregate"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"

Expand Down Expand Up @@ -75,22 +74,10 @@ func getMetricsForNode(node *corev1.Node, nodeConditionTypesToReport, allocatabl
continue
}
val := utils.GetInt64TimeSeries(quantity.Value())

if featuregate.GetRegistry().IsEnabled(reportCPUMetricsAsDoubleFeatureGateID) {
if v1NodeAllocatableTypeValue == corev1.ResourceCPU {
// cpu metrics must be of the double type to adhere to opentelemetry system.cpu metric specifications
if v1NodeAllocatableTypeValue == corev1.ResourceCPU {
val = utils.GetDoubleTimeSeries(float64(quantity.MilliValue()) / 1000.0)
valType = metricspb.MetricDescriptor_GAUGE_DOUBLE
}
} else {
// metrics will be skipped if metric not present in node or value is not convertable to int64
valInt64, ok := quantity.AsInt64()
if !ok {
logger.Debug(fmt.Errorf("metric %s has value %v which is not convertable to int64",
v1NodeAllocatableTypeValue, node.GetName()).Error())
continue
}
val = utils.GetInt64TimeSeries(valInt64)
val = utils.GetDoubleTimeSeries(float64(quantity.MilliValue()) / 1000.0)
valType = metricspb.MetricDescriptor_GAUGE_DOUBLE
}
metrics = append(metrics, &metricspb.Metric{
MetricDescriptor: &metricspb.MetricDescriptor{
Expand Down
71 changes: 3 additions & 68 deletions receiver/k8sclusterreceiver/internal/collection/nodes_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ import (

metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/service/featuregate"
"go.uber.org/zap"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
Expand All @@ -29,42 +28,7 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver/internal/testutils"
)

func TestNodeMetricsReportCPUMetricsAsInt(t *testing.T) {
// disable the feature gate
featuregate.GetRegistry().Apply(map[string]bool{reportCPUMetricsAsDoubleFeatureGateID: false})
n := newNode("1")

actualResourceMetrics := getMetricsForNode(n, []string{"Ready", "MemoryPressure"}, []string{"cpu", "memory", "ephemeral-storage", "storage"}, zap.NewNop())

require.Equal(t, 1, len(actualResourceMetrics))

require.Equal(t, 5, len(actualResourceMetrics[0].metrics))
testutils.AssertResource(t, actualResourceMetrics[0].resource, k8sType,
map[string]string{
"k8s.node.uid": "test-node-1-uid",
"k8s.node.name": "test-node-1",
},
)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[0], "k8s.node.condition_ready",
metricspb.MetricDescriptor_GAUGE_INT64, 1)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[1], "k8s.node.condition_memory_pressure",
metricspb.MetricDescriptor_GAUGE_INT64, 0)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[2], "k8s.node.allocatable_cpu",
metricspb.MetricDescriptor_GAUGE_INT64, 123)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[3], "k8s.node.allocatable_memory",
metricspb.MetricDescriptor_GAUGE_INT64, 456)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[4], "k8s.node.allocatable_ephemeral_storage",
metricspb.MetricDescriptor_GAUGE_INT64, 1234)
}

func TestNodeMetricsReportCPUMetricsAsDouble(t *testing.T) {
// enable the feature gate
featuregate.GetRegistry().Apply(map[string]bool{reportCPUMetricsAsDoubleFeatureGateID: true})
func TestNodeMetricsReportCPUMetrics(t *testing.T) {
n := newNode("1")

actualResourceMetrics := getMetricsForNode(n, []string{"Ready", "MemoryPressure"}, []string{"cpu", "memory", "ephemeral-storage", "storage"}, zap.NewNop())
Expand All @@ -86,7 +50,7 @@ func TestNodeMetricsReportCPUMetricsAsDouble(t *testing.T) {
metricspb.MetricDescriptor_GAUGE_INT64, 0)

testutils.AssertMetricsDouble(t, actualResourceMetrics[0].metrics[2], "k8s.node.allocatable_cpu",
metricspb.MetricDescriptor_GAUGE_DOUBLE, 3.14)
metricspb.MetricDescriptor_GAUGE_DOUBLE, 0.123)

testutils.AssertMetricsInt(t, actualResourceMetrics[0].metrics[3], "k8s.node.allocatable_memory",
metricspb.MetricDescriptor_GAUGE_INT64, 456)
Expand All @@ -96,35 +60,6 @@ func TestNodeMetricsReportCPUMetricsAsDouble(t *testing.T) {
}

func newNode(id string) *corev1.Node {
if featuregate.GetRegistry().IsEnabled(reportCPUMetricsAsDoubleFeatureGateID) {
return &corev1.Node{
ObjectMeta: v1.ObjectMeta{
Name: "test-node-" + id,
UID: types.UID("test-node-" + id + "-uid"),
Labels: map[string]string{
"foo": "bar",
"foo1": "",
},
},
Status: corev1.NodeStatus{
Conditions: []corev1.NodeCondition{
{
Type: corev1.NodeReady,
Status: corev1.ConditionTrue,
},
{
Status: corev1.ConditionFalse,
Type: corev1.NodeMemoryPressure,
},
},
Allocatable: corev1.ResourceList{
corev1.ResourceCPU: *resource.NewMilliQuantity(3140, resource.DecimalSI),
corev1.ResourceMemory: *resource.NewQuantity(456, resource.DecimalSI),
corev1.ResourceEphemeralStorage: *resource.NewQuantity(1234, resource.DecimalSI),
},
},
}
}
return &corev1.Node{
ObjectMeta: v1.ObjectMeta{
Name: "test-node-" + id,
Expand All @@ -146,7 +81,7 @@ func newNode(id string) *corev1.Node {
},
},
Allocatable: corev1.ResourceList{
corev1.ResourceCPU: *resource.NewQuantity(123, resource.DecimalSI),
corev1.ResourceCPU: *resource.NewMilliQuantity(123, resource.DecimalSI),
corev1.ResourceMemory: *resource.NewQuantity(456, resource.DecimalSI),
corev1.ResourceEphemeralStorage: *resource.NewQuantity(1234, resource.DecimalSI),
},
Expand Down
66 changes: 1 addition & 65 deletions receiver/k8sclusterreceiver/internal/collection/pods_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,6 @@ import (
metricspb "github.com/census-instrumentation/opencensus-proto/gen-go/metrics/v1"
resourcepb "github.com/census-instrumentation/opencensus-proto/gen-go/resource/v1"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/service/featuregate"
"go.uber.org/zap"
"go.uber.org/zap/zapcore"
"go.uber.org/zap/zaptest/observer"
Expand All @@ -35,70 +34,7 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/k8sclusterreceiver/internal/testutils"
)

func TestPodAndContainerMetricsReportCPUMetricsAsInt(t *testing.T) {
// disable the feature gate
featuregate.GetRegistry().Apply(map[string]bool{reportCPUMetricsAsDoubleFeatureGateID: false})

pod := newPodWithContainer(
"1",
podSpecWithContainer("container-name"),
podStatusWithContainer("container-name", containerIDWithPreifx("container-id")),
)
dc := NewDataCollector(zap.NewNop(), []string{}, []string{})

dc.SyncMetrics(pod)
actualResourceMetrics := dc.metricsStore.metricsCache

rms := actualResourceMetrics["test-pod-1-uid"]
require.NotNil(t, rms)

rm := rms[0]
require.Equal(t, 1, len(rm.Metrics))
testutils.AssertResource(t, rm.Resource, k8sType,
map[string]string{
"k8s.pod.uid": "test-pod-1-uid",
"k8s.pod.name": "test-pod-1",
"k8s.node.name": "test-node",
"k8s.namespace.name": "test-namespace",
},
)

testutils.AssertMetricsInt(t, rm.Metrics[0], "k8s.pod.phase",
metricspb.MetricDescriptor_GAUGE_INT64, 3)

rm = rms[1]

require.Equal(t, 4, len(rm.Metrics))
testutils.AssertResource(t, rm.Resource, "container",
map[string]string{
"container.id": "container-id",
"k8s.container.name": "container-name",
"container.image.name": "container-image-name",
"container.image.tag": "latest",
"k8s.pod.uid": "test-pod-1-uid",
"k8s.pod.name": "test-pod-1",
"k8s.node.name": "test-node",
"k8s.namespace.name": "test-namespace",
},
)

testutils.AssertMetricsInt(t, rm.Metrics[0], "k8s.container.restarts",
metricspb.MetricDescriptor_GAUGE_INT64, 3)

testutils.AssertMetricsInt(t, rm.Metrics[1], "k8s.container.ready",
metricspb.MetricDescriptor_GAUGE_INT64, 1)

testutils.AssertMetricsInt(t, rm.Metrics[2], "k8s.container.cpu_request",
metricspb.MetricDescriptor_GAUGE_INT64, 10000)

testutils.AssertMetricsInt(t, rm.Metrics[3], "k8s.container.cpu_limit",
metricspb.MetricDescriptor_GAUGE_INT64, 20000)
}

func TestPodAndContainerMetricsReportCPUMetricsAsDouble(t *testing.T) {
// enable the feature gate
featuregate.GetRegistry().Apply(map[string]bool{reportCPUMetricsAsDoubleFeatureGateID: true})

func TestPodAndContainerMetricsReportCPUMetrics(t *testing.T) {
pod := newPodWithContainer(
"1",
podSpecWithContainer("container-name"),
Expand Down