
Collector 0.42.0 - googlecloudexporter - One or more points were written more frequently than the maximum sampling period configured for the metric #18039

Closed
zisom-hc opened this issue Jan 25, 2023 · 11 comments
Labels: bug, exporter/googlecloud

@zisom-hc

Component(s)

exporter/googlecloud

What happened?

Description

Errors are seen when the collector is configured to export metrics through the googlecloudexporter. I'm not sure whether they can be resolved through configuration of the collector itself, or whether the issue lies in the instrumentation of the TFC Agent (the application emitting the telemetry). Any insight into how to resolve this is greatly appreciated.

Why version v0.42.0 of the collector? The authors of the TFC Agent have deemed this version the most compatible with the instrumentation libraries currently used by the agent (otel libraries at version 0.19.0).

Steps to Reproduce

  1. Create a configuration YAML file for the collector named collector-gcp.yml in your working directory, using the one I provided.
  2. Run a Docker container of version 0.42.0 of opentelemetry-collector-contrib, using the command below from within your working directory:
sudo docker run -d\
--name collector \
--volume $(pwd)/collector-gcp.yml:/etc/otel/config.yaml  \
-p 4317:4317  \
-p 4318:4318  \
-p 55681:55681  \
--rm otel/opentelemetry-collector-contrib:0.42.0 \
--config /etc/otel/config.yaml
  3. Configure and run a Docker container of the Terraform Cloud Agent to emit telemetry to the provisioned collector:
    • You would need an installation of Terraform Enterprise, or a paid Terraform Cloud subscription, to connect the agent to. HashiCorp only supplies these to paid customers, so you most likely will not be able to perform this step.
docker run -d \
--name agent \
-e TFC_ADDRESS=https://TFE_HOSTNAME_OR_DELETE_THIS_ENV_VAR_IF_USING_TFC  \
-e TFC_AGENT_TOKEN="USER_API_TOKEN" \
-e TFC_AGENT_NAME=agent \
-e TFC_AGENT_LOG_LEVEL=TRACE \
-e TFC_AGENT_OTLP_ADDRESS=172.17.0.2:4317 \
hashicorp/tfc-agent:latest

Expected Result

Metrics are successfully ingested by the collector and exported to the Google Cloud Monitoring service

Actual Result

Errors seen for multiple metrics ingested by the collector from a TFC agent:

  • One or more points were written more frequently than the maximum sampling period configured for the metric
  • One or more TimeSeries could not be written: Field timeSeries[9].points[0].value had an invalid value: A point has an unrecognized value type.

Collector version

v0.42.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  googlecloud:
    retry_on_failure:
      enabled: false

processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]

Log output

root@instance-1:~/docker# sudo docker run --name collector --volume $(pwd)/collector-gcp.yml:/etc/otel/config.yaml  -p 4317:4317  -p 4318:4318  -p 55681:55681  --rm otel/opentelemetry-collector-contrib:0.42.0 --config /etc/otel/config.yaml
2023-01-25T18:32:13.214Z        info    builder/exporters_builder.go:255        Exporter was built.     {"kind": "exporter", "name": "googlecloud"}
2023-01-25T18:32:13.215Z        info    builder/pipelines_builder.go:223        Pipeline was built.     {"name": "pipeline", "name": "metrics"}
2023-01-25T18:32:13.215Z        info    builder/pipelines_builder.go:223        Pipeline was built.     {"name": "pipeline", "name": "traces"}
2023-01-25T18:32:13.215Z        info    builder/receivers_builder.go:226        Receiver was built.     {"kind": "receiver", "name": "otlp", "datatype": "metrics"}
2023-01-25T18:32:13.216Z        info    builder/receivers_builder.go:226        Receiver was built.     {"kind": "receiver", "name": "otlp", "datatype": "traces"}
2023-01-25T18:32:13.216Z        info    service/service.go:82   Starting extensions...
2023-01-25T18:32:13.216Z        info    service/service.go:87   Starting exporters...
2023-01-25T18:32:13.216Z        info    builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "googlecloud"}
2023-01-25T18:32:13.216Z        info    builder/exporters_builder.go:48 Exporter started.       {"kind": "exporter", "name": "googlecloud"}
2023-01-25T18:32:13.217Z        info    service/service.go:92   Starting processors...
2023-01-25T18:32:13.217Z        info    builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"}
2023-01-25T18:32:13.221Z        info    builder/pipelines_builder.go:65 Pipeline is started.    {"name": "pipeline", "name": "traces"}
2023-01-25T18:32:13.221Z        info    builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "metrics"}
2023-01-25T18:32:13.221Z        info    builder/pipelines_builder.go:65 Pipeline is started.    {"name": "pipeline", "name": "metrics"}
2023-01-25T18:32:13.221Z        info    service/service.go:97   Starting receivers...
2023-01-25T18:32:13.221Z        info    builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.222Z        info    otlpreceiver/otlp.go:69 Starting GRPC server on endpoint 0.0.0.0:4317   {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.223Z        info    otlpreceiver/otlp.go:87 Starting HTTP server on endpoint 0.0.0.0:4318   {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.223Z        info    otlpreceiver/otlp.go:147        Setting up a second HTTP listener on legacy endpoint 0.0.0.0:55681      {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.223Z        info    otlpreceiver/otlp.go:87 Starting HTTP server on endpoint 0.0.0.0:55681  {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.223Z        info    builder/receivers_builder.go:73 Receiver started.       {"kind": "receiver", "name": "otlp"}
2023-01-25T18:32:13.223Z        info    service/telemetry.go:95 Setting up own telemetry...
2023-01-25T18:32:13.225Z        info    service/telemetry.go:115        Serving Prometheus metrics      {"address": ":8888", "level": "basic", "service.instance.id": "d4fdbdf3-d28e-4d8b-b462-e113b6684f0f", "service.version": "latest"}
2023-01-25T18:32:13.226Z        info    service/collector.go:229        Starting otelcol-contrib...     {"Version": "0.42.0", "NumCPU": 2}
2023-01-25T18:32:13.226Z        info    service/collector.go:124        Everything is ready. Begin running and processing data.
2023-01-25T18:32:22.609Z        error   exporterhelper/queued_retry.go:149      Exporting failed. Try enabling retry_on_failure config option.  {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[9].points[0].value had an invalid value: A point has an unrecognized value type.\nerror details: name = Unknown  desc = total_point_count:14  success_point_count:13  errors:{status:{code:3}  point_count:1}"}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry.go:149
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:78
2023-01-25T18:32:34.081Z        error   exporterhelper/queued_retry.go:149      Exporting failed. Try enabling retry_on_failure config option.  {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[1].points[0].value had an invalid value: A point has an unrecognized value type.\nerror details: name = Unknown  desc = total_point_count:14  success_point_count:13  errors:{status:{code:3}  point_count:1}"}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry.go:149
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:78
2023-01-25T18:32:44.123Z        error   exporterhelper/queued_retry.go:149      Exporting failed. Try enabling retry_on_failure config option.  {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[13]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.goroutines.count{agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1,agent_core_version:1.4.0,agent_id:agent-1LiQJKrkdt6nbftr}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[3]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.gc.pause-total.nanoseconds{agent_core_version:1.4.0,agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[2]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-objects.count{agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1,agent_core_version:1.4.0}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[6]: custom.googleapis.com/opencensus/tfc-agent.status.idle{agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_id:agent-1LiQJKrkdt6nbftr,agent_core_version:1.4.0,agent_name:zisom-gcp-1}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[9]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-idle.bytes{agent_name:zisom-gcp-1,agent_core_version:1.4.0,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_id:agent-1LiQJKrkdt6nbftr}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[1]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-released.bytes{agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_id:agent-1LiQJKrkdt6nbftr,agent_name:zisom-gcp-1,agent_core_version:1.4.0}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[7]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-sys.bytes{agent_id:agent-1LiQJKrkdt6nbftr,agent_name:zisom-gcp-1,agent_core_version:1.4.0,agent_pool_id:apool-Y6yKKYDPBGGLKnqa}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[5]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.malloc.count{agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_core_version:1.4.0,agent_name:zisom-gcp-1}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[8]: custom.googleapis.com/opencensus/tfc-agent.runtime.uptime.milliseconds{agent_core_version:1.4.0,agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[12]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.gc.count{agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1,agent_core_version:1.4.0}; One or more points were written 
more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[4]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-inuse.bytes{agent_name:zisom-gcp-1,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_id:agent-1LiQJKrkdt6nbftr,agent_core_version:1.4.0}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[0]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.heap-alloc.bytes{agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1,agent_core_version:1.4.0}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[10]: custom.googleapis.com/opencensus/tfc-agent.runtime.go.mem.free.count{agent_core_version:1.4.0,agent_id:agent-1LiQJKrkdt6nbftr,agent_pool_id:apool-Y6yKKYDPBGGLKnqa,agent_name:zisom-gcp-1}; Field timeSeries[11].points[0].value had an invalid value: A point has an unrecognized value type.\nerror details: name = Unknown  desc = total_point_count:14  errors:{status:{code:9}  point_count:13}  errors:{status:{code:3}  point_count:1}"}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry.go:149
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/metrics.go:134
go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/queued_retry_inmemory.go:105
go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:99
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
        go.opentelemetry.io/collector@v0.42.0/exporter/exporterhelper/internal/bounded_memory_queue.go:78

Additional context

No response

@zisom-hc added the bug and needs triage labels on Jan 25, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

If you can update to a more recent version of the collector, that would be helpful. v0.70.0 is the latest version, and you are on v0.42.0.

Otherwise, the error means you are writing a metric with the same name and set of labels more than once every 5 seconds: https://cloud.google.com/monitoring/quotas#custom_metrics_quotas.

This often occurs because you are missing labels required to distinguish points. For example, if I am collecting the same metric from 2 kubernetes pods, but don't distinguish them via a label (e.g. the pod name), they would both be written with the same set of labels, and would "collide".

I notice you are writing to the "global" monitored resource. Unless you are collecting metrics from a single entity, that usually indicates that you aren't differentiating between the sources of metrics using the monitored resource. Depending on where you are running, a resource detector may be helpful for getting information about the environment you are running in.

@zisom-hc
Author

zisom-hc commented Jan 26, 2023

The quick response is greatly appreciated dashpole; thank you for the insight into what is going on here 🙏

So I added a resourcedetection processor to my configuration, and the result changed slightly: I was no longer seeing the One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric error as often, but was instead getting an increase of these:

Field timeSeries[110] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.; Field timeSeries[123] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.

Config Used to Implement the resourcedetection processor:

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  googlecloud:
    retry_on_failure:
      enabled: false

processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
  resourcedetection:
    detectors: [gce]
    timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      processors: [batch, resourcedetection]
      exporters: [googlecloud]

After reading about the resourcedetection processor some more, I take it the main issue is that, in my particular situation, none of the metadata it obtains from GCE is unique enough to individualize the ingested metrics.

I then experimented with the metricstransform processor to see if I could add a unique label to each ingested metric, but I have not found a way to generate a randomized value for the new label I want added. From reading the documentation and thinking it over, I take it this is simply not possible, at least in this version of the processor; is that correct?

If it is correct that there is no way for the collector to generate a random value for a label added to each ingested metric, would you know whether there is a way within the instrumentation to generate a unique identifier of some sort for each metric? I've dug through the source code trying to uncover this, but I'm not well versed in Golang, so I have not found that ability myself; a confirmation as to whether it is possible would be appreciated, if known. I also attempted to find a way to configure the Sample Period, but have yet to find that ability as well.

If the issue we're encountering is known to be handled better by a later version of the OpenTelemetry Collector, it would be great if you could provide links to source code or release notes demonstrating this, so that I have the data I need to get the internal team responsible for this application to upgrade the instrumentation libraries, which in turn would allow us to use later versions of the OpenTelemetry Collector.

Config Used to Implement the metricstransform processor:

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  googlecloud:
    retry_on_failure:
      enabled: false

processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
  metricstransform/global_metric_labels:
    transforms:
      - include: ^(.*)$
        match_type: regexp
        action: update
        operations:
          - action: add_label
            new_label: env
            new_value: "stage"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]
    metrics:
      receivers: [otlp]
      processors: [batch, metricstransform/global_metric_labels]
      exporters: [googlecloud]

@dashpole
Contributor

From reading the documentation and thinking it over, I take it this is simply not possible, at least in this version of the processor; is that correct?

Yes, I don't think that is possible.

Would you mind sharing which GCP platform you are running on?

You just have to make sure:

  1. You can uniquely identify where the metrics are coming from. (Resource Attributes)
  2. Within a single source, each metric is unique. (Metric Attributes)

I'm assuming 2 is already solved, as that would be quite odd... To solve 1, one option is to add resource attributes to whatever is sending metrics to the collector. For example, you could set the OTEL_RESOURCE_ATTRIBUTES environment variable on each of the applications sending to the collector to add an attribute that makes it unique.

However, the googlecloud exporter doesn't attach all resource attributes by default, so you would also need to use metric.resource_filters in our config to add the resource attributes to each metric as a label. Note that this config option requires a newer version of our exporter.
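
For reference, a minimal sketch of the exporter side of that suggestion, assuming a collector version new enough to support metric.resource_filters (the prefix value below is illustrative; check the exporter README for your release for the exact schema):

exporters:
  googlecloud:
    metric:
      # Promote resource attributes whose names match the prefix to metric
      # labels, so points from different senders no longer collide.
      resource_filters:
        - prefix: "service.instance"

Combined with something like OTEL_RESOURCE_ATTRIBUTES=service.instance.id=agent-1 (a hypothetical value) on each sender, each agent's time series would then carry a distinguishing label.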

@zisom-hc
Author

zisom-hc commented Jan 27, 2023

Would you mind sharing which GCP platform you are running on?

I'm running a Docker container of the collector within a VM instance (on which I installed Docker) provisioned through the GCE service in Google Cloud Platform. If that didn't answer your question, let me know.

I wasn't the person who put together the instrumentation for our company's application (Terraform Cloud Agent), but my hunch is that we actually do not have #2 solved: the attributes of the metrics are not unique enough, and all the metrics are coming from a single source (a single Terraform Cloud Agent, run as a Docker container within the same VM instance that the collector's Docker container runs in). Would you happen to know of a way at the instrumentation level that a unique attribute could be created for every metric, or possibly a way to modify this sampling period the error talks about to something greater than 5 seconds, as it desires? If not, that's okay; I figure these questions are probably better asked within the instrumentation repo, and at the very least you helped me deduce what level this problem lies at.

I'm not positive whether this is related, but while looking around the web I noticed that this Google doc mentions an add_unique_identifier option that could potentially be used within the instrumentation otel to resolve this same error, but so far I haven't been able to locate it in the source code (granted, I'm not well versed in Golang).

@dashpole
Contributor

Would you happen to know of a way at the instrumentation level that a unique attribute could be created for every metric, or possibly a way to modify this sampling period the error talks about to something greater than 5 seconds, as it desires?

For otel instrumentation, you should be able to add any attribute (including one with a UUID as a value). But that would be done by hand. You should also be able to modify the interval of the periodic reader (see https://opentelemetry.io/docs/reference/specification/metrics/sdk/#periodic-exporting-metricreader).

The add_unique_identifier option could be used if your instrumentation is written in Python, and is recommended if the duplicate metrics are coming from different threads, but it isn't available in other languages.

Another possibility is that the duplicate metrics have different instrumentation scopes. In newer versions of the exporter, we add instrumentation_library and instrumentation_version labels to differentiate these (see instrumentation_library_labels in the README).
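
For illustration, a sketch of enabling that option, assuming (per the README reference above) that instrumentation_library_labels is a boolean under the exporter's metric section in newer releases; verify the exact key and placement against the README for your exporter version:

exporters:
  googlecloud:
    metric:
      # Assumed placement: adds instrumentation_library / instrumentation_version
      # labels so that scopes with otherwise identical label sets don't collide.
      instrumentation_library_labels: true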

@atoulme removed the needs triage label on Mar 8, 2023
@zisom-hc
Author

zisom-hc commented Mar 17, 2023

Hey dashpole as well as any other lurkers of this issue,

Our team recently upgraded the otel instrumentation libraries used by our application to 0.36.0, which allowed me to try exporting the same telemetry through version 0.73.0 of the opentelemetry-collector-contrib Docker container. As probably expected, I'm encountering the same error. I'll attempt to utilize metric.resource_filters and instrumentation_library_labels to see if those can make the metric metadata unique enough to circumvent the issue, then report back the results. In the meantime, if anyone thinks of another way to force uniqueness onto these metrics, please let me know.

UPDATE: neither of these two settings changed the result; error still occurred

@zisom-hc
Author

Through some more testing we determined that upgrading the instrumentation libraries from 0.19.0 to 0.36.0 (so that we could use the latest version of the OpenTelemetry Collector, 0.73.0), together with changing the hard-coded collection period of the metric controller from 2s to 5s, resolved the error.

Essentially changing the original:

// The interval for pushing metrics to the OpenTelemetry collector. This value
// is also used by the runtime profiler to configure the frequency it runs at.
const metricsInterval = 2 * time.Second

To this:

// The interval for pushing metrics to the OpenTelemetry collector. This value
// is also used by the runtime profiler to configure the frequency it runs at.
const metricsInterval = 5 * time.Second

which determines the configuration of:

controller.WithCollectPeriod(metricsInterval) and go metrics.EmitRuntimeMetrics(runtimeCtx, metricsInterval)

The insight you provided, @dashpole, was greatly appreciated; thank you again for helping us understand this issue.

@AkselAllas

@dashpole

I am experiencing something very similar:

One or more points were written more frequently than the maximum sampling period configured for the metric.: prometheus_target{cluster:__run__,namespace:,job:otelcol,location:us-central1,instance:127.0.0.1:8888}

My problem here is that I have 2 Cloud Run instances using the otel-collector's gcp resourcedetector, but the instance resource label for both is 127.0.0.1:8888.

I have verified in Metrics Explorer that there is a metric label named service_instance_id with the correct instance id, while the resource label named instance is 127.0.0.1:8888.

I read that metric.resource_filters can be used to copy a resource label into a metric label. What I need is either to specify a metric label as a resource label, or to have the gcp resourcedetector set the resource label correctly.

I have:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 30s
          static_configs:
            - targets: [ '127.0.0.1:8888' ]
          # Sample from https://opentelemetry.io/docs/collector/internal-telemetry/
          metric_relabel_configs:
            - source_labels: [ __name__ ]
              regex: '.*grpc_io.*'
              action: drop
processors:
  batch:
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 10s
  resourcedetection:
    detectors: [ gcp ]
    timeout: 10s
service:
  pipelines:
    metrics:
      receivers: [ prometheus ]
      processors: [ batch, memory_limiter, resourcedetection ]
      exporters: [ googlemanagedprometheus ]

@dashpole
Contributor

@AkselAllas you can delete the service.instance.id resource attribute, which makes the exporter fall back to faas.instance for `instance`.

@AkselAllas

AkselAllas commented Sep 13, 2024

@dashpole I can confirm.

Adding the following processor to the end of my processors fixed my case 🙇 ❤️

  resource:
    attributes:
    - key: service.instance.id
      action: delete
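
For context, a minimal sketch of how that processor slots into the metrics pipeline from the earlier configuration (assuming the rest of that config stays as posted):

processors:
  resource:
    attributes:
      - key: service.instance.id
        action: delete

service:
  pipelines:
    metrics:
      receivers: [ prometheus ]
      # Deleting service.instance.id lets the exporter fall back to
      # faas.instance for the instance label, as described above.
      processors: [ batch, memory_limiter, resourcedetection, resource ]
      exporters: [ googlemanagedprometheus ]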

Maybe it makes sense to add something about this to the GCP exporter README examples?
